Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora (1701.08339v1)

Published 29 Jan 2017 in cs.CL

Abstract: The effectiveness of a statistical machine translation (SMT) system depends heavily on the amount of parallel corpus used in the training phase. For low-resource language pairs, there are not enough parallel corpora to build an accurate SMT system. In this paper, a novel approach is presented to extract Persian-Italian parallel sentences from a non-parallel (comparable) corpus. English is used as the pivot language both to compute the matching scores between source and target sentences and in the candidate selection phase. Additionally, a new monolingual sentence similarity metric, Normalized Google Distance (NGD), is proposed to improve the matching process. Moreover, several extensions of the baseline system are applied to improve the quality of the extracted sentences, as measured by BLEU. Experimental results show that the new pivot-based extraction can significantly increase the quality of the bilingual corpus and consequently improve the performance of the Persian-Italian SMT system.
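The two ideas named in the abstract, English as a pivot and NGD as a similarity signal, can be illustrated with a minimal sketch. This is not the authors' implementation, and it assumes the following. Both the Persian and the Italian sentences have already been translated into English by some MT system (not shown). The `ngd` function uses the standard Normalized Google Distance formula over caller-supplied hit counts; the numbers in the usage note are hypothetical. Because the abstract does not say how word-level NGD is turned into a sentence score, a simple token-overlap (Jaccard) similarity is swapped in as a stand-in. The names `ngd`, `jaccard`, `best_match` and `threshold` are all hypothetical.

```python
import math
from typing import Callable, List, Optional


def ngd(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Normalized Google Distance between two terms x and y (standard formula).

    f_x, f_y: hit counts for pages containing x and y; f_xy: hit count for
    pages containing both; n: total number of indexed pages.
    0 means the terms always co-occur; larger values mean less related.
    """
    if f_xy <= 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy, ln = (math.log(v) for v in (f_x, f_y, f_xy, n))
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def jaccard(a: str, b: str) -> float:
    """Stand-in sentence similarity (NOT the paper's metric): token overlap
    between two English pivot sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_match(src_en: str, tgt_en: List[str],
               sim: Callable[[str, str], float],
               threshold: float) -> Optional[int]:
    """Compare a source sentence with every candidate target sentence in the
    English pivot space; return the index of the best-scoring candidate, or
    None if no candidate reaches the (hypothetical) threshold."""
    if not tgt_en:
        return None
    scores = [sim(src_en, cand) for cand in tgt_en]
    i = max(range(len(scores)), key=scores.__getitem__)  # first maximum wins
    return i if scores[i] >= threshold else None
```

For example, `ngd(1000, 100, 10, 1e6)` is approximately 0.5, and `best_match("the cat sat", ["a dog ran", "the cat sat down"], jaccard, 0.5)` returns `1`, because the second candidate shares three of four distinct tokens with the source. Comparing sentences in a shared English space avoids needing a direct Persian-Italian dictionary, which is the point of using a pivot for a low-resource pair.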

Authors (5)
  1. Ebrahim Ansari
  2. M. H. Sadreddini
  3. Mostafa Sheikhalishahi
  4. Richard Wallace
  5. Fatemeh Alimardani
Citations (2)