
Extremely low-resource machine translation for closely related languages (2105.13065v1)

Published 27 May 2021 in cs.CL

Abstract: An effective method to improve extremely low-resource neural machine translation is multilingual training, which can be improved by leveraging monolingual data to create synthetic bilingual corpora using the back-translation method. This work focuses on closely related languages from the Uralic language family: from Estonian and Finnish geographical regions. We find that multilingual learning and synthetic corpora increase the translation quality in every language pair for which we have data. We show that transfer learning and fine-tuning are very effective for low-resource machine translation and achieve the best results. We collected new parallel data for Võro, North and South Saami, and present the first results of neural machine translation for these languages.
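As a rough illustration of the back-translation step mentioned in the abstract (not the authors' actual pipeline), here is a minimal Python sketch: given target-side monolingual sentences and a trained reverse (target-to-source) model, it produces synthetic (source, target) pairs whose target side remains clean human text. The `reverse_translate` callable and the toy sentence are hypothetical stand-ins.

```python
from typing import Callable, Iterable

def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual
    text using a reverse (target-to-source) translation model."""
    pairs = []
    for tgt in target_monolingual:
        synthetic_src = reverse_translate(tgt)  # model output; may be noisy
        pairs.append((synthetic_src, tgt))      # target side stays human-written
    return pairs

if __name__ == "__main__":
    # Toy usage with an identity function standing in for a real model.
    voro_sentences = ["Tere hummogust!"]
    synthetic = back_translate(voro_sentences, reverse_translate=lambda s: s)
    print(synthetic)
```

The synthetic pairs are then mixed into the parallel training data for the forward (source-to-target) model; because the target side is genuine text, the decoder still learns from fluent output even when the synthetic source is imperfect.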

Authors (3)
  1. Maali Tars (1 paper)
  2. Andre Tättar (3 papers)
  3. Mark Fišel (2 papers)
Citations (15)