Multilingual Translation with Extensible Multilingual Pretraining and Finetuning (2008.00401v1)

Published 2 Aug 2020 in cs.CL

Abstract: Recent work demonstrates the potential of multilingual pretraining to create one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems can be created by finetuning on bitext. In this work, we show that multilingual translation models can be created through multilingual finetuning. Instead of finetuning on one direction, a pretrained model is finetuned on many directions at the same time. Compared to multilingual models trained from scratch, starting from pretrained models incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low resource languages where bitext is not available. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance. We double the number of languages in mBART to support multilingual machine translation models of 50 languages. Finally, we create the ML50 benchmark, covering low, mid, and high resource languages, to facilitate reproducible research by standardizing training and evaluation data. On ML50, we demonstrate that multilingual finetuning improves on average 1 BLEU over the strongest baselines (being either multilingual from scratch or bilingual finetuning) while improving 9.3 BLEU on average over bilingual baselines from scratch.

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

This paper presents an approach to multilingual translation built on extensible multilingual pretraining and finetuning, focusing on the effectiveness of multilingual finetuning as opposed to traditional bilingual finetuning. The authors extend the multilingual coverage of the mBART model from the original 25 languages to 50, and show that this extension is achieved without degrading performance on the languages originally supported by mBART.

Methodology

The core methodology leverages multilingual pretraining through mBART, which is pretrained as a sequence-to-sequence denoising autoencoder on monolingual data in 25 languages. The significant change occurs during the finetuning phase: rather than adapting the model to a single language pair, the approach finetunes on many language pairs simultaneously (multilingual finetuning). This method is particularly advantageous for low-resource languages that lack substantial bitext, as it relies on an enriched pretraining dataset of monolingual corpora, such as those derived from Common Crawl and Wikipedia.
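The released mBART-50 checkpoints illustrate how a single model serves many translation directions: the source language is marked with a language token on the encoder side, and the target language is forced as the first decoder token. The sketch below is a minimal inference example assuming the Hugging Face Hub checkpoint facebook/mbart-large-50-many-to-many-mmt; the checkpoint name and language codes follow Hub conventions rather than details specified in the paper.

```python
# A minimal sketch (not taken from the paper) of many-to-many translation
# with the released mBART-50 checkpoint on the Hugging Face Hub.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt")

# The source language is signalled by setting a language code on the tokenizer,
# which prepends the corresponding language token to the encoder input.
tokenizer.src_lang = "hi_IN"  # Hindi source
batch = tokenizer("संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है",
                  return_tensors="pt")

# The target language is chosen by forcing its language token as the first
# decoder token; this is how one model covers many translation directions.
generated = model.generate(
    **batch, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```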

Benchmarks and Evaluation

To facilitate research and rigorous testing, the authors introduce the ML50 benchmark, covering a wide spectrum of languages categorized by resource availability. The benchmark standardizes training and evaluation data across languages, enabling researchers to assess machine translation performance consistently. On this benchmark, the authors report an average improvement of 1 BLEU point over the strongest baselines, whether multilingual models trained from scratch or bilingual finetuning. Notably, low-resource languages showed improvements of up to 18 BLEU points in the multilingual finetuning setting.
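Standardized evaluation of this kind is typically reported as corpus-level BLEU. Below is a minimal sketch using sacrebleu; the hypothesis and reference file paths are placeholders for illustration, not files shipped with ML50.

```python
# A minimal sketch of a standardized BLEU evaluation; the file paths are
# placeholders, not artifacts of the ML50 release.
import sacrebleu

with open("hyp.en-de.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("ref.en-de.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu expects a list of hypothesis strings and a list of reference
# streams (one stream per set of references).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```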

Results and Implications

Multilingual finetuning outperforms traditional bilingual finetuning, particularly in many-to-one language pair settings, with the most significant gains evident for low-resource languages. The finetuning approach not only improves translation quality but also offers substantial efficiency benefits by consolidating multiple translation directions into a single model, thereby reducing the computational footprint. This methodology provides a potent strategy for scaling neural machine translation (NMT) systems to support an increasing number of languages without the computational cost of training from scratch.
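As an illustration of that consolidation, the sketch below assumes the Hugging Face Hub checkpoint facebook/mbart-large-50-many-to-one-mmt, which always decodes into English: one model serves every source language, where a bilingual setup would require a separate model per direction.

```python
# A minimal sketch of many-to-one translation with a single multilingual model;
# the checkpoint name and language codes follow Hugging Face Hub conventions.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

name = "facebook/mbart-large-50-many-to-one-mmt"
model = MBartForConditionalGeneration.from_pretrained(name)
tokenizer = MBart50TokenizerFast.from_pretrained(name)

sources = {
    "fr_XX": "Le chef de l'ONU affirme qu'il n'y a pas de solution militaire en Syrie.",
    "de_DE": "Der UN-Chef sagt, es gebe keine militärische Lösung in Syrien.",
}
for lang_code, sentence in sources.items():
    # Only the source-language tag changes; the model and target stay fixed.
    tokenizer.src_lang = lang_code
    batch = tokenizer(sentence, return_tensors="pt")
    generated = model.generate(**batch)  # this checkpoint always targets English
    print(lang_code, "->",
          tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```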

Future Directions

While the paper highlights substantial gains, challenges associated with translating into many languages (one-to-many) remain, suggesting that further investigation into model capacity and multilingual model design will be beneficial. Future work could explore the integration of even more languages and potentially domain-specific adaptations of the model to enhance quality and applicability. The release of mBART50 as a community resource fosters continued research in multilingual NMT, supporting innovative approaches that bridge language barriers in global communication.

In summary, the extensibility of pretrained models through multilingual finetuning presents an efficient and effective methodology for advancing multilingual translation capabilities. This paper underlines the significance of flexible model architectures that leverage extensive monolingual corpora, paving the way for future innovations across a broader range of languages.

Authors (8)
  1. Yuqing Tang (12 papers)
  2. Chau Tran (13 papers)
  3. Xian Li (116 papers)
  4. Peng-Jen Chen (26 papers)
  5. Naman Goyal (37 papers)
  6. Vishrav Chaudhary (45 papers)
  7. Jiatao Gu (84 papers)
  8. Angela Fan (49 papers)
Citations (421)