XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders (2012.15547v1)

Published 31 Dec 2020 in cs.CL

Abstract: Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs. Surprisingly, the method is also effective even on top of the strong baseline with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be available at https://aka.ms/xlm-t.
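
The core recipe described in the abstract, initializing a Transformer translation model from a pretrained cross-lingual encoder and then fine-tuning it on multilingual parallel data, can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration, not the authors' released code: it assumes Hugging Face Transformers and uses XLM-RoBERTa as a stand-in for the pretrained cross-lingual encoder, with a single (source, target) pair representing the parallel corpus.

```python
# Minimal sketch of the XLM-T idea (assumption: Hugging Face Transformers,
# XLM-RoBERTa as the pretrained cross-lingual encoder).
from transformers import AutoTokenizer, EncoderDecoderModel

# Build a seq2seq model whose encoder (and decoder body) are initialized
# from the pretrained cross-lingual checkpoint; cross-attention weights
# are newly initialized and learned during fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base",  # pretrained cross-lingual encoder
    "xlm-roberta-base",  # decoder initialization; cross-attention is new
)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Settings required for training with labels (right-shifted decoder inputs).
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# One example (source, target) pair from a multilingual parallel corpus.
src = "Maschinelle Übersetzung ist nützlich."
tgt = "Machine translation is useful."

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(tgt, return_tensors="pt").input_ids

# Standard cross-entropy fine-tuning objective on the parallel data.
loss = model(**inputs, labels=labels).loss
loss.backward()
```

In practice, fine-tuning would loop over many language pairs with an optimizer and learning-rate schedule; the sketch only shows how the pretrained encoder weights enter the translation model.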

Authors (13)
  1. Shuming Ma (83 papers)
  2. Jian Yang (503 papers)
  3. Haoyang Huang (27 papers)
  4. Zewen Chi (29 papers)
  5. Li Dong (154 papers)
  6. Dongdong Zhang (79 papers)
  7. Hany Hassan Awadalla (24 papers)
  8. Alexandre Muzio (8 papers)
  9. Akiko Eriguchi (11 papers)
  10. Saksham Singhal (14 papers)
  11. Xia Song (38 papers)
  12. Arul Menezes (15 papers)
  13. Furu Wei (291 papers)
Citations (33)