XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders (2012.15547v1)
Abstract: Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and on the OPUS-100 corpus with 94 pairs. Surprisingly, the method is also effective even on top of a strong baseline with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation. The code will be at https://aka.ms/xlm-t.
- Shuming Ma (83 papers)
- Jian Yang (503 papers)
- Haoyang Huang (27 papers)
- Zewen Chi (29 papers)
- Li Dong (154 papers)
- Dongdong Zhang (79 papers)
- Hany Hassan Awadalla (24 papers)
- Alexandre Muzio (8 papers)
- Akiko Eriguchi (11 papers)
- Saksham Singhal (14 papers)
- Xia Song (38 papers)
- Arul Menezes (15 papers)
- Furu Wei (291 papers)
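
The abstract's core idea, initializing a translation model from a pretrained cross-lingual encoder and then fine-tuning on parallel data, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes Hugging Face's `transformers` library and uses `EncoderDecoderModel` to warm-start both the encoder and the decoder from XLM-R (the decoder's cross-attention layers are newly added and randomly initialized), followed by a single fine-tuning step on a toy parallel pair.

```python
# Minimal sketch (assumption: Hugging Face transformers, XLM-R base checkpoint);
# not the XLM-T authors' code, just an illustration of warm-starting a
# seq2seq translation model from a pretrained cross-lingual encoder.
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Encoder and decoder weights both come from the pretrained XLM-R encoder;
# the decoder's cross-attention is added on top and randomly initialized.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Toy parallel pair; a real setup would iterate over multilingual parallel data
# covering many language pairs, as described in the abstract.
src = tokenizer("Das ist ein Test.", return_tensors="pt")
tgt = tokenizer("This is a test.", return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
outputs = model(
    input_ids=src.input_ids,
    attention_mask=src.attention_mask,
    labels=tgt.input_ids,  # labels are shifted internally to form decoder inputs
)
outputs.loss.backward()
optimizer.step()
```

In a full multilingual setup, the fine-tuning loop would sample batches across all language pairs (e.g., with temperature-based sampling) rather than training on a single pair as in this toy example.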