Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation (2110.08547v2)

Published 16 Oct 2021 in cs.CL

Abstract: This paper demonstrates that multilingual pretraining and multilingual fine-tuning are both critical for facilitating cross-lingual transfer in zero-shot translation, where the neural machine translation (NMT) model is tested on source languages unseen during supervised training. Following this idea, we present SixT+, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages. SixT+ initializes the decoder embedding and the full encoder with XLM-R large and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT+ achieves impressive performance on many-to-English translation. It significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with an average gain of 7.2 and 5.0 BLEU respectively. Additionally, SixT+ offers a set of model parameters that can be further fine-tuned to other unsupervised tasks. We demonstrate that adding SixT+ initialization outperforms state-of-the-art explicitly designed unsupervised NMT models on Si<->En and Ne<->En by over 1.2 average BLEU. When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.


Authors (7)

Guanhua Chen
Shuming Ma
Yun Chen
Dongdong Zhang
Jia Pan
Wenping Wang
Furu Wei

Citations (14)


