Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders (2104.08757v2)

Published 18 Apr 2021 in cs.CL

Abstract: Previous work has mainly focused on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving supervised machine translation with BERT. However, whether an MPE can facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with an off-the-shelf MPE on a parallel dataset of only one language pair, and then directly tested on zero-shot language pairs. We propose SixT, a simple yet effective model for this task. SixT leverages the MPE with a two-stage training schedule and gains further improvement from a position disentangled encoder and a capacity-enhanced decoder. With this method, SixT significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages. Furthermore, with much less training computation and training data, our model outperforms CRISS and m2m-100, two strong multilingual NMT baselines, on 15 any-to-English test sets.
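The setup the abstract describes, an off-the-shelf MPE feeding a deeper-than-usual decoder, trained in two stages, can be made concrete with a short example. The snippet below is a minimal sketch, assuming XLM-R as the MPE, a 12-layer decoder as the "capacity-enhanced" choice, and a simple freeze/unfreeze schedule for the two stages; the class and function names are hypothetical, and the position disentangled encoder is omitted since it would require modifying the encoder's internals.

```python
# Illustrative sketch of a SixT-style model: multilingual pretrained
# encoder (MPE) + capacity-enhanced decoder + two-stage training.
# Hyperparameters and the freezing schedule are assumptions based on
# the abstract, not the paper's verified configuration.
import torch
import torch.nn as nn
from transformers import XLMRobertaModel


class MPETranslator(nn.Module):
    def __init__(self, tgt_vocab_size: int, d_model: int = 768,
                 decoder_layers: int = 12, max_len: int = 1024):
        super().__init__()
        # Off-the-shelf MPE, reused as the NMT encoder.
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")
        # "Capacity-enhanced" decoder: deeper than the conventional 6 layers.
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=12, dim_feedforward=3072, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=decoder_layers)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.out_proj = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source sentence with the MPE.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        # Target embeddings with learned positions.
        pos = torch.arange(tgt_ids.size(1), device=tgt_ids.device)
        tgt = self.tgt_embed(tgt_ids) + self.pos_embed(pos)
        # Causal mask so each target position sees only its prefix.
        T = tgt_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=tgt_ids.device), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal,
                              memory_key_padding_mask=(src_mask == 0))
        return self.out_proj(hidden)  # logits over the target vocabulary


def set_training_stage(model: MPETranslator, stage: int) -> None:
    # Stage 1: freeze the MPE so only the decoder side adapts to its
    # multilingual representations. Stage 2: unfreeze for joint tuning.
    for p in model.encoder.parameters():
        p.requires_grad = (stage == 2)
```

In stage 1 (set_training_stage(model, 1)), gradients reach only the decoder side, so the encoder's multilingual representation space stays intact; preserving that shared space is what allows direct testing on source languages never seen in the parallel training data.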

Authors (8)
  1. Guanhua Chen
  2. Shuming Ma
  3. Yun Chen
  4. Li Dong
  5. Dongdong Zhang
  6. Jia Pan
  7. Wenping Wang
  8. Furu Wei
Citations (37)