Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
Abstract: We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups.
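The auxiliary similarity loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the mean pooling over encoder states, the cosine distance, the `weight` value, and all function names are hypothetical and are not the authors' formulation. The sketch only shows the general idea of pulling together the representations of the same sentence in two source languages, which is what multi-parallel data makes possible.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]

def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def auxiliary_similarity_loss(reps_a, reps_b):
    """Hypothetical auxiliary loss: 1 - cosine similarity between the pooled
    encoder states of one sentence written in two different source languages."""
    return 1.0 - cosine_similarity(mean_pool(reps_a), mean_pool(reps_b))

def total_loss(translation_loss, reps_a, reps_b, weight=0.1):
    """Translation loss plus the weighted auxiliary term (weight is an assumed value)."""
    return translation_loss + weight * auxiliary_similarity_loss(reps_a, reps_b)
```

In this sketch the auxiliary term is zero when the two pooled representations point in the same direction and grows as they diverge, so minimizing it alongside the translation loss would push the encoder toward more language-invariant representations. Zero-norm vectors are not handled.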