
Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning? (2403.16777v1)

Published 25 Mar 2024 in cs.CL

Abstract: Multilingual pretraining and fine-tuning have remarkably succeeded in various natural language processing tasks. Transferring representations from one language to another is especially crucial for cross-lingual learning. One can expect machine translation objectives to be well suited to fostering such capabilities, as they involve the explicit alignment of semantically equivalent sentences from different languages. This paper investigates the potential benefits of employing machine translation as a continued training objective to enhance language representation learning, bridging multilingual pretraining and cross-lingual applications. We study this question through two lenses: a quantitative evaluation of the performance of existing models and an analysis of their latent representations. Our results show that, contrary to expectations, machine translation as a continued training objective fails to enhance cross-lingual representation learning across multiple cross-lingual natural language understanding tasks. We conclude that explicit sentence-level alignment in the cross-lingual scenario is detrimental to cross-lingual transfer pretraining, which has important implications for future cross-lingual transfer studies. We furthermore provide evidence, through similarity measures and an investigation of parameters, that this lack of positive influence is due to output separability -- which we argue is of use for machine translation but detrimental elsewhere.
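The representation analysis described in the abstract relies on similarity measures between latent representations. A common choice for this kind of comparison is linear centered kernel alignment (CKA; Kornblith et al., 2019). The sketch below is a minimal, generic Python illustration of comparing sentence representations from a pretrained model and its MT-continued counterpart on the same sentences; it is not the authors' exact procedure, and the matrix shapes and random inputs are placeholders.

import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    X: shape (n_sentences, d1), e.g. mean-pooled encoder states from the
       pretrained model for a fixed set of sentences.
    Y: shape (n_sentences, d2), the same sentences encoded by the model
       after MT continued training. Rows must be aligned sentence-wise.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F), per Kornblith et al. (2019).
    numerator = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    denominator = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return float(numerator / denominator)

# Toy usage with random stand-ins for real encoder states.
rng = np.random.default_rng(0)
reps_pretrained = rng.normal(size=(128, 768))
reps_mt_continued = rng.normal(size=(128, 768))
print(f"linear CKA: {linear_cka(reps_pretrained, reps_mt_continued):.3f}")

A score near 1 indicates highly similar representation geometries; comparing such scores across languages and layers is one way to probe whether MT continued training changes cross-lingual alignment.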

Authors (4)
  1. Shaoxiong Ji (39 papers)
  2. Timothee Mickus (20 papers)
  3. Vincent Segonne (6 papers)
  4. Jörg Tiedemann (41 papers)
Citations (2)