Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation

Published 1 Feb 2024 in cs.CL, cs.AI, and cs.LG | (2402.01772v1)

Abstract: Multilingual Machine Translation (MMT) benefits from knowledge transfer across different language pairs. However, improvements in one-to-many translation compared to many-to-one translation are only marginal and sometimes even negligible. This performance discrepancy raises the question of to what extent positive transfer plays a role on the target-side for one-to-many MT. In this paper, we conduct a large-scale study that varies the auxiliary target side languages along two dimensions, i.e., linguistic similarity and corpus size, to show the dynamic impact of knowledge transfer on the main language pairs. We show that linguistically similar auxiliary target languages exhibit strong ability to transfer positive knowledge. With an increasing size of similar target languages, the positive transfer is further enhanced to benefit the main language pairs. Meanwhile, we find distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability. Apart from transfer, we show distant auxiliary target languages can act as a regularizer to benefit translation performance by enhancing the generalization and model inference calibration.


Summary

  • The paper establishes that target-side transfer boosts translation quality when auxiliary languages share strong linguistic similarities.
  • It shows that incorporating distant auxiliary target languages serves as an effective regularizer, reducing overfitting and improving model generalization.
  • Extensive experiments across low- and medium-resource settings highlight the nuanced trade-offs between knowledge transfer and regularization in multilingual machine translation.

Introduction

In Multilingual Machine Translation (MMT), it remains unclear whether knowledge transfer or regularization contributes more to translation quality, particularly when translating from one source language into many target languages (one-to-many MT). Target-side transfer is commonly assumed to be minimal or even absent. This paper tests that assumption by separating the effects of target-side transfer from those of regularization in one-to-many MMT.

Knowledge Transfer

The authors run controlled experiments that vary the auxiliary target languages along two dimensions, linguistic similarity and corpus size, and measure the effect on the main language pairs. Linguistic similarity correlates positively with translation improvements: linguistically similar auxiliary target languages yield more positive knowledge transfer than distant ones. Increasing the amount of data from similar auxiliary target languages further benefits the main language pairs. These results run counter to the earlier view that target-side transfer is negligible.
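As a rough illustration of how the two experimental knobs could be set up, the sketch below mixes a main language pair with a subsample of an auxiliary pair and prepends a target-language tag to each source sentence. Tagging the source this way is a common convention for one-to-many models, but this summary does not confirm that the paper uses it. The function name `build_one_to_many_corpus`, the tag strings, and the `aux_size` parameter are all hypothetical.

```python
import random


def build_one_to_many_corpus(main_pairs, aux_pairs, main_tag, aux_tag,
                             aux_size, seed=0):
    """Mix a main language pair with a subsample of an auxiliary pair.

    main_pairs / aux_pairs: lists of (source_sentence, target_sentence).
    main_tag / aux_tag: hypothetical target-language tokens, e.g. "<2de>".
    aux_size: number of auxiliary examples to include (the corpus-size knob).
    """
    rng = random.Random(seed)
    # Subsample auxiliary data without replacement, capped at what is available.
    sampled_aux = rng.sample(aux_pairs, min(aux_size, len(aux_pairs)))
    # Prepend the target-language tag so one model knows which language to emit.
    corpus = [(f"{main_tag} {src}", tgt) for src, tgt in main_pairs]
    corpus += [(f"{aux_tag} {src}", tgt) for src, tgt in sampled_aux]
    rng.shuffle(corpus)
    return corpus
```

Swapping `aux_pairs` between a similar and a distant language varies linguistic similarity, while changing `aux_size` varies the auxiliary corpus size; the main pair stays fixed so that its test score can be compared across runs.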

Regularization

The paper also finds that distant auxiliary target languages, despite their minimal capacity for positive transfer, can improve translation performance on the main language pairs. The authors attribute this unexpected gain to regularization: the distant languages improve generalization and the model's inference calibration. By diversifying the training data, they reduce overfitting and bring the model's prediction confidence closer to its actual accuracy. This challenges the view that auxiliary languages help only through transfer.
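To make "inference calibration" concrete, the sketch below computes expected calibration error (ECE), a widely used measure of the gap between prediction confidence and accuracy. This summary does not state which calibration metric the paper reports, so ECE is an assumption here, and the function name and the choice of ten equal-width bins are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error over token-level predictions.

    confidences: predicted probabilities in [0, 1], one per predicted token.
    correct: booleans, True where the predicted token matches the reference.
    n_bins: number of equal-width confidence bins (an illustrative default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))

    # Sum the |confidence - accuracy| gap per bin, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A lower value means confidence tracks accuracy more closely. Under this assumption, a model trained with distant auxiliary languages would be expected to score lower than one trained on the main pair alone.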

Experimental Analysis

The paper details an experimental setup that covers both real-world and simulated language-pair scenarios, along with background on transfer learning and regularization in MMT. The experiments show that additional target languages affect the main translation tasks in several distinct ways. The study covers both low- and medium-resource settings, which broadens the applicability of the results.

Conclusion and Future Directions

In conclusion, the paper gives a detailed account of target-side transfer and regularization in one-to-many MMT and avoids attributing gains simply to source data augmentation. Its findings on auxiliary languages can inform the design of MMT systems, for example by combining linguistically similar languages for transfer with unrelated languages for regularization. The limitations the authors acknowledge point to future work on the trade-offs between different language pairs and on settings beyond the one-to-many framework.

This analysis can guide both current practice and future research on MMT systems. It encourages work that considers knowledge transfer and regularization together when seeking to improve MMT performance.
