Towards Boosting Many-to-Many Multilingual Machine Translation with Large Language Models (2401.05861v2)

Published 11 Jan 2024 in cs.CL

Abstract: The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on multilingual LLMs with high-quality translation pairs. In this paper, we focus on boosting many-to-many multilingual translation of LLMs with an emphasis on zero-shot translation directions. We demonstrate that prompt strategies adopted during finetuning are crucial to zero-shot translation and introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages and improve zero-shot translation performance. XConST is not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for translation instruction finetuning with LLMs. Experimental results on ALMA (Xu et al., 2023), Tower (Team, 2024), and LLaMA-2 (Touvron et al., 2023) show that our approach consistently improves translation performance. Our implementations are available at https://github.com/gpengzhi/CrossConST-LLM.
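
The paper's exact formulation is not reproduced on this page, but a CrossConST-style objective typically combines the standard translation instruction loss with a KL consistency regularizer. As a rough sketch (with x the source sentence, y the reference translation, and α an assumed weighting hyperparameter; the exact prompt format and KL direction follow the paper's implementation):

$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{CE}}(\theta; x, y) + \alpha \, \mathrm{KL}\big(p(y \mid x; \theta) \,\|\, p(y \mid y; \theta)\big)$

The KL term pulls the model's target-token distribution when prompted with the source sentence toward its distribution when prompted with the reference translation itself, which is what bridges representations across languages and benefits zero-shot translation directions.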

References (40)
  1. In-context examples selection for machine translation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8857–8873, Toronto, Canada. Association for Computational Linguistics.
  2. PaLM 2 technical report.
  3. Findings of the 2020 conference on machine translation (WMT20). In Proceedings of the Fifth Conference on Machine Translation, pages 1–55, Online. Association for Computational Linguistics.
  4. Findings of the 2019 conference on machine translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 1–61, Florence, Italy. Association for Computational Linguistics.
  5. Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 1–44, Sofia, Bulgaria. Association for Computational Linguistics.
  6. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Second Conference on Machine Translation, pages 169–214, Copenhagen, Denmark. Association for Computational Linguistics.
  7. Findings of the 2018 conference on machine translation (WMT18). In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 272–303, Belgium, Brussels. Association for Computational Linguistics.
  8. Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 17–53, Uppsala, Sweden. Association for Computational Linguistics.
  9. Findings of the 2012 workshop on statistical machine translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 10–51, Montréal, Canada. Association for Computational Linguistics.
  10. Findings of the 2011 workshop on statistical machine translation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 22–64, Edinburgh, Scotland. Association for Computational Linguistics.
  11. PaLM: Scaling language modeling with pathways.
  12. Results of WMT22 metrics shared task: Stop using BLEU – neural metrics are better and more robust. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 46–68, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
  13. Improving zero-shot multilingual neural machine translation by leveraging cross-lingual consistency regularization. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12103–12119, Toronto, Canada. Association for Computational Linguistics.
  14. Learning multilingual sentence representations with cross-lingual consistency regularization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 243–262, Singapore. Association for Computational Linguistics.
  15. How good are GPT models at machine translation? A comprehensive evaluation.
  16. Geoffrey E Hinton and Sam Roweis. 2002. Stochastic neighbor embedding. In Advances in Neural Information Processing Systems, volume 15. MIT Press.
  17. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations.
  18. Is ChatGPT a good translator? Yes with GPT-4 as the engine.
  19. FastText.zip: Compressing text classification models.
  20. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431, Valencia, Spain. Association for Computational Linguistics.
  21. Findings of the 2022 conference on machine translation (WMT22). In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1–45, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
  22. Eliciting the translation ability of large language models via multilingual finetuning with translation instructions.
  23. Few-shot learning with multilingual generative language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9019–9052, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  24. OpenAI. 2023a. ChatGPT [large language model]. https://chat.openai.com/.
  25. OpenAI. 2023b. GPT-4 technical report.
  26. Matt Post. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Brussels, Belgium. Association for Computational Linguistics.
  27. COMET: A neural framework for MT evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2685–2702, Online. Association for Computational Linguistics.
  28. Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  29. No language left behind: Scaling human-centered machine translation.
  30. Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. European Language Resources Association (ELRA).
  31. Llama 2: Open foundation and fine-tuned chat models.
  32. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
  33. Prompting PaLM for translation: Assessing strategies and performance. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15406–15427, Toronto, Canada. Association for Computational Linguistics.
  34. Language tags matter for zero-shot neural machine translation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3001–3007, Online. Association for Computational Linguistics.
  35. A paradigm shift in machine translation: Boosting translation performance of large language models.
  36. BigTranslate: Augmenting large language models with multilingual translation capability over 100 languages.
  37. TIM: Teaching large language models to translate with comparison.
  38. Prompting large language model for machine translation: A case study. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org.
  39. BayLing: Bridging cross-lingual alignment and instruction following through interactive translation for large language models.
  40. Multilingual machine translation with large language models: Empirical results and analysis.
Authors (4)
  1. Pengzhi Gao (14 papers)
  2. Zhongjun He (19 papers)
  3. Hua Wu (191 papers)
  4. Haifeng Wang (194 papers)
Citations (1)