
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights (2405.01345v3)

Published 2 May 2024 in cs.CL

Abstract: Bridging the significant gap between LLMs' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment framework leverages the model's English expertise to improve multilingual performance with minimal use of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects in reasoning with and without chain-of-thought, as well as with program-of-thought. We also explore applying this framework to extremely large LLMs in an efficient manner, such as through proxy-tuning. Experimental results on the multilingual reasoning benchmarks mGSM, mSVAMP, xCSQA and xNLI demonstrate that we can extend the question alignment framework to boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, it brings an average accuracy improvement of 12.2% on mGSM even with the 70B model. To understand the mechanism of its success, we analyze the representation space, generated responses and data scales, and reveal how question translation training strengthens language alignment within LLMs and shapes their working patterns.
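The abstract mentions proxy-tuning as the efficient way to carry question alignment over to extremely large models without fine-tuning them directly. As a rough illustration only (not the authors' exact implementation), proxy-tuning steers a large untuned base model at decoding time by adding the logit offset between a small tuned "expert" and its untuned counterpart. The sketch below assumes the three logit tensors share one vocabulary and are already computed; the function name and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def proxy_tuned_next_token_logits(
    base_large_logits: torch.Tensor,    # large, untuned base model
    expert_small_logits: torch.Tensor,  # small model after alignment/response tuning
    anti_small_logits: torch.Tensor,    # the same small model before tuning
) -> torch.Tensor:
    """Shift the large model's distribution by the small model's tuning offset.

    All tensors have shape (batch, vocab_size) over a shared vocabulary.
    """
    return base_large_logits + (expert_small_logits - anti_small_logits)

# Toy usage with random tensors standing in for real model outputs.
vocab_size = 32_000
base = torch.randn(1, vocab_size)
expert = torch.randn(1, vocab_size)
anti = torch.randn(1, vocab_size)

adjusted = proxy_tuned_next_token_logits(base, expert, anti)
next_token = torch.argmax(F.softmax(adjusted, dim=-1), dim=-1)
print(next_token.shape)  # torch.Size([1])
```

In this setup, only the small model needs question-alignment training; the 70B-scale model is left untouched and only queried for logits, which is what makes the approach tractable at that size.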

Authors (6)
  1. Wenhao Zhu (32 papers)
  2. Shujian Huang (106 papers)
  3. Fei Yuan (28 papers)
  4. Cheng Chen (262 papers)
  5. Jiajun Chen (125 papers)
  6. Alexandra Birch (67 papers)