Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations (2310.20246v5)

Published 31 Oct 2023 in cs.CL and cs.AI

Abstract: Existing research predominantly focuses on developing powerful large language models (LLMs) for mathematical reasoning in monolingual settings, with few explorations of how to preserve efficacy in a multilingual context. To bridge this gap, this paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs. First, by utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks. Based on the collected dataset, we propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs and exhibit superiority over ChatGPT in few-shot scenarios. Notably, MathOctopus-13B reaches 47.6% accuracy, exceeding ChatGPT's 46.3% on the MGSM test set. Beyond these results, we unearth several pivotal observations and insights from extensive experiments: (1) Extending the rejection sampling strategy to the multilingual context improves model performance, albeit to a limited extent. (2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across multiple languages not only significantly enhances multilingual performance but also elevates monolingual performance. This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially for mathematical reasoning tasks. For instance, MathOctopus-7B improves on its counterpart trained only on English, raising accuracy on the GSM8K test set from 42.2% to 50.8%. Code is available at https://github.com/microsoft/MathOctopus.
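
The sketch below illustrates observation (1), extending rejection sampling to a multilingual setting: for each question in each language, several candidate chain-of-thought solutions are sampled, only those whose final answer matches the reference are kept, and the survivors are pooled as additional SFT data. This is a minimal sketch, not the authors' released code; the `sample_solutions` callable and the toy data format are hypothetical stand-ins for an actual LLM sampler and the MGSM8KInstruct format.

```python
# Minimal sketch of multilingual rejection sampling (assumptions noted above).
import random
import re
from typing import Callable, Optional


def final_number(text: str) -> Optional[str]:
    """Return the last number appearing in a solution string, if any."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None


def multilingual_rejection_sampling(
    dataset: dict,                                            # language -> [{"question", "answer"}]
    sample_solutions: Callable[[str, str, int], list],        # (question, lang, k) -> k candidates
    k: int = 4,
) -> list:
    """Keep only sampled solutions whose final answer matches the gold answer."""
    sft_pool = []
    for lang, examples in dataset.items():
        for ex in examples:
            for cand in sample_solutions(ex["question"], lang, k):
                if final_number(cand) == final_number(ex["answer"]):
                    sft_pool.append(
                        {"lang": lang, "question": ex["question"], "solution": cand}
                    )
    return sft_pool


if __name__ == "__main__":
    # Toy stand-in for an LLM sampler: returns noisy candidates, some correct.
    def toy_sampler(question: str, lang: str, k: int) -> list:
        return [f"Reasoning... the answer is {random.choice([7, 8])}" for _ in range(k)]

    toy_data = {
        "en": [{"question": "3 + 4 = ?", "answer": "7"}],
        "es": [{"question": "Cuanto es 3 + 4?", "answer": "7"}],
    }
    print(len(multilingual_rejection_sampling(toy_data, toy_sampler)))
```

The filtered pool would then be mixed with the original instruction data for another round of SFT; the paper reports that this helps, though the gains are limited compared with training on parallel multilingual corpora.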

Authors (6)
  1. Nuo Chen
  2. Zinan Zheng
  3. Ning Wu
  4. Ming Gong
  5. Dongmei Zhang
  6. Jia Li