
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning (2403.00799v1)

Published 23 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs are displaying emergent abilities on math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. First, we determine the ability boundary of reasoning-path augmentation by identifying these paths' minimal optimal set. Second, we validate that different abilities of the model can be cumulatively enhanced by a Mix of Minimal Optimal Sets (MMOS) of the corresponding types of data, and our MMOS models achieve SOTA performance across a series of base models at much lower construction cost. Besides, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. We also provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.
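To make the Auto Problem Generator idea concrete, here is a minimal sketch of numerical-robustness testing: perturb the operands of a templated word problem and recompute the gold answer programmatically. The template, variable names, and sampling ranges are illustrative assumptions, not the authors' actual implementation (see the MMOS repository for that).

```python
import random

# Hypothetical sketch of an "auto problem generator" for numerical-
# robustness testing. The template and ranges are assumptions; the real
# generator lives at https://github.com/cyzhh/MMOS.

TEMPLATE = (
    "A store sells pencils for ${price} each. "
    "If Sam buys {count} pencils, how much does Sam spend in total?"
)

def generate_problem(seed=None):
    """Return a (question, gold_answer) pair with freshly sampled numbers."""
    rng = random.Random(seed)
    price = rng.randint(1, 20)      # perturb the operands ...
    count = rng.randint(2, 50)
    question = TEMPLATE.format(price=price, count=count)
    answer = price * count          # ... and recompute the gold answer
    return question, answer

if __name__ == "__main__":
    for i in range(3):
        q, a = generate_problem(seed=i)
        print(q, "->", a)
```

Scoring a model on many such perturbed variants of the same problem separates numerical robustness from memorization of specific operand values, which is the sense in which the paper argues GSM-HARD-style perturbations are no longer hard for today's LLMs.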

Authors (6)
  1. Zui Chen (14 papers)
  2. Yezeng Chen (5 papers)
  3. Jiaqi Han (24 papers)
  4. Zhijie Huang (19 papers)
  5. Ji Qi (61 papers)
  6. Yi Zhou (438 papers)
Citations (3)
