SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models (2404.03887v4)
Abstract: This study presents a novel learning approach designed to enhance both the mathematical reasoning and the problem-solving abilities of large language models (LLMs). We integrate Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability helps amplify problem-solving ability; initial learning with CoT is therefore essential for solving challenging mathematical problems. To this end, we propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning. Our empirical study, an extensive performance comparison across several benchmarks, demonstrates that SAAS achieves state-of-the-art (SOTA) performance. The results underscore the effectiveness of the sequential learning approach, marking a significant advancement in the field of mathematical reasoning in LLMs.
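To make the two-stage schedule concrete, here is a minimal sketch of sequential CoT-then-PoT supervised fine-tuning using Hugging Face `transformers` and `datasets`. The base model, file names (`cot_rationales.jsonl`, `pot_rationales.jsonl`), field names (`problem`, `rationale`), and all hyperparameters are illustrative assumptions, not the paper's actual setup; only the ordering (CoT learning first, then PoT learning continued from the CoT checkpoint) follows the abstract.

```python
# Illustrative sketch of the SAAS schedule: stage 1 fine-tunes on CoT
# (natural-language rationale) data; stage 2 continues from that checkpoint
# on PoT (program rationale) data. Hyperparameters and data paths are
# placeholder assumptions, not the paper's reported configuration.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

def finetune(model_name: str, data_file: str, output_dir: str) -> str:
    """One supervised fine-tuning stage on problem/rationale pairs."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:  # Llama-style tokenizers lack a pad token
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def tokenize(batch):
        # Each record pairs a math problem with its rationale:
        # natural-language steps for CoT, executable code for PoT.
        texts = [p + "\n" + r for p, r in zip(batch["problem"], batch["rationale"])]
        return tokenizer(texts, truncation=True, max_length=1024)

    data = load_dataset("json", data_files=data_file, split="train")
    data = data.map(tokenize, batched=True, remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=4, learning_rate=2e-5),
        train_dataset=data,
        # mlm=False copies input_ids into labels, i.e. standard causal-LM loss.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir

# Stage 1 (CoT learning): build mathematical reasoning ability first.
cot_ckpt = finetune("meta-llama/Llama-2-7b-hf", "cot_rationales.jsonl", "saas-cot")

# Stage 2 (PoT learning): continue from the CoT checkpoint to amplify
# problem-solving ability, per the CoT-to-PoT transition in SAAS.
finetune(cot_ckpt, "pot_rationales.jsonl", "saas-cot-pot")
```

The key design point the sketch captures is that stage 2 initializes from the stage 1 checkpoint rather than from the base model, so the reasoning ability learned from CoT data is carried into PoT learning rather than trained in parallel or mixed in a single pass.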
Authors: Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park