InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning (2402.06332v2)

Published 9 Feb 2024 in cs.CL

Abstract: The math abilities of LLMs can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs InternLM-Math, which are continually pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpretation in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next generation of math LLMs or for self-iteration. InternLM-Math obtains open-source state-of-the-art performance under the settings of in-context learning, supervised fine-tuning, and code-assisted reasoning on various informal and formal benchmarks including GSM8K, MATH, the Hungarian math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, code, and data are released at \url{https://github.com/InternLM/InternLM-Math}.

Enhancing Mathematical Reasoning in LLMs Through InternLM-Math

Introduction

InternLM-Math is a notable advancement in mathematical reasoning LLMs, building on its predecessor InternLM2. The work seeks to substantially enhance the abstract reasoning capabilities of LLMs, especially in the mathematical domain, by introducing a series of novel methodologies and integrating them into a coherent training regimen. The resulting model achieves new open-source state-of-the-art (SOTA) results across a variety of mathematical reasoning benchmarks.

Advancements in Pre-Training and Fine-Tuning

The InternLM-Math model employs a carefully designed pre-training phase, leveraging a diverse data corpus that includes common-crawl data, domain-specific data, and synthetic data aimed at reinforcing the model's numerical operation capabilities. This strategy enriches not only the model's understanding of mathematical concepts but also its ability to apply them in diverse contexts. Deduplication and exact-formulation decontamination further refine the quality of the training data, ensuring a high degree of relevance and preventing benchmark leakage into the model's learning process.
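
The paper's decontamination pipeline is not reproduced here, but exact-formulation decontamination can be sketched as a normalized substring check against held-out benchmark questions. The function names and the normalization rule below are illustrative assumptions, not the authors' implementation:

    import re

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivially reformatted copies
        # of a benchmark question still match exactly.
        return re.sub(r"\s+", " ", text.lower()).strip()

    def decontaminate(corpus: list[str], benchmark_questions: list[str]) -> list[str]:
        # Drop any training document that contains an exact normalized copy
        # of a held-out question (e.g. a GSM8K or MATH test item).
        needles = [normalize(q) for q in benchmark_questions]
        return [doc for doc in corpus
                if not any(needle in normalize(doc) for needle in needles)]

Production pipelines typically pair such exact matching with fuzzier n-gram or MinHash deduplication to also catch paraphrased copies.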

Subsequently, the supervised fine-tuning (SFT) phase focuses on a multidimensional enhancement of the model's capabilities, incorporating chain-of-thought reasoning, code interpretation, and an approach to augmenting mathematical problems. These facets collectively strengthen the model's ability not only to solve mathematical problems but also to generate new problems and verify the correctness of its solutions, supporting a self-improving mechanism within LLMs for math reasoning.
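
Casting every capability as a sequence-to-sequence pair is what lets one model act as solver, verifier, and augmenter at once. A minimal sketch of that unified format, with prompt templates that are illustrative assumptions rather than the paper's exact wording:

    from dataclasses import dataclass

    @dataclass
    class Seq2SeqSample:
        source: str  # model input
        target: str  # supervised output

    def as_solver(problem: str, solution: str) -> Seq2SeqSample:
        return Seq2SeqSample(f"Solve the problem: {problem}", solution)

    def as_verifier(problem: str, candidate: str, verdict: str) -> Seq2SeqSample:
        return Seq2SeqSample(
            f"Problem: {problem}\nCandidate solution: {candidate}\nIs it correct?",
            verdict)

    def as_augmenter(problem: str, new_problem: str) -> Seq2SeqSample:
        return Seq2SeqSample(
            f"Write a variant of this problem: {problem}", new_problem)

Because all three roles share one input/output format, a single training run can mix them freely, and the trained model can then verify or augment its own solutions.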

Unification of Reasoning and Coding Abilities

A noteworthy innovation in InternLM-Math is the integration of reasoning and coding abilities in a single seq2seq format, termed Reasoning Interleaved with Coding (RICO). This approach enables the model to interleave mathematical reasoning with coding sequences, yielding a more natural, human-like problem-solving process. The integration of formal reasoning, through the LEAN theorem prover, further distinguishes InternLM-Math by enabling it to tackle formal mathematical statements, bridging the gap between informal natural-language reasoning and formal mathematical verification.
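
The paper does not spell out RICO's exact serialization, so the sketch below assumes a simple tag-based format in which code segments embedded in the reasoning trace are executed and their output is spliced back into the context for the next reasoning step; the <code>/<output> tags are hypothetical:

    import re
    import subprocess

    CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)

    def run_rico_step(model_output: str) -> str:
        # Execute each embedded code segment and splice its stdout back into
        # the trace so the next reasoning step can condition on it.
        def execute(match: re.Match) -> str:
            result = subprocess.run(["python", "-c", match.group(1)],
                                    capture_output=True, text=True, timeout=10)
            return (f"<code>{match.group(1)}</code>\n"
                    f"<output>{result.stdout.strip()}</output>")
        return CODE_BLOCK.sub(execute, model_output)

    trace = "The sum 1 + 2 + ... + 100 is <code>print(100 * 101 // 2)</code>"
    print(run_rico_step(trace))  # ends with <output>5050</output>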

Leveraging Reward Modeling

Another significant aspect of InternLM-Math is its incorporation of reward modeling for improving the selection of reasoning paths and solutions. By employing both outcome reward models (ORM) and process reward models (PRM), InternLM-Math can more accurately identify and prioritize correct reasoning processes and solutions. This method not only enhances the model's performance on benchmark tasks but also aids in the generation of high-quality, verifiable data for self-improvement.
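
In practice, an ORM is commonly applied via best-of-n reranking: sample several candidate solutions and keep the one the reward model scores highest. The generate and orm_score callables below are placeholders for the actual models, not a released API:

    from typing import Callable

    def best_of_n(problem: str,
                  generate: Callable[[str], str],
                  orm_score: Callable[[str, str], float],
                  n: int = 8) -> str:
        # Sample n candidate solutions and return the one the outcome reward
        # model (ORM) scores highest. A process reward model (PRM) would
        # instead score each intermediate step and aggregate the step scores.
        candidates = [generate(problem) for _ in range(n)]
        return max(candidates, key=lambda sol: orm_score(problem, sol))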

Practical Implications and Future Directions

InternLM-Math’s advancements present several practical implications for the field of AI-driven mathematical reasoning. Its ability to generate new mathematical problems and verify solutions opens up avenues for automated curriculum development and evaluation in educational contexts. Additionally, the model’s integration of formal reasoning capabilities suggests potential applications in automated theorem proving and formal verification, areas of significant importance in computer science and logic.

Looking forward, InternLM-Math sets the stage for future explorations into the uncharted territories of AI capabilities in mathematics. Its innovative methodologies and impressive performance lay a foundation for further research into self-improving systems, possibly leading to LLMs that can autonomously expand their knowledge and reasoning abilities across various domains of mathematics and beyond.

Conclusion

InternLM-Math represents a significant step forward in the pursuit of advanced mathematical reasoning abilities within LLMs. Through its comprehensive approach to pre-training and fine-tuning, along with the integration of code interpretation, formal reasoning, and reward modeling, InternLM-Math offers a glimpse into the future of AI-driven mathematics education, research, and application. The remarkable performance across multiple benchmarks underlines the efficacy of these innovations, paving the way for further advancements in the domain of AI and mathematical reasoning.

Authors (22)
  1. Huaiyuan Ying (11 papers)
  2. Shuo Zhang (256 papers)
  3. Linyang Li (57 papers)
  4. Zhejian Zhou (6 papers)
  5. Yunfan Shao (19 papers)
  6. Zhaoye Fei (15 papers)
  7. Yichuan Ma (7 papers)
  8. Jiawei Hong (5 papers)
  9. Kuikun Liu (12 papers)
  10. Ziyi Wang (449 papers)
  11. Yudong Wang (28 papers)
  12. Zijian Wu (28 papers)
  13. Shuaibin Li (4 papers)
  14. Fengzhe Zhou (7 papers)
  15. Songyang Zhang (116 papers)
  16. Wenwei Zhang (77 papers)
  17. Hang Yan (86 papers)
  18. Xipeng Qiu (257 papers)
  19. Jiayu Wang (30 papers)
  20. Kai Chen (512 papers)
Citations (40)