- The paper introduces AceMath, a suite of frontier mathematical language models developed using a two-phase post-training approach and advanced reward modeling techniques.
- AceMath-72B-Instruct achieves state-of-the-art performance, significantly surpassing existing models like GPT-4o and Claude-3.5 Sonnet on math reasoning benchmarks.
- The open-sourcing of AceMath's weights and data aims to democratize access to advanced math reasoning capabilities and foster future AI research.
A Detailed Examination of "AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling"
The paper "AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling" presents the development and evaluation of a suite of frontier-class mathematical LLMs, collectively referred to as AceMath. These models are designed to excel in solving complex mathematical problems and accurately evaluating generated solutions. The work is marked by the introduction of the AceMath-72B-Instruct model, which significantly surpasses existing state-of-the-art models like Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude-3.5 Sonnet in terms of performance on mathematical reasoning benchmarks.
Methodology and Development
The authors employ a two-stage post-training approach, centered on supervised fine-tuning (SFT), to improve math reasoning. The first stage trains the models on a broad mix of general-domain tasks, including multidisciplinary topics and coding, to establish strong instruction-following ability; the second stage fine-tunes them on a curated set of math-specific prompts paired with synthetically generated responses. The result is models that follow instructions effectively while specializing in mathematical reasoning.
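To make the two-stage recipe concrete, here is a minimal sketch of a general-then-math SFT pipeline. The base model name, datasets, and hyperparameters are illustrative placeholders, not the authors' actual training setup.

```python
# Minimal sketch of a two-stage SFT pipeline (general-domain first, math second).
# Base model, datasets, and hyperparameters are hypothetical placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

def sft_stage(model, tokenizer, pairs, epochs=1, lr=1e-5):
    """Run one supervised fine-tuning stage on (prompt, response) pairs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(pairs, batch_size=2, shuffle=True)
    model.train()
    for _ in range(epochs):
        for prompts, responses in loader:
            enc = tokenizer([p + r for p, r in zip(prompts, responses)],
                            return_tensors="pt", padding=True, truncation=True)
            labels = enc["input_ids"].clone()
            # Mask prompt tokens (approximate length) and padding from the loss.
            for i, p in enumerate(prompts):
                labels[i, :len(tokenizer(p)["input_ids"])] = -100
            labels[enc["attention_mask"] == 0] = -100
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # illustrative base model
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

general_sft_pairs = [("Explain recursion briefly.", " Recursion is a function calling itself.")]  # placeholder data
math_sft_pairs = [("Solve 2x + 3 = 7.", " Subtracting 3 and dividing by 2 gives x = 2.")]          # placeholder data

# Stage 1: broad general-domain SFT (multidisciplinary topics and coding).
model = sft_stage(model, tokenizer, general_sft_pairs, epochs=1)
# Stage 2: math-focused SFT on curated prompts with synthetic responses.
model = sft_stage(model, tokenizer, math_sft_pairs, epochs=2)
```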
For reward modeling, the authors construct AceMath-RewardBench, a comprehensive benchmark for evaluating math reward models. Their reward model, AceMath-72B-RM, is built through a systematic study of data collection and synthetic data generation, and it outperforms existing reward models at judging candidate mathematical solutions.
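The sketch below illustrates one common way such an outcome reward model can be structured: a scalar score head on top of a language-model backbone, trained with a pairwise Bradley-Terry preference loss over correct versus incorrect solutions. This is an assumption for illustration; the paper's exact architecture and training objective may differ.

```python
# Sketch of an outcome reward model scoring (problem, solution) pairs,
# assuming a Bradley-Terry pairwise loss; not the paper's exact recipe.
import torch
import torch.nn as nn
from transformers import AutoModel

class OutcomeRewardModel(nn.Module):
    def __init__(self, base_name="Qwen/Qwen2.5-7B"):  # illustrative base model
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)
        self.score_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Take the hidden state of the last non-padding token as the sequence summary.
        last_idx = attention_mask.sum(dim=1) - 1
        last_hidden = hidden[torch.arange(hidden.size(0)), last_idx]
        return self.score_head(last_hidden).squeeze(-1)

def bradley_terry_loss(reward_correct, reward_incorrect):
    """Push scores of correct solutions above those of incorrect ones."""
    return -torch.nn.functional.logsigmoid(reward_correct - reward_incorrect).mean()
```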
Experimental Evaluation
Across a range of math benchmarks, the paper shows that AceMath-72B-Instruct consistently outperforms its predecessors and contemporaries. Pairing the instruct model with the reward model yields the highest average rm@8 score, i.e., accuracy when the reward model selects the best of eight sampled responses, across a broad spectrum of math reasoning tasks. The findings underscore the value of combining targeted fine-tuning with strong reward modeling to advance LLM mathematical reasoning.
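For clarity, the rm@k metric can be computed as sketched below: sample k candidate solutions per problem, let the reward model pick the highest-scoring one, and measure how often that pick is correct. The helper functions (`generate_k`, `score`, `is_correct`) are hypothetical stand-ins for the generator, reward model, and answer checker.

```python
# Sketch of rm@k evaluation (k = 8 in the paper's reported rm@8 numbers).
# generate_k, score, and is_correct are hypothetical callables.
def rm_at_k(problems, generate_k, score, is_correct, k=8):
    hits = 0
    for problem in problems:
        candidates = generate_k(problem, k)                       # k sampled solutions
        best = max(candidates, key=lambda s: score(problem, s))   # reward-model pick
        hits += is_correct(problem, best)                         # 1 if the pick is correct
    return hits / len(problems)
```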
Implications and Future Directions
Practically, this research matters for LLMs deployed in domains that demand mathematical proficiency, with potential applications in educational tools, automated theorem proving, and scientific computing. Theoretically, coupling careful SFT with robust reward modeling offers new insight into training approaches that balance generalization with specialization.
Looking forward, the open-sourcing of AceMath's model weights and training data stands to democratize access to these advanced capabilities, fostering further research and development in the AI community. Moreover, the insights gleaned from AceMath could serve as a basis for subsequent generations of general-purpose LLMs, poised to tackle increasingly complex and specialized tasks.
In summary, the AceMath paper marks a significant advance in mathematical LLMs, underscored by its integration of post-training and reward modeling. The development process and resulting models set a new standard for performance and adaptability in this specialized domain, opening new possibilities for future AI research and applications.