AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (2412.15084v2)

Published 19 Dec 2024 in cs.CL, cs.AI, and cs.LG

Abstract: In this paper, we introduce AceMath, a suite of frontier math models that excel in solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct, greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude-3.5 Sonnet. To develop a math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. After that, we present a systematic approach to build our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, when combining AceMath-72B-Instruct with AceMath-72B-RM, we achieve the highest average rm@8 score across the math reasoning benchmarks. We release model weights, training data, and evaluation benchmarks at: https://research.nvidia.com/labs/adlr/acemath

Summary

  • The paper introduces AceMath, a suite of frontier mathematical language models developed using a two-phase post-training approach and advanced reward modeling techniques.
  • AceMath-72B-Instruct achieves state-of-the-art performance, significantly surpassing existing models like GPT-4o and Claude-3.5 Sonnet on math reasoning benchmarks.
  • The open-sourcing of AceMath's weights and data aims to democratize access to advanced math reasoning capabilities and foster future AI research.

A Detailed Examination of "AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling"

The paper "AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling" presents the development and evaluation of a suite of frontier-class mathematical LLMs, collectively referred to as AceMath. These models are designed to excel in solving complex mathematical problems and accurately evaluating generated solutions. The work is marked by the introduction of the AceMath-72B-Instruct model, which significantly surpasses existing state-of-the-art models like Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude-3.5 Sonnet in terms of performance on mathematical reasoning benchmarks.

Methodology and Development

The authors employed a two-phase post-training approach to improve math reasoning capabilities. In the first phase, supervised fine-tuning (SFT) on a broad array of tasks, including multidisciplinary topics and coding, established a strong baseline across general domains. In the second phase, the models were fine-tuned on a curated set of math-specific prompts paired with synthetically generated responses. This sequence ensured that the models not only followed instructions effectively but also specialized in mathematical reasoning.
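
A minimal sketch of what this two-stage SFT schedule could look like, written in plain PyTorch on a toy model. The model, datasets, and hyperparameters are illustrative placeholders, not the paper's actual configuration; only the stage ordering (general-domain SFT first, targeted math SFT second) reflects the described method.

```python
# Two-stage SFT sketch: general-domain pass, then targeted math pass.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

VOCAB, DIM = 1000, 64

# Toy stand-in for a language model: embed tokens, project back to the vocabulary.
model = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))

def sft_stage(model, dataset, epochs, lr):
    """One SFT pass: next-token cross-entropy on (input tokens, target tokens)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for tokens, targets in loader:
            logits = model(tokens)                          # (batch, seq, vocab)
            loss = loss_fn(logits.flatten(0, 1), targets.flatten())
            opt.zero_grad(); loss.backward(); opt.step()

# Random tensors standing in for the two curricula: broad general-domain SFT
# data first, then the curated math prompts with synthetic responses.
general_data = TensorDataset(torch.randint(0, VOCAB, (64, 16)),
                             torch.randint(0, VOCAB, (64, 16)))
math_data    = TensorDataset(torch.randint(0, VOCAB, (64, 16)),
                             torch.randint(0, VOCAB, (64, 16)))

sft_stage(model, general_data, epochs=2, lr=1e-4)  # stage 1: general domains
sft_stage(model, math_data,    epochs=2, lr=5e-5)  # stage 2: targeted math SFT
```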

For reward modeling, the authors constructed AceMath-RewardBench, a comprehensive benchmark for evaluating math reward models across diverse problems and difficulty levels. They then developed AceMath-72B-RM through a systematic approach to data collection and synthetic data generation; the resulting model outperforms existing reward models, indicating superior capability at assessing mathematical solutions.
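
As a hedged illustration, the sketch below shows one common recipe for an outcome reward model: a scalar head scores each (problem, solution) sequence, trained with a Bradley-Terry pairwise loss on (correct, incorrect) solution pairs. The architecture, pooling, and loss here are assumptions for exposition, not necessarily the paper's exact training objective.

```python
# Pairwise reward-model training sketch (Bradley-Terry style, toy scale).
import torch
from torch import nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, 1)        # scalar reward head

    def forward(self, tokens):
        h = self.embed(tokens).mean(dim=1)   # toy pooling over the sequence
        return self.head(h).squeeze(-1)      # one score per sequence

rm = RewardModel()
opt = torch.optim.AdamW(rm.parameters(), lr=1e-4)

# Each batch pairs a correct solution with an incorrect one for the same
# problem; random tokens stand in for tokenized (problem, solution) text.
chosen   = torch.randint(0, 1000, (8, 32))   # correct solutions
rejected = torch.randint(0, 1000, (8, 32))   # incorrect solutions

# Bradley-Terry objective: push the correct solution's score above the other's.
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
opt.zero_grad(); loss.backward(); opt.step()
```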

Experimental Evaluation

Evaluated across a range of benchmarks, AceMath-72B-Instruct consistently outperforms both its predecessors and contemporary models. Pairing the instruct model with the reward model yields the highest average rm@8 score across a broad spectrum of math reasoning tasks, where rm@8 denotes sampling eight candidate solutions per problem and keeping the one the reward model ranks highest. Notably, the findings emphasize the efficacy of combining highly targeted fine-tuning with strong reward modeling to advance LLM mathematical reasoning.
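
The following sketch makes the rm@k metric concrete: for each problem, sample k candidates, let the reward model pick the highest-scoring one, and count the problem as solved if that pick is correct. The function names and callables are hypothetical scaffolding, not an API from the paper's release.

```python
# Hypothetical rm@k evaluation loop (rm@8 is the k=8 case reported above).
from typing import Callable, Sequence

def rm_at_k(problems: Sequence[str],
            sample_k: Callable[[str, int], list[str]],  # policy: k candidate solutions
            score: Callable[[str, str], float],         # reward model score
            is_correct: Callable[[str, str], bool],     # ground-truth answer checker
            k: int = 8) -> float:
    solved = 0
    for problem in problems:
        candidates = sample_k(problem, k)                        # draw k samples
        best = max(candidates, key=lambda s: score(problem, s))  # RM picks one
        solved += is_correct(problem, best)                      # check that pick
    return solved / len(problems)                                # fraction solved
```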

Implications and Future Directions

The practical implications of this research are significant for developing LLMs in domains that demand mathematical proficiency, with potential applications in educational tools, automated theorem proving, and scientific computing. Theoretically, coupling careful SFT with robust reward modeling offers new insight into training approaches that balance generalization with specialization.

Looking forward, the open-sourcing of AceMath's model weights and training data stands to democratize access to these advanced capabilities, fostering further research and development in the AI community. Moreover, the insights gleaned from AceMath could serve as a basis for subsequent generations of general-purpose LLMs, poised to tackle increasingly complex and specialized tasks.

In summary, the AceMath paper represents a significant advance in mathematical LLMs, underscored by its integration of post-training and reward modeling. The development process and resulting models set a new standard for performance and adaptability in this specialized domain, opening new possibilities for future AI research and application.
