Automatic Process Annotation for Enhancing Mathematical Reasoning in LLMs
Introduction
Accurately solving complex, multi-step mathematical problems remains a substantial challenge for current LLMs. Despite their impressive capabilities across many tasks, the sequential, error-compounding nature of mathematical reasoning poses a distinctive difficulty. Prior work has attacked the problem through pre-training, fine-tuning, and verification, and verification has recently moved to the forefront. In particular, Process Reward Models (PRMs) have emerged as a promising direction because they assess a reasoning path step by step, much as a human would check a solution. Their main bottleneck, however, has been the absence of an automated way to produce step-level annotations. This paper introduces MATH-SHEPHERD, a framework that generates process annotations automatically, removing the dependency on manual labeling and thereby strengthening LLMs' mathematical reasoning.
Existing Limitations
Reliance on manual annotation for training PRMs is prohibitively expensive and hard to scale, limiting both the practical applicability and the pace of development of PRMs for mathematical reasoning. Existing verification models largely fall into two categories: Outcome Reward Models (ORMs), which judge only the final answer of a complete solution, and PRMs, which judge each intermediate step. PRMs, despite their greater potential, have been held back by the cost and complexity of obtaining step-wise human annotations, especially for intricate multi-step reasoning that demands advanced skills from annotators. The sketch below makes the distinction concrete.
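Here is a minimal sketch of the two model types. The interfaces (`OutcomeRewardModel`, `ProcessRewardModel`) and the minimum-score aggregation are illustrative assumptions, not any particular library's API:

```python
from typing import List, Protocol

class OutcomeRewardModel(Protocol):
    def score(self, question: str, solution: str) -> float:
        """One scalar for the whole solution, judged by its final outcome."""
        ...

class ProcessRewardModel(Protocol):
    def score_steps(self, question: str, steps: List[str]) -> List[float]:
        """One score per intermediate reasoning step."""
        ...

def solution_score(prm: ProcessRewardModel, question: str, steps: List[str]) -> float:
    # One common aggregation: a chain of reasoning is only as strong as
    # its weakest link, so summarize a solution by its minimum step score.
    return min(prm.score_steps(question, steps))
```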
MATH-SHEPHERD Framework
MATH-SHEPHERD automates the annotation process, making the training of PRMs far more scalable and efficient. Inspired by Monte Carlo Tree Search, it scores each intermediate reasoning step by its potential to lead to the correct final answer: a fine-tuned LLM (the "completer") generates multiple subsequent reasoning paths from a given step, and each completion is checked against the gold answer. Steps whose completions more often reach the correct answer receive higher correctness scores (a code sketch of this procedure follows the list below). The framework's key contributions are:
- A method for automatically generating process-supervision datasets for mathematical reasoning tasks, with no human annotation required.
- Demonstrated superior performance on the GSM8K and MATH benchmarks across a series of open-source LLMs ranging from 7B to 70B parameters.
- An empirical analysis identifying the factors that matter most when training an effective verifier, offering insight into future directions for improving LLM reasoning through intermediate supervision.
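The sketch below illustrates the completion-based step scoring described above. It is a minimal illustration under stated assumptions, not the authors' implementation: `complete` stands in for the fine-tuned completer model, `extract_final_answer` is a naive placeholder parser, and scoring each step by the fraction of successful continuations corresponds to the soft-estimation variant (a hard label can be derived as `score > 0`).

```python
import re
from typing import Callable, List

def extract_final_answer(completion: str) -> str:
    """Naive answer parser: take the last number in the completion.
    (Placeholder; a real parser depends on the prompt format.)"""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def annotate_steps(
    question: str,
    steps: List[str],
    gold_answer: str,
    complete: Callable[[str, int], List[str]],  # (prompt, n) -> n continuations
    n_samples: int = 8,
) -> List[float]:
    """Score each reasoning step by its potential to reach the gold answer.

    For every solution prefix, sample `n_samples` continuations with the
    completer and count how many reach the gold final answer. The fraction
    is the step's soft correctness label.
    """
    scores = []
    for i in range(1, len(steps) + 1):
        prompt = question + "\n" + "\n".join(steps[:i])
        continuations = complete(prompt, n_samples)
        n_correct = sum(
            extract_final_answer(c) == gold_answer for c in continuations
        )
        scores.append(n_correct / n_samples)  # soft estimation
    return scores
```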
Dataset and Methodology
The framework was evaluated on two benchmark datasets, GSM8K and MATH. Using the automatically constructed step-wise supervision data, MATH-SHEPHERD trained PRMs for models across a range of sizes (7B to 70B). Notably, DeepSeek 67B paired with a MATH-SHEPHERD verifier reached 93.3% accuracy on GSM8K and 48.1% on MATH, without external tools, by reranking sampled candidate solutions with the trained PRM.
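At inference time, such a verifier is typically applied via best-of-N reranking. The sketch below assumes that standard setup (sampling N candidate solutions and keeping the one the PRM scores highest); it is not a description of the authors' exact decoding pipeline:

```python
from typing import Callable, List

def best_of_n(
    question: str,
    sample_solution: Callable[[str], List[str]],   # returns one solution's steps
    prm_score: Callable[[str, List[str]], float],  # aggregated PRM score
    n: int = 64,
) -> List[str]:
    """Sample n candidate solutions and return the one the PRM ranks highest."""
    candidates = [sample_solution(question) for _ in range(n)]
    return max(candidates, key=lambda steps: prm_score(question, steps))
```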
Implications and Future Directions
MATH-SHEPHERD is a significant step toward removing the bottleneck that manual process annotation imposes on mathematical reasoning for LLMs. The framework demonstrates that automatic process supervision is a scalable and efficient alternative, and it opens the door to combining LLMs with strong verification models such as PRMs. The marked performance gains on benchmark datasets underscore the potential of automated process annotation for improving LLM reasoning. Looking ahead, integrating such a framework into reinforcement learning to raise top-1 accuracy, and pursuing a general-purpose PRM for mathematics, are promising directions for future work.
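One plausible instantiation of the reinforcement-learning direction is sketched below, under the assumption that each reasoning step is treated as an action whose immediate reward is its PRM score. This step-level shaping is an illustrative choice, not a confirmed recipe from this summary:

```python
from typing import List

def step_level_returns(prm_scores: List[float], gamma: float = 1.0) -> List[float]:
    """Dense returns for step-level RL (a sketch, not an exact PPO setup).

    Each reasoning step is one action whose immediate reward is its PRM
    score; the return at step t is the discounted sum of rewards from t
    onward, giving the policy feedback after every step rather than only
    at the final answer.
    """
    returns = [0.0] * len(prm_scores)
    future = 0.0
    for t in reversed(range(len(prm_scores))):
        future = prm_scores[t] + gamma * future
        returns[t] = future
    return returns
```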