Overview of the Chain of Self-Correction Mechanism in LLMs
The paper, "Embedding Self-Correction as an Inherent Ability in LLMs for Enhanced Mathematical Reasoning," introduces the Chain of Self-Correction (CoSC) mechanism to significantly enhance the mathematical reasoning capabilities of LLMs. The researchers focus on addressing the inherent weaknesses of LLMs in executing accurate mathematical reasoning by integrating a self-correction mechanism.
The primary challenge LLMs face in mathematical problem-solving is that a single flawed step can derail an entire reasoning chain, and a one-pass generation offers no opportunity to recover. The CoSC mechanism mitigates this by embedding self-correction into the reasoning process itself. It operates in iterative stages: in each stage, the model generates a solving program, executes it, verifies the output, and then concludes whether to refine the solution in another round or to finalize the answer. This iterative refinement of reasoning steps improves overall accuracy.
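To make the loop concrete, here is a minimal sketch of one plausible reading of it. The `model.generate_program` and `model.verify` calls and the `Verdict` type are illustrative assumptions, not the paper's actual interface:

```python
import subprocess
import sys
from dataclasses import dataclass

# Hypothetical interface for the CoSC loop; all names here are
# illustrative assumptions, not the paper's released API.

@dataclass
class Verdict:
    is_correct: bool
    feedback: str

def run_program(code: str, timeout: int = 10) -> str:
    """Execute a generated Python program in a subprocess and capture its output."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return (result.stdout or result.stderr).strip()

def cosc_solve(model, question: str, max_rounds: int = 3) -> str:
    """One plausible reading of the CoSC loop: generate, execute, verify, conclude."""
    context = question
    answer = ""
    for _ in range(max_rounds):
        program = model.generate_program(context)          # generate a solving program
        answer = run_program(program)                      # execute it
        verdict = model.verify(question, program, answer)  # verify the output
        if verdict.is_correct:                             # conclude: finalize the answer...
            return answer
        # ...or fold the failed attempt into the context for the next round
        context += f"\nAttempt:\n{program}\nOutput: {answer}\nIssue: {verdict.feedback}"
    return answer  # fall back to the last attempt if no round verifies
```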
Methodology
CoSC is instilled through a two-phase finetuning process. The foundational learning phase uses a limited volume of seeding data distilled from GPT-4 to establish an initial self-correction capability. The subsequent self-enhancement phase then trains on the model's own outputs, further strengthening CoSC without incurring additional GPT-4 data-generation costs. This design keeps the cost of training LLMs with intrinsic self-correction capabilities low.
In the foundational phase, the model is trained on datasets such as MATH and GSM8K using CoSC-style annotations elicited from GPT-4. Each annotated trajectory follows the full loop of generating a solution program, executing it, verifying the result, and drawing a conclusion that either triggers another iteration or finalizes the answer. Retaining the trajectories that end in a correct answer yields a substantial collection of correctly solved problems, forming a robust seeding dataset.
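To illustrate what such a trajectory might look like, here is a hypothetical training record; the schema and field names are invented for exposition and need not match the paper's released data:

```python
# Invented schema for one CoSC training trajectory; the field names are
# assumptions for illustration, not the paper's released format.
trajectory = {
    "question": "What is the sum of the first 100 positive integers?",
    "rounds": [
        {
            "program": "print(sum(range(100)))",     # first attempt (off by one)
            "output": "4950",
            "verification": "range(100) stops at 99; the sum should include 100.",
            "conclusion": "incorrect, revise",
        },
        {
            "program": "print(sum(range(1, 101)))",  # corrected attempt
            "output": "5050",
            "verification": "Matches the closed form n(n+1)/2 = 100*101/2.",
            "conclusion": "correct, finalize",
        },
    ],
    "final_answer": "5050",
}
```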
The self-enhancement phase capitalizes on the foundational model's newly acquired ability to self-correct by generating extensive self-labeled data. Through dense solution sampling (drawing many trajectories per question) and question sampling, this phase expands both the variety and the volume of training data without further external supervision.
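A rough sketch of how this harvesting might work for the solution-sampling side, assuming a `model.sample_trajectory` helper and gold answers for filtering (both are illustrative assumptions, and question sampling could be layered on the same loop):

```python
import random

# Hypothetical sketch of self-enhancement data collection; the
# `model.sample_trajectory` call and the correctness filter are assumptions.

def build_self_labeled_set(model, problems, samples_per_question=8, temperature=0.8):
    """Densely sample trajectories per question and keep verified-correct ones."""
    dataset = []
    for question, gold_answer in problems:
        for _ in range(samples_per_question):
            traj = model.sample_trajectory(question, temperature=temperature)
            # Keep only trajectories that terminate at the gold answer, so the
            # model is finetuned on self-corrections that actually succeeded.
            if traj.final_answer == gold_answer:
                dataset.append(traj)
    random.shuffle(dataset)
    return dataset
```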
Results
The experiments reported in the paper show that CoSC substantially improves mathematical reasoning performance on the MATH and GSM8K datasets. For instance, the CoSC-Code-34B model surpasses well-established models such as GPT-4, ChatGPT, and even some multimodal LLMs on complex mathematical reasoning tasks. Notably, these results are achieved in a zero-shot setting: CoSC models require no external prompts or few-shot demonstrations at inference time.
Implications and Future Research
The integration of self-correction into LLMs offers promising avenues for improving reasoning across domains beyond mathematics. By learning to verify and rectify their own outputs, LLMs can approach problem-solving in a more deliberate, human-like way. The research marks a significant step forward in finetuning methodology, with potential for wider application across AI-driven reasoning tasks.
The paper also opens several paths for future research, including exploring CoSC's applicability to other reasoning-heavy domains and further optimizing the iterative correction process. The release of the CoSC code and data invites open-source collaboration, fostering further innovation in AI reasoning capabilities. As LLMs continue to develop stronger reasoning faculties, their utility on more sophisticated problem sets should expand accordingly, yielding more robust AI solutions for complex tasks.