Large Language Models are Better Reasoners with Self-Verification (2212.09561v5)

Published 19 Dec 2022 in cs.AI and cs.CL

Abstract: Recently, with the chain of thought (CoT) prompting, LLMs, e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation. The above issues make the LLMs need the ability to verify the answers. In fact, after inferring conclusions in some thinking decision tasks, people often check them by re-verifying steps to avoid some mistakes. In this paper, we propose and prove that LLMs also have similar self-verification abilities. We take the conclusion obtained by CoT as one of the conditions for solving the original problem. By performing a backward verification of the answers that LLM deduced for itself, we can obtain interpretable answer validation scores to select the candidate answer with the highest score. Experimental results demonstrate that the proposed method can improve the reasoning performance on various arithmetic, commonsense, and logical reasoning datasets. Our code is publicly available at: https://github.com/WENGSYX/Self-Verification.

Analysis of "LLMs are Better Reasoners with Self-Verification"

The paper "LLMs are Better Reasoners with Self-Verification" by Yixuan Weng et al. presents a novel methodology designed to enhance the reasoning capabilities of LLMs using self-verification. Building upon the traditional chain of thought (CoT) prompting, the authors identify a critical limitation inherent in LLMs—namely, their susceptibility to error propagation in multi-step reasoning processes. The paper addresses this by introducing a self-verification mechanism that allows the model to autonomously verify its predictions, thereby mitigating propagation of errors.

Methodological Overview

The authors propose a two-step process: Forward Reasoning and Backward Verification. First, the LLM is prompted via CoT to generate multiple candidate solutions to a problem, using sampling decoding to promote diversity among responses. A backward verification module is then applied: each candidate conclusion is rewritten as an additional condition of the original problem, and the model checks its consistency with the remaining conditions. Two verification techniques are used: True-False Item Verification for general reasoning tasks and Condition Mask Verification tailored to arithmetic reasoning.
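At a high level, the procedure can be pictured with the following minimal sketch. This is an illustration rather than the authors' implementation: `generate` is a hypothetical stand-in for any sampling-based LLM call, and the answer parsing is deliberately naive.

```python
from typing import Callable, List

def forward_reasoning(generate: Callable[[str], str],
                      question: str,
                      n_candidates: int = 5) -> List[str]:
    """Sample several chain-of-thought completions and extract their final answers.

    `generate` is a placeholder for a sampling-based LLM call; the real prompts
    follow the few-shot CoT templates used in the paper.
    """
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n_candidates):
        completion = generate(prompt)  # sampling decoding yields diverse reasoning paths
        # Naive extraction of the "The answer is ..." span.
        answers.append(completion.split("The answer is")[-1].strip(" ."))
    return answers

def select_answer(candidates: List[str],
                  verification_score: Callable[[str], float]) -> str:
    """Backward verification: keep the candidate with the highest validation score."""
    return max(set(candidates), key=verification_score)
```

The interpretable part of the method lives in `verification_score`: each candidate receives an explicit validation score, and the final answer is simply the argmax over candidates.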

The experimental setup leverages a range of datasets including mathematical reasoning (e.g., GSM8K, SingleEq), commonsense reasoning (CSQA), and logical reasoning (Date Understanding), providing a comprehensive evaluation of the self-verification mechanism. The results indicate an improvement in problem-solving accuracy across these diverse datasets when self-verification is integrated into the reasoning process.
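For arithmetic datasets such as GSM8K and SingleEq, the Condition Mask Verification step can be pictured with the sketch below. The prompt wording, the `generate` callable, and the regex-based number extraction are illustrative assumptions, not the paper's exact templates.

```python
import re
from typing import Callable

def condition_mask_score(generate: Callable[[str], str],
                         question: str,
                         candidate_answer: str,
                         n_samples: int = 3) -> int:
    """Backward verification by condition masking (illustrative sketch).

    Each numeric condition in the question is masked with 'X' in turn, the
    candidate answer is appended as a known fact, and the model is asked to
    recover X. The score counts how often the recovered value matches the
    original masked number; the best-scoring candidate is selected.
    """
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    score = 0
    for masked in numbers:
        masked_question = question.replace(masked, "X", 1)
        prompt = (f"{masked_question} Suppose the answer is {candidate_answer}. "
                  f"What is the value of X? Let's think step by step.")
        for _ in range(n_samples):
            reply = generate(prompt)
            recovered = re.findall(r"\d+(?:\.\d+)?", reply)
            if recovered and recovered[-1] == masked:
                score += 1
    return score
```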

Experimental Insights

The numerical results underscore the effectiveness of the proposed methodology. Notably, on mathematical reasoning tasks such as GSM8K, integrating self-verification lifts accuracy from 60.8% to 65.1%, highlighting the efficacy of the verification framework in enhancing CoT results. Similar improvements are observed across the other datasets, making a strong case for self-verification as a viable extension of LLM reasoning.

Furthermore, the paper explores the synergy between self-verification and advances in forward reasoning such as self-consistency decoding and PAL (Program-Aided Language models). Stacking these forward-reasoning improvements with self-verification continues to yield notable gains, confirming the versatility and adaptability of the approach.
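One simple way to picture such a combination is to tally self-consistency votes over the sampled candidates and use the backward-verification score to break ties. The snippet below is a sketch of that idea under those assumptions; the paper's exact weighting may differ.

```python
from collections import Counter
from typing import Dict, List

def combine_votes_and_verification(candidates: List[str],
                                   verification: Dict[str, int]) -> str:
    """Rank answers by self-consistency vote count, breaking ties with the
    backward-verification score (one plausible combination, not necessarily
    the paper's exact formulation)."""
    votes = Counter(candidates)
    return max(votes, key=lambda ans: (votes[ans], verification.get(ans, 0)))
```

For example, `combine_votes_and_verification(["12", "12", "15"], {"12": 4, "15": 5})` returns "12": the majority answer wins, and verification scores only decide between equally voted candidates.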

Theoretical and Practical Implications

The theoretical contributions of this paper lie in demonstrating that LLMs possess inherent self-verification capabilities that can be operationalized to reduce reasoning errors. Practically, the methodology presents a pathway to improving the reliability of model outputs without the overhead of additional fine-tuning or data annotation.

The approach suggests promising avenues for future work, particularly its applicability to other reasoning tasks and its potential to support explainability and interpretability of model outputs. The paper also stresses that larger-scale models benefit most from self-verification, hinting at scaling behavior that may inform the deployment of LLMs in reasoning-intensive applications.

Conclusion

Weng et al.'s research substantiates the hypothesis that self-verification can substantially bolster LLM reasoning, marking an innovative step toward more accurate and reliable AI systems. The successful application of self-verification across diverse reasoning domains suggests significant potential for this methodology to become a standard technique for enhancing AI reasoning. As LLMs continue to advance, integrating self-verification may well be pivotal in evolving these systems into more autonomous and error-resilient agents.

Authors (9)
  1. Yixuan Weng (28 papers)
  2. Minjun Zhu (11 papers)
  3. Fei Xia (111 papers)
  4. Bin Li (514 papers)
  5. Shizhu He (51 papers)
  6. Shengping Liu (21 papers)
  7. Bin Sun (74 papers)
  8. Kang Liu (207 papers)
  9. Jun Zhao (469 papers)
Citations (135)