LLM2: Let Large Language Models Harness System 2 Reasoning (2412.20372v2)

Published 29 Dec 2024 in cs.CL and cs.AI

Abstract: LLMs have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We posit that these limitations are rooted in the foundational autoregressive architecture of LLMs, which inherently lacks mechanisms for differentiating between desirable and undesirable results. Drawing inspiration from the dual-process theory of human cognition, we introduce LLM2, a novel framework that combines an LLM (System 1) with a process-based verifier (System 2). Within LLM2, the LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs. The verifier is trained with a pairwise comparison loss on synthetic process-supervision data generated through our token quality exploration strategy. Empirical results on mathematical reasoning benchmarks substantiate the efficacy of LLM2, exemplified by an accuracy enhancement from 50.3 to 57.8 (+7.5) for Llama3-1B on GSM8K. Furthermore, when combined with self-consistency, LLM2 achieves additional improvements, boosting major@20 accuracy from 56.2 to 70.2 (+14.0).

Summary

  • The paper proposes LLM2, a novel framework combining traditional LLMs with a System 2-like process-based verifier to enhance reasoning capabilities by allowing models to verify outputs during generation.
  • Empirical evaluations demonstrate LLM2 improves accuracy on mathematical reasoning benchmarks like GSM8K by up to 7.5 percentage points across different Llama3 models.
  • This dual-process approach offers a scalable method to integrate analytical reasoning into LLMs, potentially applicable to tasks beyond mathematics that require output validation.

Overview of LLM2: Let LLMs Harness System 2 Reasoning

The paper "LLM2: Let LLMs Harness System 2 Reasoning" introduces a significant advancement in the architecture of LLMs by integrating System 2 reasoning capabilities. Traditional LLMs, although performing impressively across a wide range of tasks, are susceptible to generating undesirable outputs. The authors argue that this limitation stems from the autoregressive architecture inherent in LLMs, which lacks the capacity to discern between desirable and undesirable results. Inspired by the dual-process theory of human cognition, which delineates between intuitive (System 1) and analytical (System 2) reasoning, this research proposes a novel framework, named LLM2. This framework amalgamates the classic LLM architecture (System 1) with a process-based verifier (System 2) to improve reasoning and output quality.

Key Contributions

The central contribution of this paper is a dual-process framework that addresses the deficiencies of LLMs by engaging System 2-like analytical reasoning during generation. The process-based verifier, trained with a pairwise comparison loss on synthetic process-supervision data, provides feedback to the LLM at each generation step, sharpening its decisions as it decodes.
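The paper specifies the training objective only as a pairwise comparison loss over desirable/undesirable token pairs; a common instantiation is the Bradley-Terry-style form sketched below, which may differ in detail from the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def pairwise_comparison_loss(score_desirable, score_undesirable):
    """Bradley-Terry-style pairwise objective: push the verifier's score
    for the desirable token above the score for the undesirable token
    drawn from the same generation step."""
    return -F.logsigmoid(score_desirable - score_undesirable).mean()

# Toy usage with verifier scores for three desirable/undesirable pairs.
good = torch.tensor([0.8, 1.2, 0.3])
bad = torch.tensor([0.1, -0.4, 0.5])
loss = pairwise_comparison_loss(good, bad)  # scalar loss to minimize
```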

  1. Token Quality Exploration Strategy: The authors propose a token quality exploration strategy for generating synthetic data to train the verifier. The strategy generates continuations for candidate tokens, assesses their quality against predefined metrics, and identifies undesirable tokens to contrast with desirable ones in the verifier's training data (one plausible reconstruction is sketched after this list).
  2. Empirical Validation: The proposed approach, LLM2, was empirically evaluated on mathematical reasoning benchmarks, GSM8K and MATH, using a variety of models from the Llama3 series. The results were notable, showcasing accuracy improvements of up to 7.5 percentage points on GSM8K.
  3. Integration with Self-Consistency: LLM2 yields further gains when combined with self-consistency, raising major@20 accuracy from 56.2 to 70.2 (+14.0); a sketch of the majority-voting aggregation also follows this list.
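Item 1 is described only at a high level, so the sketch below is one plausible reconstruction, not the paper's specification: a candidate token's quality is estimated as the fraction of sampled rollouts from it that reach a correct final answer, and the best and worst candidates at a prefix are paired as process-supervision data. The helper names (propose_candidates, rollout, is_correct) and the exact metric and pairing rule are assumptions.

```python
def token_quality_exploration(prefix, propose_candidates, rollout,
                              is_correct, k=5, n_rollouts=4):
    """Hypothetical reconstruction: sample continuations from each
    candidate token, score the token by the fraction of rollouts that
    end in a correct answer, and pair the best and worst tokens."""
    scored = []
    for token in propose_candidates(prefix, k=k):
        rollouts = [rollout(prefix + token) for _ in range(n_rollouts)]
        quality = sum(is_correct(r) for r in rollouts) / n_rollouts
        scored.append((token, quality))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    desirable, undesirable = scored[0][0], scored[-1][0]
    return prefix, desirable, undesirable
```

For item 3, self-consistency is the standard majority-voting aggregation: sample N complete solutions (N = 20 for major@20), extract each final answer, and keep the most frequent one. In LLM2 the samples themselves are produced under verifier guidance; the aggregation step itself is simple:

```python
from collections import Counter

def majority_vote(final_answers):
    """Return the most common final answer among sampled solutions."""
    return Counter(final_answers).most_common(1)[0][0]

# Toy usage: 20 sampled answers to a grade-school math problem.
samples = ["42"] * 11 + ["41"] * 6 + ["40"] * 3
assert majority_vote(samples) == "42"
```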

Numerical Results and Implications

The empirical evaluation on the GSM8K and MATH datasets demonstrates the efficacy of LLM2 in enhancing the reasoning capability of LLMs. Specifically, verifier-guided generation lifted Llama3-1B's GSM8K accuracy from 50.3 to 57.8, with gains observed across models of varying sizes. This reinforces the validity of the dual-process architecture and highlights its potential scalability.

The practical implications of this research are significant. By integrating a verifier to provide oversight during the generation process, the LLM2 framework can potentially be applied to a wider array of reasoning tasks beyond mathematical problem-solving. The methodology also opens doors to further exploration in domains requiring stringent validation of outputs, such as legal text generation or scientific literature summarization.

Theoretical Implications

From a theoretical perspective, this paper contributes to the cognitive modeling of artificial intelligence systems. By drawing parallels between human cognitive processes and machine learning architectures, it offers a new paradigm for designing AI systems that are both intuitive and analytical. This dual-process framework could serve as a basis for future research aimed at developing AI systems capable of more human-like reasoning.

Future Directions

The research outlined in this paper suggests several fruitful directions for future exploration:

  • Exploration Beyond Mathematics: While initial results are promising within mathematical reasoning, applying LLM2 in other domains, such as commonsense reasoning or strategic planning, could validate its broader applicability.
  • Enhancing the Verifier: Further refinement of the process-based verifier, potentially through more sophisticated machine learning techniques or data-driven insights, could unlock greater performance improvements.
  • Exploration of Different Models: Experimentation with different architectures or hyperparameters may uncover even more effective configurations for specific applications.

In summary, the "LLM2" framework proposed by the authors offers a noteworthy advancement in the field of LLM reasoning. Through the integration of dual-process reasoning, it presents a comprehensive method for overcoming the challenges faced by traditional LLM architectures, with broad implications for the field of AI and cognitive modeling.
