GRACE: Discriminator-Guided Chain-of-Thought Reasoning (2305.14934v2)

Published 24 May 2023 in cs.CL and cs.AI

Abstract: In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optimize for solution likelihood often yield incorrect solutions. To address this issue, we propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. GRACE employs a discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness. Importantly, GRACE only requires sampling from the LM, without the need for LM training or fine-tuning. Using models from FLAN-T5 and LLaMA families, we evaluate GRACE over four math and two symbolic reasoning tasks, where it exhibits substantial performance gains compared to greedy decoding, verifiers, and self-consistency in most settings. When further combined with self-consistency, GRACE outperforms all the baselines by sizeable margins. Human and LLM evaluations over GSM8K show that GRACE not only improves the final answer accuracy but also the correctness of the intermediate reasoning. Our implementation can be accessed at \url{https://github.com/mukhal/grace}.

An Analysis of GRACE: Discriminator-Guided Chain-of-Thought Reasoning

The paper "GRACE: Discriminator-Guided Chain-of-Thought Reasoning" tackles a pertinent challenge in the domain of multi-step reasoning for LLMs (LMs), where generating correct intermediate reasoning steps is crucial for arriving at accurate conclusions. Standard decoding strategies often falter as they may assign high probabilities to incorrect reasoning steps, resulting in erroneous final answers. The authors propose a novel stepwise decoding mechanism guided by a correctness discriminator, an approach designed to enhance logical reasoning accuracy by aligning the generation process closer to correct reasoning pathways.

Methodological Overview

The core of GRACE is a stepwise decoding framework that employs a correctness discriminator to assess and guide each reasoning step in multi-step tasks. The discriminator is trained with contrastive learning to distinguish correct from incorrect reasoning steps. Notably, GRACE does not entail any fine-tuning of the underlying LM; instead, it leverages the pre-existing capabilities of models from the FLAN-T5 and LLaMA families through stepwise guidance.

Key aspects of GRACE include:

  1. Discriminator Training: The discriminator is trained via a three-step procedure: sampling incorrect solutions from the LM, aligning them with correct reference solutions using a Needleman-Wunsch-like algorithm, and applying a max-margin loss over the resulting pairs of correct and incorrect steps. This obviates the need for step-level human annotations by synthesizing step-level training labels through solution alignment (a loss of this form is sketched after this list).
  2. Guided Decoding: During inference, GRACE samples candidate next steps from the LM and prioritizes those the discriminator scores as correct. The scoring combines LM-derived probabilities with discriminator evaluations to steer generation towards accurate reasoning (see the decoding sketch below).
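
To make the contrastive objective concrete, below is a minimal PyTorch-style sketch of a max-margin loss over one aligned pair of correct and incorrect steps. The `discriminator` callable and its interface are illustrative assumptions, not the paper's released implementation.

```python
import torch

def max_margin_step_loss(discriminator, prefix, correct_step, incorrect_step, margin=1.0):
    """Hinge-style contrastive loss on one aligned (correct, incorrect) step pair.

    `discriminator(prefix, step)` is assumed to return a scalar tensor scoring how
    likely `step` is a correct continuation of `prefix`; the pairs are assumed to
    come from aligning sampled incorrect solutions against reference solutions.
    """
    pos = discriminator(prefix, correct_step)    # score of the correct step
    neg = discriminator(prefix, incorrect_step)  # score of the aligned incorrect step
    # Require the correct step to outscore the incorrect one by at least `margin`.
    return torch.relu(margin - pos + neg)
```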

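A corresponding sketch of the guided decoding loop: at each step, candidate next steps are sampled from the LM and re-ranked by a score that mixes the LM log-probability with the discriminator's judgment. The `lm.sample_step` interface, the linear mixing weight `beta`, and the stopping heuristic are illustrative assumptions rather than the paper's exact formulation.

```python
def guided_decode(lm, discriminator, question, n_candidates=20, beta=0.8, max_steps=10):
    """Stepwise decoding steered by a step-correctness discriminator (sketch).

    Assumes `lm.sample_step(prefix, n)` returns `n` candidate next steps as
    (step_text, lm_logprob) pairs and `discriminator(prefix, step)` returns a
    scalar correctness score; both interfaces are hypothetical.
    """
    prefix = question
    for _ in range(max_steps):
        candidates = lm.sample_step(prefix, n=n_candidates)
        # Mix LM likelihood with the discriminator's correctness score.
        scored = [
            ((1 - beta) * lm_logprob + beta * float(discriminator(prefix, step)), step)
            for step, lm_logprob in candidates
        ]
        _, best_step = max(scored, key=lambda pair: pair[0])
        prefix = prefix + "\n" + best_step
        if "answer is" in best_step.lower():  # crude stopping heuristic for illustration
            break
    return prefix
```
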
Empirical Evaluation

The paper reports substantial empirical gains across six reasoning benchmarks, including GSM8K and MultiArith. GRACE outperforms baseline methods such as greedy decoding, verifier models, and conventional self-consistency, and when combined with self-consistency it beats all baselines by significant margins in several settings. On GSM8K, final-answer accuracy improved by 7.4 and 5.4 percentage points with FLAN-T5-Large and LLaMA, respectively, compared to greedy decoding.

An intriguing observation is GRACE's sample efficiency: owing to step-level guidance, it achieves competitive accuracy with fewer samples than conventional solution-level self-consistency. The framework is shown to improve not only final-answer accuracy but also the correctness of intermediate reasoning steps, significantly reducing trace errors according to the human and LLM evaluations.
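
The combination with self-consistency mentioned above amounts to sampling several guided solutions and majority-voting over their final answers. A minimal sketch, assuming step selection inside `guided_decode` is made stochastic (e.g., sampling among top-scored candidates) so repeated calls produce diverse solutions, and using a hypothetical `extract_answer` helper:

```python
from collections import Counter

def grace_with_self_consistency(lm, discriminator, question, n_solutions=20, **decode_kwargs):
    """Sample several discriminator-guided solutions and majority-vote the answer.

    Builds on the `guided_decode` sketch above; `extract_answer` is a hypothetical
    helper that parses the final answer out of a generated solution string.
    """
    answers = [
        extract_answer(guided_decode(lm, discriminator, question, **decode_kwargs))
        for _ in range(n_solutions)
    ]
    # Self-consistency: the most frequent final answer across samples wins.
    return Counter(answers).most_common(1)[0][0]
```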

Theoretical and Practical Implications

Guiding decoding with a correctness discriminator introduces a principled layer of oversight into LM generation, improving logical reasoning capabilities without demanding additional LM training. Theoretically, this highlights the potential of discriminator-guided techniques in structured reasoning tasks, offering a way to exploit the global knowledge encoded in LMs while maintaining local logical fidelity.

Practically, GRACE is significant because it improves performance with relatively modest computational overhead, particularly for large LMs such as LLaMA, where training or fine-tuning is resource-intensive. This methodology aligns with broader efforts in artificial intelligence toward modular, efficient approaches that strengthen LMs on complex reasoning tasks.

Speculative Future Directions

Future explorations could address the discriminator's scalability across varied domains without requiring domain-specific reference solutions, potentially by leveraging zero-shot learning paradigms. Additionally, integrating this framework with more exhaustive exploration methods, such as reinforcement learning-based approaches, could yield even finer control over decoding dynamics, broadening the applicability and robustness of reasoning in diverse automated systems.

Overall, GRACE serves as an exemplar in advancing computational reasoning capabilities, underscoring discriminators' role in steering model outputs towards coherence and correctness in a resource-conscious manner.

Authors (5)
  1. Muhammad Khalifa (24 papers)
  2. Lajanugen Logeswaran (30 papers)
  3. Moontae Lee (54 papers)
  4. Honglak Lee (174 papers)
  5. Lu Wang (329 papers)
Citations (30)