Re-Reading Improves Reasoning in LLMs
The paper "Re-Reading Improves Reasoning in LLMs" explores the implications of an innovative prompting method called Re2, specifically designed to enhance the reasoning capabilities of off-the-shelf LLMs. This method involves re-reading the input question twice, diverging from the conventional methodologies like Chain-of-Thought (CoT) that usually emphasize the output phase.
The motivation behind Re2 is grounded in findings from cognitive science that re-reading enhances comprehension. Inspired by human problem-solving strategies, where people often re-read a question to better understand it, Re2 seeks to replicate this cognitive mechanism in LLMs. A key advantage of Re2 is its generality: it is compatible with most existing thought-eliciting prompting methods, such as CoT and Program-Aided Language Models (PAL). It also provides decoder-only LLMs with a "bidirectional" encoding mechanism: because tokens in the second pass can attend to the entire question from the first pass, the first pass supplies global information that supports enhanced understanding in the second pass.
Methodology
The Re2 framework shifts focus from the traditional approach of eliciting reasoning processes in the output to strengthening comprehension of the input. The paper adopts a unified formulation of CoT prompting for reasoning tasks and augments it with a simple re-reading operation:
y \sim \sum_{z \sim p(z \mid C_x)} p(y \mid C_x, z) \cdot p(z \mid C_x), \quad \text{where } C_x = \mathrm{compose}^{(\tau)}(\mathrm{re2}(x))
Here, \mathrm{re2}(x) denotes the re-reading operation applied to the input x, and \mathrm{compose}^{(\tau)} composes the re-read input with a thought-eliciting instruction \tau (e.g., the CoT trigger). Because Re2 modifies only the input, it can be integrated seamlessly with various other prompting methods. Re-reading also allocates more computation to the question itself, loosely analogous to increasing the effective depth of the network's processing of the input, thus enhancing the LLM's understanding.
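To make the operation concrete, here is a minimal sketch of how a CoT+Re2 prompt can be assembled. The "Read the question again:" phrasing follows the template reported in the paper; the Q/A framing and the function names re2 and compose_cot are illustrative assumptions, not the paper's code.

```python
def re2(question: str) -> str:
    # Re-reading operation: state the question, then repeat it verbatim.
    # The "Read the question again:" phrasing follows the paper's template;
    # the surrounding Q/A framing is an illustrative choice.
    return f"Q: {question}\nRead the question again: {question}"


def compose_cot(prompt: str) -> str:
    # Compose the re-read input with a standard zero-shot CoT trigger (tau).
    return prompt + "\nA: Let's think step by step."


# Example usage with a GSM8K-style arithmetic question.
question = "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?"
print(compose_cot(re2(question)))
```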
Experimental Setup
The effectiveness of Re2 was evaluated through extensive experiments spanning 14 datasets across three main categories: arithmetic reasoning, commonsense reasoning, and symbolic reasoning. The benchmarks included well-known datasets like GSM8K, SVAMP, ASDiv, AQuA, CSQA, StrategyQA, ARC, Date Understanding, and Coin Flip.
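As a rough illustration of how such a comparison might be scored, the sketch below contrasts plain CoT with CoT+Re2 over a list of question/answer pairs. call_model is a placeholder for whatever LLM client is available, and extract_answer is a deliberately naive parser; neither reflects the paper's actual evaluation harness.

```python
import re


def call_model(prompt: str) -> str:
    # Placeholder for an actual LLM call; swap in any available client.
    raise NotImplementedError


def extract_answer(completion: str) -> str:
    # Naive parser: take the last number in the completion as the answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""


def accuracy(dataset: list[tuple[str, str]], use_re2: bool) -> float:
    # Score (question, gold_answer) pairs with CoT or CoT+Re2 prompts.
    correct = 0
    for question, gold in dataset:
        body = f"Q: {question}"
        if use_re2:
            body += f"\nRead the question again: {question}"
        prompt = body + "\nA: Let's think step by step."
        if extract_answer(call_model(prompt)) == gold:
            correct += 1
    return correct / len(dataset)
```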
Results
The numerical results support the claim that Re2 consistently improves reasoning performance across a variety of LLMs, prompting methods, and reasoning benchmarks. Gains were appreciable on the arithmetic reasoning datasets, notably GSM8K (CoT+Re2 up by 2.68%) and MultiArith (CoT+Re2 up by 4.00%).
Re2 exhibited robust performance enhancements on commonsense and symbolic reasoning tasks as well; on the CSQA dataset, for instance, CoT+Re2 improved performance by 1.39% over CoT. Such consistent gains support the hypothesis that re-reading strengthens input comprehension and thereby improves LLM reasoning performance.
Implications and Future Work
The implications of this research are both practical and theoretical. Practically, implementing Re2 in natural language processing pipelines can yield more accurate and consistent reasoning from LLMs, enhancing their utility in real-world applications such as automated customer service, educational tools, and interactive AI systems. Theoretically, Re2 offers insight into the benefits of focusing on input comprehension as a means to improve reasoning performance, encouraging a shift away from purely output-centric improvements.
Future research could expand on this foundation by exploring the nuances of re-reading frequency and its impact on LLM performance. Additionally, integrating Re2 with multi-turn dialogue systems and multi-modal reasoning applications might yield further advancements in AI capabilities.
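One concrete handle on re-reading frequency is to parameterize the number of passes over the question. The sketch below generalizes the Re2 template to n readings (n = 2 recovers Re2); the phrasing for n > 2 is an assumption for illustration, not taken from the paper.

```python
def re_read(question: str, n: int = 2) -> str:
    # Generalize Re2 to n total readings of the question; n=2 recovers Re2.
    # Repeating the same instruction for n > 2 is an illustrative assumption.
    parts = [f"Q: {question}"]
    parts += [f"Read the question again: {question}" for _ in range(n - 1)]
    return "\n".join(parts)
```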
In summary, the paper demonstrates through extensive experimentation that the simple strategy of re-reading questions significantly bolsters reasoning performance in LLMs. This finding underscores the importance of input comprehension for the reasoning capabilities of AI models and opens avenues for further enhancing AI performance through cognitively inspired mechanisms.