Large Language Models are Zero-Shot Reasoners (2205.11916v4)

Published 24 May 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Pretrained LLMs are widely used in many sub-fields of NLP and generally known as excellent few-shot learners with task-specific exemplars. Notably, chain of thought (CoT) prompting, a recent technique for eliciting complex multi-step reasoning through step-by-step answer examples, achieved the state-of-the-art performances in arithmetics and symbolic reasoning, difficult system-2 tasks that do not follow the standard scaling laws for LLMs. While these successes are often attributed to LLMs' ability for few-shot learning, we show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances on diverse benchmark reasoning tasks including arithmetics (MultiArith, GSM8K, AQUA-RAT, SVAMP), symbolic reasoning (Last Letter, Coin Flip), and other logical reasoning tasks (Date Understanding, Tracking Shuffled Objects), without any hand-crafted few-shot examples, e.g. increasing the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with large InstructGPT model (text-davinci-002), as well as similar magnitudes of improvements with another off-the-shelf large model, 540B parameter PaLM. The versatility of this single prompt across very diverse reasoning tasks hints at untapped and understudied fundamental zero-shot capabilities of LLMs, suggesting high-level, multi-task broad cognitive capabilities may be extracted by simple prompting. We hope our work not only serves as the minimal strongest zero-shot baseline for the challenging reasoning benchmarks, but also highlights the importance of carefully exploring and analyzing the enormous zero-shot knowledge hidden inside LLMs before crafting finetuning datasets or few-shot exemplars.

LLMs are Zero-Shot Reasoners

The paper "LLMs are Zero-Shot Reasoners" presents an intriguing examination of the reasoning capabilities of LLMs beyond the traditional few-shot learning paradigm. Specifically, it introduces an innovative approach termed Zero-shot Chain of Thought (Zero-shot-CoT), designed to elicit step-by-step reasoning in order to significantly enhance the performance of LLMs across a broad array of complex reasoning tasks.

Introduction and Motivation

The efficacy of LLMs in few-shot learning scenarios has been well-documented, especially with the advent of techniques such as Chain of Thought (CoT) prompting. However, this paper challenges the prevailing view by demonstrating that LLMs can be proficient zero-shot reasoners as well. The key innovation is the addition of the prompt "Let's think step by step" preceding the answer, which activates the model's inherent multi-step reasoning capabilities without the need for task-specific examples.

Methodology

The Zero-shot-CoT approach is straightforward yet powerful. It involves two stages of prompting:

  1. Reasoning Extraction: The initial prompt modifies the input question by appending "Let's think step by step" to guide the model in generating a logical sequence of thought leading to the answer.
  2. Answer Extraction: The output from the first stage is then re-prompted to derive the final answer in the correct format.

This method eschews the need for elaborate few-shot examples or domain-specific prompt engineering, making it versatile and broadly applicable.
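As a concrete illustration, below is a minimal Python sketch of the two-stage prompting pipeline described above. The helper `complete` is a hypothetical stand-in for whatever text-completion call is used (the paper experimented with models such as text-davinci-002 and PaLM), and the exact answer-extraction trigger varies by task in the paper; the templates here simply follow the "Let's think step by step" recipe.

```python
def complete(prompt: str) -> str:
    """Hypothetical placeholder for an LLM text-completion call
    (e.g., to text-davinci-002 or another off-the-shelf model)."""
    raise NotImplementedError("Wire this to an LLM provider of your choice.")


def zero_shot_cot(question: str,
                  answer_trigger: str = "Therefore, the answer is") -> str:
    # Stage 1: reasoning extraction -- append "Let's think step by step" so
    # the model generates a free-form chain of thought before any answer.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = complete(reasoning_prompt)

    # Stage 2: answer extraction -- feed the generated reasoning back in and
    # prompt for the final answer in the expected format.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{answer_trigger}"
    return complete(answer_prompt).strip()
```

The two calls keep prompt engineering minimal: the same pair of templates is reused across tasks, with only the answer trigger adjusted to the expected answer format (numbers, multiple choice, yes/no, and so on).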

Experimental Evaluation

The authors conducted extensive evaluations using benchmark datasets across several reasoning categories: arithmetic, commonsense, symbolic, and other logical tasks. Notable results include substantial improvements in arithmetic reasoning performance, such as a jump from 17.7% to 78.7% accuracy on the MultiArith dataset and from 10.4% to 40.7% on GSM8K using the InstructGPT model (text-davinci-002).

Moreover, similar magnitudes of improvement were observed with the PaLM model, affirming the robustness of Zero-shot-CoT. The paper underscores that the reasoning abilities of LLMs, previously thought to be limited to few-shot contexts, are also effective in zero-shot settings.

Comparative Analysis

Zero-shot-CoT was benchmarked against standard zero-shot and few-shot prompting methods. While it naturally underperforms few-shot CoT with carefully engineered examples, Zero-shot-CoT notably surpasses standard few-shot prompting that lacks reasoning exemplars. Additionally, adding self-consistency, wherein multiple reasoning paths are sampled and the final answer is decided by majority vote, further improves performance.
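The self-consistency step can be sketched as a thin wrapper over the two-stage routine above. This is an illustrative outline under the stated assumptions, not the authors' implementation; it reuses the hypothetical `zero_shot_cot` helper from the earlier sketch and assumes the underlying completion call samples with nonzero temperature so the reasoning paths actually differ.

```python
from collections import Counter


def self_consistent_answer(question: str, num_samples: int = 10) -> str:
    # Sample several independent reasoning paths for the same question.
    answers = [zero_shot_cot(question) for _ in range(num_samples)]
    # Majority vote: the most frequent extracted answer wins.
    return Counter(answers).most_common(1)[0][0]
```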

Implications and Future Directions

The implications of this research are multifaceted. On a practical level, Zero-shot-CoT provides a minimalist yet potent baseline for zero-shot reasoning tasks, streamlining prompt design and reducing reliance on curated examples. Theoretically, it opens new avenues for exploring latent cognitive abilities in LLMs that extend beyond narrow task-specific skills towards broader generalization.

Future research could explore discovering other multi-task prompts that can unlock hidden high-level reasoning capabilities in LLMs. Additionally, refining the Zero-shot-CoT method to automatically generate optimal prompts presents an exciting challenge.

Conclusion

The paper "LLMs are Zero-Shot Reasoners" makes a compelling case that LLMs possess considerable zero-shot reasoning capabilities that can be harnessed through simple yet effective prompting strategies. The Zero-shot-CoT method not only serves as a formidable zero-shot baseline but also invites the broader research community to rethink and explore the extensive, untapped potential of LLMs in multi-step reasoning tasks.

Authors (5)
  1. Takeshi Kojima (9 papers)
  2. Shixiang Shane Gu (34 papers)
  3. Machel Reid (20 papers)
  4. Yutaka Matsuo (128 papers)
  5. Yusuke Iwasawa (43 papers)
Citations (3,309)