
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (2305.04091v3)

Published 6 May 2023 in cs.CL

Abstract: LLMs have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.

Citations (230)

Summary

  • The paper proposes a novel Plan-and-Solve prompting technique that reduces calculation errors from 7% to 5% and missing-step errors from 12% to 7%.
  • The method leverages explicit problem decomposition, variable extraction, and stepwise execution to enhance reasoning across arithmetic, commonsense, and symbolic tasks.
  • Empirical results show that PS+ prompting nearly matches few-shot CoT performance without needing in-context exemplars, improving overall efficiency.

Plan-and-Solve Prompting: Enhancing Zero-Shot CoT Reasoning in LLMs

Motivation and Critical Review of Zero-shot CoT Reasoning

The development of Chain-of-Thought (CoT) prompting has played a key role in eliciting multi-step reasoning ability from LLMs. Few-shot CoT prompting demonstrates that explicit demonstration of reasoning chains increases LLM performance on complex tasks. Zero-shot-CoT further simplifies prompting by adding a trigger ("Let's think step by step") to the query, removing the need for exemplars. Despite its efficacy, error analysis reveals substantial limitations: 7% calculation errors, 12% missing-step errors, and 27% semantic misunderstanding errors on GSM8K (Figure 1).

Figure 1: Distribution of calculation, missing-step, and semantic errors found in GSM8K problems answered incorrectly by Zero-shot-CoT with GPT-3.
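For concreteness, the Zero-shot-CoT baseline discussed above can be sketched as a two-stage completion pipeline: a reasoning pass elicited by the "Let's think step by step" trigger, followed by an answer-extraction pass. The sketch below is illustrative only; llm_complete is a hypothetical placeholder for whatever completion API is used, and the answer-extraction wording paraphrases the trigger reported in the Zero-shot-CoT literature rather than quoting the paper's released code.

```python
# Minimal sketch of the two-stage Zero-shot-CoT pipeline described above.
# `llm_complete` is a hypothetical placeholder for any text-completion call
# (e.g. a GPT-3 completion endpoint); it is not part of the paper's code.

def llm_complete(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text completion."""
    raise NotImplementedError("plug in your model or API call here")


def zero_shot_cot(question: str) -> str:
    # Stage 1: elicit a free-form reasoning chain with the CoT trigger.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm_complete(reasoning_prompt)

    # Stage 2: append an answer-extraction trigger to pull out the final answer.
    extraction_prompt = (
        reasoning_prompt
        + " "
        + reasoning
        + "\nTherefore, the answer (arabic numerals) is"
    )
    return llm_complete(extraction_prompt).strip()
```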

Missing-step and calculation errors stem partly from the lack of explicit problem decomposition and from insufficient attention to intermediate computation. These error modes highlight structural limitations of existing zero-shot prompt designs and motivate the introduction of alternative prompting paradigms.

Plan-and-Solve (PS) and PS+ Prompting: Methodological Innovations

Plan-and-Solve (PS) prompting replaces the canonical Zero-shot-CoT trigger with structured instructions: the model is asked first to understand the problem and devise a solution plan, then to execute the plan step by step. The PS+ variant further augments this approach by explicitly instructing the LLM to extract relevant variables, assign numerals, and verify calculations with attention to commonsense and correctness. These instructions operationalize a process of explicit subtask decomposition and reduce opportunities for both computational and logical omissions (Figure 2).

Figure 2: Prompting and output comparison of Zero-shot-CoT (a), PS prompting (b), and answer-extraction prompting (c). PS prompting explicitly introduces plan decomposition, outperforming basic CoT in complex scenarios.
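A minimal sketch of how the PS and PS+ triggers slot into the same two-stage pipeline is shown below. The trigger strings paraphrase the templates described in the paper and may differ slightly from the exact wording in the released repository; plan_and_solve and the reuse of llm_complete from the previous sketch are illustrative assumptions, not the paper's code.

```python
# Sketch of the PS and PS+ triggers dropped into the same two-stage pipeline
# as the Zero-shot-CoT sketch above (reusing `llm_complete`). The trigger
# wording paraphrases the paper's templates; check the released repository
# for the exact strings.

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate variables (pay attention to correct numerical "
    "calculation and commonsense), solve the problem step by step, and show "
    "the answer."
)


def plan_and_solve(question: str, trigger: str = PS_PLUS_TRIGGER) -> str:
    # Stage 1: planning and execution in a single completion.
    reasoning_prompt = f"Q: {question}\nA: {trigger}"
    reasoning = llm_complete(reasoning_prompt)

    # Stage 2: same answer-extraction pass as the Zero-shot-CoT baseline.
    extraction_prompt = (
        reasoning_prompt
        + " "
        + reasoning
        + "\nTherefore, the answer (arabic numerals) is"
    )
    return llm_complete(extraction_prompt).strip()
```

In this sketch, moving from PS to PS+ is purely a change of trigger string; the surrounding pipeline stays identical, which is what makes the template ablations discussed below straightforward to run.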

The new prompt template yields output chains with distinct planning and execution phases. Extensive template ablations confirm that variable extraction, explicit planning, and intermediate-calculation directives each contribute incrementally to overall accuracy, with PS+ prompting achieving the highest performance in the template study.

Empirical Results: Comparative and Error-Type Analysis

Benchmarking across ten datasets (six arithmetic, two commonsense, two symbolic) demonstrates that PS+ consistently surpasses Zero-shot-CoT on all arithmetic tasks—often by margins exceeding 5%. PS+ performs comparably to or exceeds Zero-shot-Program-of-Thought (PoT) prompting and approaches the performance of few-shot manual CoT prompting, notably without the need for exemplars.

Error analysis shows that PS+ reduces calculation errors to 5% and missing-step errors to 7%, compared to 7% and 12% for Zero-shot-CoT, respectively, while semantic errors remain unchanged (27%). Correlation analysis of the reasoning chain content demonstrates that explicit variable extraction and planning are negatively correlated with calculation and missing-step errors, confirming that prompt specificity is pivotal for error reduction.

Self-consistency decoding further boosts performance, with PS+ achieving 84.4% accuracy on SVAMP and 73.7% on GSM8K under SC voting, outperforming Zero-shot-CoT in both cases. These improvements demonstrate better reliability and robustness of the PS(+)-prompted solutions.
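The self-consistency result can be read as a majority vote over independently sampled PS+ chains. The sketch below assumes plan_and_solve from the earlier sketch samples with nonzero temperature so that chains differ; the number of samples and the numeric answer parsing are illustrative choices, not the paper's exact configuration.

```python
# Illustrative self-consistency (SC) decoding over PS+ chains: sample several
# reasoning chains, extract a numeric answer from each, and majority-vote.
# Sampling count and answer parsing are assumptions for illustration; the
# sampling temperature must be nonzero so that the chains actually differ.
import re
from collections import Counter
from typing import Optional


def parse_number(text: str) -> Optional[str]:
    """Grab the first number in the model's extracted answer, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return match.group(0) if match else None


def self_consistent_answer(question: str, num_samples: int = 10) -> Optional[str]:
    answers = []
    for _ in range(num_samples):
        answer_text = plan_and_solve(question)  # from the sketch above
        number = parse_number(answer_text)
        if number is not None:
            answers.append(number)
    if not answers:
        return None
    # Majority vote over the extracted numeric answers.
    return Counter(answers).most_common(1)[0][0]
```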

Theoretical and Practical Implications

Plan-and-Solve strategies instantiate a modular approach to zero-shot reasoning, aligning closely with hierarchical task decompositions studied in classical AI. The methodology validates that LLMs, when prompted for explicit problem decomposition and intermediate variable extraction, can realize emergent planning capabilities—evidenced by the high rate (90%) of plan presence in generated chains.

Practically, PS+ prompting significantly reduces dependence on model fine-tuning and corpus-specific exemplars, enabling immediate deployment in resource-constrained or proprietary contexts. The approach generalizes across arithmetic, symbolic, and commonsense reasoning and remains agnostic to step type and task structure.

Theoretically, the findings support the hypothesis that the reasoning limitations of LLMs are in part a function of prompt underspecification rather than of intrinsic architectural limits. PS+ demonstrates the degree to which reasoning fidelity and error rate can be influenced by explicit prompt design, setting the foundation for further systematic prompting research.

Speculation on Future Directions

Refinement of Plan-and-Solve prompting can extend in several directions:

  1. Adaptive and dynamic prompt generation: Automating the crafting of PS+ templates based on semantic parsing of task instructions may further reduce error rates.
  2. Hierarchical reasoning and multi-agent composition: PS+ provides a basis for recursive or decentralized reasoning workflows, potentially leveraging self-improvement or verification loops.
  3. Generalization to non-reasoning tasks: The methodology indicates promise for generalized instruction following, content planning, and operational control in LLMs.
  4. Addressing semantic misunderstanding errors: Beyond structural refinements, integrating external symbolic or commonsense priors may be necessary to reduce this persistent class of errors.

Conclusion

Plan-and-Solve prompting and its enhanced variant PS+ mark a substantive advance in zero-shot CoT reasoning for LLMs. By demanding explicit planning, variable extraction, and stepwise execution, PS+ nearly matches few-shot CoT approaches without demonstration examples and substantially mitigates the calculation and missing-step limitations of earlier methods. The approach generalizes across arithmetic, commonsense, and symbolic domains, illustrating both the prompt sensitivity of current LLMs and the emergent planning ability accessible via carefully crafted instructions. This paradigm opens avenues for further research in adaptive prompting, hierarchical reasoning, and broadening the applicability of large-scale zero-shot LLMs to complex task domains.
