Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (2305.04091v3)

Published 6 May 2023 in cs.CL

Abstract: LLMs have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.

Plan-and-Solve Prompting: Enhancing Zero-Shot Chain-of-Thought Reasoning in LLMs

Introduction

The efficacy of LLMs in NLP tasks has been well documented, with models such as GPT-3 demonstrating strong performance across a range of applications. Chain-of-Thought (CoT) prompting further improves multi-step reasoning by having LLMs generate intermediate reasoning steps, but it typically requires manually crafted few-shot demonstrations. Zero-Shot Chain-of-Thought (Zero-shot-CoT) simplifies this by appending the prompt "Let's think step by step" to the target problem, yielding surprisingly strong performance without any curated examples. Despite these advances, Zero-shot-CoT still suffers from three kinds of errors: calculation errors, missing-step errors, and semantic misunderstanding errors.
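
For concreteness, the Zero-shot-CoT baseline amounts to a single string concatenation. The minimal sketch below illustrates it in Python; the Q:/A: framing and the example question are illustrative assumptions, not taken from this paper.

```python
# Zero-shot-CoT baseline: append the trigger sentence to the target problem.
# The Q:/A: framing follows the usual Zero-shot-CoT setup; the example
# question is illustrative, not drawn from the paper's benchmarks.

ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

def build_zero_shot_cot_prompt(question: str) -> str:
    """Return the single-turn prompt that elicits a step-by-step rationale."""
    return f"Q: {question}\nA: {ZERO_SHOT_COT_TRIGGER}"

print(build_zero_shot_cot_prompt(
    "A store sold 3 boxes of 12 pencils each. How many pencils were sold?"
))
```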

Plan-and-Solve Prompting Approach

To address the weaknesses of Zero-shot-CoT, Wang et al. propose the Plan-and-Solve (PS) Prompting framework. PS prompting consists of two components:

  1. Plan Phase: The model first devises a plan to break down the entire task into smaller subtasks.
  2. Solve Phase: The model then executes the subtasks according to the devised plan.

PS prompting is further extended to PS+ prompting, which adds more detailed instructions aimed at reducing calculation errors and improving the quality of the generated reasoning steps. Both prompting strategies were evaluated with GPT-3 on ten datasets covering three types of reasoning problems.
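
As a rough sketch of how the two prompting stages fit together: the trigger sentences below are paraphrased from the paper (the exact wording is available in the linked repository), `call_llm` is a hypothetical stand-in for the GPT-3 completion call, and the second-stage answer-extraction prompt mirrors the one used by Zero-shot-CoT.

```python
# Sketch of the Plan-and-Solve prompting pipeline. Trigger sentences are
# paraphrased from the paper (see the official repository for exact wording);
# `call_llm` is a hypothetical stand-in for the GPT-3 completion call.

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate variables (pay attention to correct numerical "
    "calculation and commonsense), solve the problem step by step, and show the "
    "answer."
)

# Second-stage prompt used to pull out the final answer, mirroring Zero-shot-CoT.
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"

def plan_and_solve(question: str, call_llm, plus: bool = True) -> str:
    """Two-stage zero-shot prompting: elicit a plan-and-solve rationale,
    then extract the final answer from it."""
    trigger = PS_PLUS_TRIGGER if plus else PS_TRIGGER
    reasoning_prompt = f"Q: {question}\nA: {trigger}"
    rationale = call_llm(reasoning_prompt)             # stage 1: plan + solution
    extraction_prompt = f"{reasoning_prompt}\n{rationale}\n{ANSWER_TRIGGER}"
    return call_llm(extraction_prompt)                 # stage 2: final answer
```

In practice, `call_llm` would wrap the chosen completion endpoint (typically with greedy decoding); the same two-stage scheme covers both PS and PS+, which differ only in their trigger sentences.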

Experimental Evaluation

The paper evaluated the proposed methods on ten datasets spanning arithmetic, commonsense, and symbolic reasoning tasks. Significant improvements over Zero-shot-CoT were observed:

  • Arithmetic Reasoning: Across six datasets, Zero-shot PS+ prompting consistently outperformed Zero-shot-CoT by substantial margins, with accuracy improvements ranging from 2% to over 5%.
  • Commonsense Reasoning: On the CommonsenseQA and StrategyQA datasets, Zero-shot PS+ prompting showed performance gains over Zero-shot-CoT, evidencing its broader applicability.
  • Symbolic Reasoning: Zero-shot PS+ prompting outperformed Zero-shot-CoT and performed comparably to few-shot CoT on the Last Letters and Coin Flip datasets.

Comparative Analysis

The paper also compared the proposed methods against other zero-shot and few-shot baselines:

  • Zero-shot-CoT: While effective, Zero-shot-CoT was limited by its susceptibility to calculation and missing-step errors.
  • Zero-shot PoT: The Program-of-Thought (PoT) method, which has the model generate Python programs as rationales and executes them to obtain answers (see the sketch after this list), was competitive with but not universally superior to PS+ prompting.
  • Few-shot Manual and Auto-CoT: The few-shot approaches set high benchmarks; however, PS+ prompting yielded accuracy levels that were comparable to or exceeded these baselines on several datasets.
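
For context on the PoT baseline: instead of a natural-language rationale, the model emits a short Python program whose execution produces the answer. Below is a minimal sketch of that execution step; the `ans` result-variable convention is an assumption borrowed from common PoT usage, not from this paper.

```python
# Minimal sketch of how a Program-of-Thought rationale yields an answer: the
# model's output is a short Python program, and the result is read from a
# conventionally named variable (here assumed to be `ans`) after execution.
# Executing model-generated code like this is unsafe outside a sandbox.

def run_pot_rationale(generated_code: str):
    namespace: dict = {}
    exec(generated_code, namespace)    # run the model-generated program
    return namespace.get("ans")        # assumed result-variable name

# Illustrative rationale for a toy arithmetic question (not from the paper).
example_rationale = """
boxes = 3
pencils_per_box = 12
ans = boxes * pencils_per_box
"""
print(run_pot_rationale(example_rationale))  # -> 36
```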

Implications and Future Work

The Zero-shot PS prompting strategies introduce a novel, effective way to reduce calculation and missing-step errors in LLM task execution. By guiding LLMs with a planning stage and more detailed instructions, the methods make fuller use of the models' inherent reasoning capabilities. The approach shows promise for extending zero-shot methods beyond the reasoning tasks studied here to other complex NLP problems. Future research could refine these prompts further and address semantic misunderstanding errors, continuing to improve the robustness and accuracy of LLM reasoning.

Conclusion

Plan-and-Solve prompting represents a significant step toward better zero-shot reasoning in LLMs. The improvements demonstrated across diverse datasets suggest substantial potential for both theoretical development and practical application. Given its simplicity and effectiveness, PS prompting could pave the way for more capable and less error-prone zero-shot prompting frameworks in NLP and beyond.

Authors (7)
  1. Lei Wang (975 papers)
  2. Wanyu Xu (4 papers)
  3. Yihuai Lan (8 papers)
  4. Zhiqiang Hu (48 papers)
  5. Yunshi Lan (30 papers)
  6. Roy Ka-Wei Lee (68 papers)
  7. Ee-Peng Lim (57 papers)
Citations (230)