Plan-and-Solve Prompting: Enhancing Zero-Shot Chain-of-Thought Reasoning in LLMs
Introduction
The efficacy of LLMs in NLP tasks is well documented, with models such as GPT-3 demonstrating strong performance across a wide range of applications. Chain-of-Thought (CoT) prompting has further improved performance on multi-step reasoning tasks by having LLMs generate intermediate reasoning steps, though it typically requires manually crafted few-shot examples. Zero-Shot Chain-of-Thought (Zero-shot-CoT) simplifies this by appending the prompt "Let's think step by step" to the target problem, yielding surprisingly robust performance even without curated demonstrations. Despite these advances, Zero-shot-CoT still suffers from three recurring error types: calculation errors, missing-step errors, and semantic misunderstanding errors.
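For concreteness, the sketch below shows how the two-stage Zero-shot-CoT prompts are typically assembled: a reasoning-extraction prompt followed by an answer-extraction prompt that feeds the model's own reasoning back in. The answer-extraction wording is paraphrased, and the helper names are illustrative rather than taken from any released implementation.

```python
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."
# Answer-extraction wording is paraphrased; the original work tailors it per
# task (e.g. asking for arabic numerals on arithmetic benchmarks).
ANSWER_TRIGGER = "Therefore, the answer is"


def reasoning_prompt(question: str) -> str:
    """Stage 1: elicit free-form step-by-step reasoning."""
    return f"Q: {question}\nA: {ZERO_SHOT_COT_TRIGGER}"


def answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: feed the model's own reasoning back and ask for the final answer."""
    return f"{reasoning_prompt(question)}\n{reasoning}\n{ANSWER_TRIGGER}"


if __name__ == "__main__":
    q = "A farmer has 3 pens with 12 chickens in each pen. How many chickens are there in total?"
    print(reasoning_prompt(q))
```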
Plan-and-Solve Prompting Approach
To address the weaknesses of Zero-shot-CoT, Wang et al. propose the Plan-and-Solve (PS) prompting framework. PS prompting consists of two phases:
- Plan Phase: The model first devises a plan to break down the entire task into smaller subtasks.
- Solve Phase: The model then executes the subtasks according to the devised plan.
The framework extends to PS+ prompting, which adds more detailed instructions aimed at reducing calculation errors and improving the quality of the generated reasoning steps; illustrative prompt templates for both variants are sketched below. The effectiveness of these approaches was evaluated across a range of datasets and reasoning problems using GPT-3.
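The sketch below captures the core difference between the two variants. The trigger sentences are paraphrased from the paper's zero-shot templates (the exact released wording may differ slightly), and the helper function is illustrative, not part of any official codebase.

```python
# Trigger sentences paraphrased from the Plan-and-Solve paper; the exact
# released wording may differ slightly.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate variables (paying attention to correct numerical "
    "calculation and commonsense), solve the problem step by step, and show "
    "the answer."
)


def build_ps_prompt(question: str, plus: bool = False) -> str:
    """Wrap a question with the PS (or PS+) trigger sentence."""
    trigger = PS_PLUS_TRIGGER if plus else PS_TRIGGER
    return f"Q: {question}\nA: {trigger}"


if __name__ == "__main__":
    q = ("Grace weighs 125 pounds. Alex weighs 2 pounds less than 4 times "
         "what Grace weighs. What are their combined weights in pounds?")
    print(build_ps_prompt(q, plus=True))
```

As with Zero-shot-CoT, a second answer-extraction prompt is typically applied to the model's completion to pull out the final answer.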
Experimental Evaluation
The paper evaluated the proposed methods on ten datasets spanning arithmetic, commonsense, and symbolic reasoning tasks. Significant improvements over Zero-shot-CoT were observed:
- Arithmetic Reasoning: Across six datasets, Zero-shot PS+ prompting consistently outperformed Zero-shot-CoT by substantial margins, with accuracy improvements ranging from 2% to over 5%.
- Commonsense Reasoning: On the CommonsenseQA and StrategyQA datasets, Zero-shot PS+ prompting showed performance gains over Zero-shot-CoT, demonstrating its broader applicability.
- Symbolic Reasoning: Zero-shot PS+ prompting outperformed Zero-shot-CoT and performed comparably to few-shot CoT on the Last Letters and Coin Flip datasets.
Comparative Analysis
The paper also compared the proposed methods against other zero-shot and few-shot baselines:
- Zero-shot-CoT: While effective, Zero-shot-CoT was limited by its susceptibility to calculation and missing-step errors.
- Zero-shot PoT: The Program-of-Thoughts (PoT) method, which generates Python code as rationales and executes it to obtain the answer, was competitive but not universally superior to PS+ prompting.
- Few-shot Manual-CoT and Auto-CoT: The few-shot approaches set strong baselines; nonetheless, PS+ prompting reached accuracy comparable to, and on several datasets exceeding, these baselines. A minimal harness for running such comparisons is sketched below.
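To make this kind of comparison concrete, here is a minimal sketch of an evaluation loop that scores several prompting strategies by exact match on their final numeric answers. The `call_llm` parameter is a placeholder for whatever completion client is available, the answer extraction is a simple heuristic, and none of this is taken from the paper's own evaluation code.

```python
import re
from typing import Callable, Dict, List


def extract_final_number(completion: str) -> str:
    """Heuristic answer parser: take the last number in the completion.
    Real evaluation scripts are more careful about formats and units."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else ""


def compare_strategies(
    questions: List[str],
    gold_answers: List[str],
    prompt_builders: Dict[str, Callable[[str], str]],
    call_llm: Callable[[str], str],
) -> Dict[str, float]:
    """Return exact-match accuracy for each prompting strategy."""
    scores: Dict[str, float] = {}
    for name, build_prompt in prompt_builders.items():
        correct = sum(
            extract_final_number(call_llm(build_prompt(q))) == gold
            for q, gold in zip(questions, gold_answers)
        )
        scores[name] = correct / len(questions)
    return scores
```

Plugging the `reasoning_prompt` and `build_ps_prompt` helpers from the earlier sketches into `prompt_builders` would reproduce a small-scale Zero-shot-CoT versus PS/PS+ comparison.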
Implications and Future Work
The Zero-shot PS prompting strategies introduce a novel, effective way to mitigate calculation and missing-step errors in LLM task execution. By guiding LLMs with more detailed instructions and an explicit planning stage, the methods exploit the models' inherent capabilities more fully. The approach shows promise for extending zero-shot methodologies beyond reasoning tasks to other complex NLP challenges. Future research could refine these prompts further and address semantic misunderstanding errors, further improving the robustness and accuracy of LLM reasoning.
Conclusion
Plan-and-Solve prompting represents a significant step toward improving zero-shot reasoning in LLMs. The demonstrated improvements across diverse datasets suggest substantial potential for the approach in both research and practical applications. With its simplicity and effectiveness, PS prompting could pave the way for more sophisticated and less error-prone zero-shot learning frameworks in NLP and beyond.