Least-to-Most Prompting Enables Complex Reasoning in LLMs
In "Least-to-Most Prompting Enables Complex Reasoning in LLMs," Zhou et al. delve into addressing the limitations of standard chain-of-thought (CoT) prompting in LLMs. While CoT prompting has shown commendable performance improvements over conventional few-shot prompting, it struggles with tasks that require generalization to more complex problems than those seen in training examples. The authors propose least-to-most (L2M) prompting as a novel strategy to overcome this easy-to-hard generalization issue by segmenting complex problems into a sequence of simpler subproblems and solving them incrementally.
Methodology
Least-to-most prompting operates in two stages: decomposition followed by sequential problem solving. The decomposition stage breaks a complex problem into simpler subproblems. In the sequential problem-solving stage, the subproblems are answered one at a time, with each subproblem's answer appended to the prompt before the next subproblem is posed. Both stages are implemented with few-shot prompting, requiring no additional training or fine-tuning.
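A minimal sketch of this two-stage loop is shown below. It is not the authors' code: `llm` stands in for any text-completion call, and `decompose_prompt` / `solve_prompt` are assumed to hold the few-shot exemplars for each stage.

```python
from typing import Callable, List

def least_to_most(problem: str,
                  decompose_prompt: str,
                  solve_prompt: str,
                  llm: Callable[[str], str]) -> str:
    """Two-stage least-to-most prompting sketch.

    `llm` is any text-completion function (e.g. a thin wrapper around an
    API call); the prompt strings carry the few-shot exemplars.
    """
    # Stage 1: decomposition. Ask the model to list the subproblems
    # that lead up to the final question (assumed one per line here).
    decomposition = llm(decompose_prompt + f"\nQ: {problem}\nA:")
    subproblems: List[str] = [s.strip() for s in decomposition.split("\n") if s.strip()]

    # Stage 2: sequential solving. Each answered subproblem is appended
    # to the context so later subproblems can build on earlier answers.
    context = solve_prompt + f"\n{problem}\n"
    answer = ""
    for sub in subproblems:
        answer = llm(context + f"\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"

    return answer  # answer to the last subproblem, i.e. the original question
```

The essential detail is that the context keeps growing, so each later subproblem can reuse the answers produced so far.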
Results
The authors validate the L2M approach experimentally across three domains: symbolic manipulation, compositional generalization, and mathematical reasoning. In each, L2M shows clear advantages over standard CoT prompting.
Symbolic Manipulation
The last-letter-concatenation task serves as the benchmark for symbolic manipulation: given a list of words, output the concatenation of their last letters. CoT prompting performs well only when the test lists are no longer than those in the prompt exemplars and degrades sharply on longer lists, whereas L2M generalizes far better as list length grows.
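To make the decomposition concrete, here is a plain-Python illustration of the subproblem structure for this task (no LLM involved): each subproblem is the next-longer prefix of the word list, so each step only appends one new last letter to the previous answer.

```python
def last_letter_concat_subproblems(words):
    """Build the incrementally longer sublists used as subproblems, e.g. for
    ["think", "machine", "learning"]:
        ["think", "machine"]
        ["think", "machine", "learning"]
    """
    return [words[: i + 1] for i in range(1, len(words))]

def solve_incrementally(words):
    """Mirror the sequential stage: reuse the previous answer and append
    a single new last letter at each step."""
    answer = words[0][-1]
    for sublist in last_letter_concat_subproblems(words):
        answer = answer + sublist[-1][-1]  # previous answer + new last letter
    return answer

print(solve_incrementally(["think", "machine", "learning"]))  # -> "keg"
```

Because every step has the same small amount of new work regardless of list length, the approach does not break down when test lists are longer than the exemplars.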
Compositional Generalization
The SCAN benchmark assesses compositional generalization: natural language commands must be translated into action sequences, and its standard splits demand strong length generalization. While CoT prompts yield limited success, L2M achieves near-perfect accuracy (99.7%) with only 14 demonstration exemplars using the GPT-3 code-davinci-002 model. This performance holds across all splits, including the length split, and substantially outperforms specialized neural-symbolic models that require extensive dataset-specific training.
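The structure L2M exploits here is that a long command reduces to shorter commands that have already been solved. The toy interpreter below illustrates that compositional reduction symbolically; it is not the prompting pipeline, and the action names are simplified rather than the exact SCAN output vocabulary.

```python
# Toy illustration with simplified action names (not the exact SCAN tokens).
PRIMITIVES = {"jump": ["JUMP"], "walk": ["WALK"], "look": ["LOOK"], "run": ["RUN"]}

def translate(command: str) -> list[str]:
    """Translate a command by first solving its shorter parts, mirroring the
    least-to-most idea of building long translations from short ones."""
    if " and " in command:                        # "X and Y" -> do X, then Y
        left, right = command.split(" and ", 1)
        return translate(left) + translate(right)
    if " after " in command:                      # "X after Y" -> do Y, then X
        left, right = command.split(" after ", 1)
        return translate(right) + translate(left)
    if command.endswith(" twice"):                # "X twice" -> X X
        return translate(command[: -len(" twice")]) * 2
    if command.endswith(" thrice"):               # "X thrice" -> X X X
        return translate(command[: -len(" thrice")]) * 3
    return PRIMITIVES[command]

print(translate("jump twice after walk"))  # ['WALK', 'JUMP', 'JUMP']
```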
Mathematical Reasoning
In mathematical reasoning, evaluated on the GSM8K and DROP datasets, L2M improves on CoT prompting, especially for multi-step problems: on problems requiring five or more steps, L2M's accuracy markedly surpasses CoT's. On DROP, a dataset consisting largely of easily decomposable numerical questions, L2M outperforms CoT by a notable margin, reflecting its broader applicability and robustness in numerical reasoning.
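As a concrete picture of what "decomposable" means here, the snippet below builds the growing context for a hypothetical two-step word problem; the problem and subquestions are illustrative, not taken from either benchmark.

```python
# Hypothetical multi-step word problem, used purely for illustration.
problem = ("Amy has 5 apples. Ben has 2 more apples than Amy. "
           "How many apples do they have together?")

# Subquestions in least-to-most order, paired with the arithmetic each needs.
steps = [
    ("How many apples does Ben have?", 5 + 2),                # -> 7
    ("How many apples do they have together?", 5 + (5 + 2)),  # -> 12
]

# Build the growing context: every answered subquestion stays visible to the
# next one, which is what lets later steps reuse earlier results.
context = problem
for question, answer in steps:
    context += f"\nQ: {question}\nA: {answer}"
print(context)
```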
Implications and Future Directions
The implications of this novel prompting method are multifaceted:
- Practical: The L2M technique can substantially boost the performance of state-of-the-art LLMs on tasks requiring extended multi-step reasoning, with no additional training or fine-tuning.
- Theoretical: The research offers insight into improving generalization, in particular easy-to-hard generalization, where test problems are more complex than the demonstration examples.
- Future Work: The research opens avenues for exploring more advanced decomposition strategies and their application across diverse problem domains. It also points to hybrid approaches that combine L2M with other prompting techniques.
Conclusion
Least-to-most prompting represents a significant advance in prompting techniques for LLMs, demonstrating enhanced capability on complex reasoning tasks. By adopting a hierarchical problem-solving approach, it overcomes a key limitation of CoT prompting, showing superior performance across symbolic manipulation, compositional generalization, and mathematical reasoning tasks. This two-stage strategy of problem decomposition and sequential resolution marks a meaningful step toward AI systems capable of sophisticated, human-like reasoning.