The paper "LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks" takes a critical stance on the role of LLMs in planning and reasoning tasks, dispelling polarized views that either overestimate or underestimate their capabilities. The authors argue that auto-regressive LLMs are not capable of independent planning or self-verification, contrary to some claims in the literature. However, they propose that LLMs can serve invaluable roles as approximate knowledge sources within a framework they describe as the LLM-Modulo Framework. This framework synergizes LLMs with external model-based verifiers to facilitate complex reasoning and planning tasks.
Key Arguments and Contributions
- Limitations of LLMs in Planning and Reasoning:
  - The authors characterize LLMs as having a pseudo System 1 cognitive capacity (the fast, intuitive mode of thinking in Kahneman's model). Because autoregressive generation produces each next token in essentially constant time, LLMs inherently lack the deliberate, principled reasoning that planning tasks require.
  - Studies reviewed in the paper suggest that LLMs cannot generate executable plans autonomously; for example, only approximately 12% of the plans GPT-4 produced in a planning task setting were executable without errors.
- Self-Critiquing and Verification:
  - The paper challenges the belief that LLMs can iteratively self-critique their way to more accurate plans, arguing that the approach is ineffective because LLMs are not inherently capable of verifying solutions.
  - In studies on NP-complete problems such as Graph Coloring, LLMs were unable to effectively critique or improve their own candidate solutions.
- Misunderstood Capabilities:
  - The paper examines why LLMs have been misconstrued as planners: many claims conflate the extraction of abstract, high-level planning knowledge from an LLM with the formulation of precise, executable plans.
- The LLM-Modulo Framework:
  - The authors introduce the LLM-Modulo Framework as a paradigm for integrating LLMs into planning tasks in conjunction with external verifiers and critics. Within this framework, LLMs generate candidate plans and supply approximate domain models, while correctness is guaranteed by the external critics and verifiers (a minimal illustrative sketch of this generate-test-critique loop follows this list).
  - The framework is inspired by SAT Modulo Theories solvers and maintains soundness through bi-directional interaction between LLMs and verifiers: candidate plans flow from the LLM to the critics, and their critiques flow back as prompts for the next generation round, leading to more flexible and expressive planning and reasoning capabilities.
- Roles of LLMs in the Framework:
  - LLMs can serve various roles, including generating initial plan candidates, translating plans into the formats individual critics require, and refining incomplete problem specifications in collaboration with human users and domain experts.
  - The framework exploits LLMs' strengths in syntax transformation and hypothesis generation without erroneously attributing verification or sophisticated deductive reasoning capacities to them.
- Applications and Broader Implications:
  - The paper contextualizes the proposed framework within real-world scenarios such as NASA mission planning, emphasizing its potential to handle complex, real-world planning tasks more effectively than traditional symbolic planners.
  - The framework also aligns with broader neuro-symbolic approaches where symbolic reasoning is paired with neural processing power, offering improvements in fields like Reinforcement Learning.
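
To make the generate-test-critique loop concrete, below is a minimal Python sketch of how an LLM-Modulo-style loop might be wired together. It is an illustrative reconstruction, not the authors' implementation: the `propose_coloring` generator, the `GraphColoringVerifier` critic, and the back-prompting format are hypothetical stand-ins for whatever LLM calls and model-based verifiers a real deployment would use.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-in for an LLM call: proposes a candidate coloring for the
# graph, optionally conditioned on critiques of earlier attempts. In a real
# LLM-Modulo loop this would be a prompt to GPT-4 or a similar model.
def propose_coloring(graph, num_colors, critiques):
    return {node: random.randrange(num_colors) for node in graph}

@dataclass
class Critique:
    ok: bool
    feedback: str

# External, model-based critic: soundness comes from this verifier,
# not from the LLM's own judgment of its output.
class GraphColoringVerifier:
    def __init__(self, graph, num_colors):
        self.graph = graph          # adjacency list: node -> set of neighbors
        self.num_colors = num_colors

    def check(self, coloring):
        for node, neighbors in self.graph.items():
            if coloring.get(node) is None:
                return Critique(False, f"node {node} is uncolored")
            if not (0 <= coloring[node] < self.num_colors):
                return Critique(False, f"node {node} uses an invalid color")
            for nb in neighbors:
                if coloring[node] == coloring.get(nb):
                    return Critique(False, f"nodes {node} and {nb} share color {coloring[node]}")
        return Critique(True, "coloring is valid")

# LLM-Modulo-style generate-test-critique loop: the generator proposes, the
# external verifier tests, and the verifier's feedback is pumped back into the
# next round instead of asking the LLM to grade itself.
def llm_modulo_coloring(graph, num_colors, max_rounds=50):
    verifier = GraphColoringVerifier(graph, num_colors)
    critiques = []
    for _ in range(max_rounds):
        candidate = propose_coloring(graph, num_colors, critiques)
        result = verifier.check(candidate)
        if result.ok:
            return candidate                 # sound by construction: the verifier approved it
        critiques.append(result.feedback)    # back-prompt the generator with the critique
    return None                              # no verified solution within the budget

if __name__ == "__main__":
    triangle_plus_one = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(llm_modulo_coloring(triangle_plus_one, num_colors=3))
```

The design point this sketch tries to capture is the paper's division of labor: the LLM only ever plays the generator (and, in richer settings, format-translator) role, while any claim of correctness rests on the external verifier whose critiques drive the next iteration.
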
Conclusion
The authors call for a balanced perspective on the capabilities of LLMs, urging a nuanced understanding that neither overestimates their standalone reasoning abilities nor relegates them to mere translators. By outlining the LLM-Modulo Framework, they propose a robust hybrid system that combines the strengths of LLMs with external model-based critics to improve performance on planning and reasoning tasks and to open new directions for AI planning systems.