LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks (2402.01817v3)

Published 2 Feb 2024 in cs.AI and cs.LG

Abstract: There is considerable confusion about the role of LLMs in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

The paper "LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks" takes a critical stance on the role of LLMs in planning and reasoning tasks, dispelling polarized views that either overestimate or underestimate their capabilities. The authors argue that auto-regressive LLMs are not capable of independent planning or self-verification, contrary to some claims in the literature. However, they propose that LLMs can serve invaluable roles as approximate knowledge sources within a framework they describe as the LLM-Modulo Framework. This framework synergizes LLMs with external model-based verifiers to facilitate complex reasoning and planning tasks.

Key Arguments and Contributions

  1. Limitations of LLMs in Planning and Reasoning:
    • The authors characterize LLMs as a kind of pseudo System 1 cognitive capacity (Kahneman's fast, intuitive mode of thinking). Because an auto-regressive LLM spends essentially constant computation per generated token regardless of problem difficulty, it inherently lacks the principled, combinatorial reasoning that planning tasks require.
    • Studies reviewed in the paper suggest that LLMs cannot autonomously generate executable plans. For example, only about 12% of the plans GPT-4 produced in the planning tasks studied were executable without errors.
  2. Self-Critiquing and Verification:
    • The belief that LLMs can iteratively self-critique their way to accurate plans is challenged: verification is itself a form of reasoning, and LLMs are not inherently equipped to perform it reliably.
    • In studies on NP-complete problems such as graph coloring, LLMs proved unable to effectively critique or improve their own generated solutions (for contrast, a sound external verifier for graph coloring is sketched in code after this list).
  3. Misunderstood Capabilities:
    • The paper examines why LLMs have been misconstrued as planners, arguing that many such claims arise from conflating the extraction of abstract, high-level planning knowledge from LLMs with the formulation of precise, executable plans.
  4. The LLM-Modulo Framework:
    • The authors introduce the LLM-Modulo Framework as a paradigm for integrating LLMs into planning tasks in conjunction with external verifiers and critics. Within this framework, LLMs generate candidate plans and provide approximate domain models, while correctness is guaranteed by the external critics and verifiers (a minimal sketch of this generate-test loop appears after this list).
    • The framework's name echoes SAT Modulo Theories (Satisfiability Modulo Theories, SMT), and it maintains soundness through a bi-directional interaction between LLMs and verifiers: critics reject flawed candidates and their feedback is used to re-prompt the LLM, leading to more flexible and expressive planning and reasoning capabilities.
  5. Roles of LLMs in the Framework:
    • LLMs can serve various roles, including generating initial plan candidates, translating plans into formats necessary for critique, and refining incomplete problem specifications in collaboration with human users and domain experts.
    • The framework exploits LLMs' strengths in syntax transformation and hypothesis generation without erroneously attributing to them verification or sophisticated deductive-reasoning capacities.
  6. Applications and Broader Implications:
    • The paper contextualizes the proposed framework within real-world scenarios such as NASA mission planning, emphasizing its potential to handle complex, real-world planning tasks more effectively than traditional symbolic planners.
    • The framework also aligns with broader neuro-symbolic approaches where symbolic reasoning is paired with neural processing power, offering improvements in fields like Reinforcement Learning.
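
To make the contrast in point 2 concrete, here is a minimal sketch of a sound, model-based verifier for graph coloring, the kind of external critic the paper advocates over LLM self-critique. The encoding (an edge list plus a vertex-to-color dict) and the name `verify_coloring` are illustrative assumptions, not taken from the paper.

```python
def verify_coloring(edges, coloring):
    """Return (ok, reasons): ok is True iff every endpoint is colored
    and no edge joins two same-colored vertices. Unlike an LLM
    'self-check', this test never certifies an invalid coloring."""
    # Illustrative encoding: edge list + vertex->color dict (not the paper's).
    reasons = []
    # Report vertices that the candidate leaves uncolored.
    uncolored = {v for edge in edges for v in edge if v not in coloring}
    reasons += [f"vertex {v} is uncolored" for v in sorted(uncolored)]
    # Report every monochromatic edge, i.e. every constraint violation.
    for u, v in edges:
        if u in coloring and v in coloring and coloring[u] == coloring[v]:
            reasons.append(f"edge ({u}, {v}): both endpoints are {coloring[u]}")
    return (not reasons, reasons)

# A triangle cannot be 2-colored, so this candidate must be rejected.
triangle = [(0, 1), (1, 2), (0, 2)]
ok, reasons = verify_coloring(triangle, {0: "red", 1: "blue", 2: "red"})
print(ok, reasons)  # False ['edge (0, 2): both endpoints are red']
```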
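
Points 4 and 5 then amount to a generate-test loop around such critics. The sketch below is one reading of that loop under stated assumptions: `llm_propose` is a hypothetical stand-in for any call that asks an LLM for a candidate given the problem and the critics' accumulated feedback, and `critics` is a list of sound verifiers; the snippet reuses `verify_coloring` and `triangle` from the sketch above. The paper does not prescribe this API; the essential point is that the LLM only guesses while the external critics alone certify correctness.

```python
import random

def llm_modulo_solve(problem, llm_propose, critics, max_rounds=10):
    feedback = []                                   # critic output used to re-prompt
    for _ in range(max_rounds):
        candidate = llm_propose(problem, feedback)  # LLM as candidate generator
        feedback = []
        for critic in critics:                      # external, sound verification
            ok, reasons = critic(candidate)
            if not ok:
                feedback.extend(reasons)
        if not feedback:                            # every critic signed off
            return candidate                        # correctness certified externally
    return None                                     # budget exhausted, no certified plan

# Toy stand-in for the LLM: propose random 3-colorings of the triangle.
# (A real proposer would serialize `feedback` into a back-prompt.)
def toy_propose(problem, feedback):
    return {v: random.choice(["red", "green", "blue"]) for v in problem["vertices"]}

print(llm_modulo_solve({"vertices": [0, 1, 2]},
                       toy_propose,
                       [lambda c: verify_coloring(triangle, c)],
                       max_rounds=50))
```

Note that soundness here comes entirely from the critics: the loop never returns a candidate they have not accepted, which is exactly what distinguishes this regime from LLM self-verification.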

Conclusion

The authors call for a balanced perspective on the capabilities of LLMs, urging a nuanced understanding that neither overestimates their standalone reasoning abilities nor relegates them to mere format translators. By outlining the LLM-Modulo Framework, they propose a hybrid system that combines the strengths of LLMs with external model-based critics to improve performance on planning and reasoning tasks, opening new directions for AI planning systems.

Authors (8)
  1. Subbarao Kambhampati (126 papers)
  2. Karthik Valmeekam (17 papers)
  3. Lin Guan (25 papers)
  4. Kaya Stechly (9 papers)
  5. Mudit Verma (25 papers)
  6. Siddhant Bhambri (16 papers)
  7. Lucas Saldyt (4 papers)
  8. Anil Murthy (2 papers)
Citations (83)