Emergent Mind

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

(2402.01817)
Published Feb 2, 2024 in cs.AI and cs.LG

Abstract

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all that LLMs are good for in planning/reasoning tasks are as mere translators of the problem specification from one syntactic format to another, and ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem and preference specifications.

Overview

  • The paper introduces the LLM-Modulo Framework, highlighting LLMs’ inability to generate or verify executable plans while advocating their helpful role in planning when combined with model-based verifiers.

  • It dissects misconceptions around LLMs’ planning and self-verification abilities, emphasizing the distinction between extracting general planning knowledge and generating executable plans.

  • The LLM-Modulo Framework incorporates hard and soft critics to evaluate LLM-generated plan candidates, utilizing LLMs for candidate plan generation, plan reformulation, specification refinement, and model acquisition.

  • The paper proposes integrating LLMs into planning tasks as a pragmatic way to overcome limitations of traditional planning systems, advocating a collaborative, neuro-symbolic architecture.

Introduction to LLM-Modulo Framework

Recent advancements in LLMs have spurred debates regarding their capacities and roles in planning and reasoning tasks. Between the over-optimistic and the over-pessimistic claims, the paper proposes a middle ground: the LLM-Modulo Framework. The framework acknowledges that LLMs cannot autonomously generate executable plans or verify plans, but emphasizes their potential as valuable contributors when combined with external model-based verifiers. It positions LLMs as powerful cognitive orthotics and universal approximate knowledge sources that, when correctly harnessed, can significantly aid the planning process.

Core Findings

LLMs' Limitations in Planning and Self-Verification

  • Studies reveal that LLMs struggle to generate executable plans on their own: when run in autonomous mode, only a small percentage of LLM-generated plans actually reach their goals.

  • LLMs also demonstrate a notable deficiency in plan verification, challenging the notion that they can iteratively refine solutions through self-critique.

Providing Clarity on Contradictory Claims

  • Claims of Planning Capabilities: Misinterpretations arise when the extraction of general planning knowledge is mistaken for the generation of executable plans. Moreover, LLMs' impressive performance in certain domains often stems from the absence of complex subgoal interactions or depends on human intervention.

  • Self-Verification Abilities: The optimism around LLMs self-verifying their output lacks empirical support; the efficacy of iterative prompting and LLM-based critiquing in refining solutions remains unsubstantiated.

The LLM-Modulo Framework's Structure and Function

Critics in the Framework

  • The framework pairs hard critics (model-based verifiers that check the correctness of a candidate plan) with soft critics (which assess style and explicability), so that each candidate plan generated by the LLM is evaluated on both counts.
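
The split between hard and soft critics can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's implementation: the `Critique` record, the toy `precondition_critic` and `brevity_critic`, and the acceptance policy in `evaluate`, which assumes that hard critics gate acceptance while soft critiques are advisory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Plan = List[str]  # each step is "<action> <block>", e.g. "pickup A" (toy encoding)


@dataclass
class Critique:
    critic: str   # name of the critic that produced this feedback
    hard: bool    # True for correctness critics, False for style critics
    ok: bool      # whether the plan passed this critic
    message: str  # feedback that could be passed back to the LLM


def precondition_critic(plan: Plan) -> Critique:
    """Hypothetical hard critic: a block must be picked up before it is stacked."""
    holding = None
    for step in plan:
        action, block = step.split()
        if action == "pickup":
            holding = block
        elif action == "stack":
            if holding != block:
                return Critique("preconditions", True, False,
                                f"'{step}' fails: not holding {block}")
            holding = None
    return Critique("preconditions", True, True, "all preconditions hold")


def brevity_critic(plan: Plan, max_steps: int = 4) -> Critique:
    """Hypothetical soft critic: prefers short plans but never rejects one."""
    ok = len(plan) <= max_steps
    note = "plan is concise" if ok else f"plan has more than {max_steps} steps"
    return Critique("brevity", False, ok, note)


def evaluate(plan: Plan,
             critics: List[Callable[[Plan], Critique]]) -> Tuple[bool, List[Critique]]:
    """Run every critic; accept only if all hard critics pass (assumed policy)."""
    critiques = [critic(plan) for critic in critics]
    accepted = all(c.ok for c in critiques if c.hard)
    return accepted, critiques
```

Calling `evaluate(["stack A"], [precondition_critic, brevity_critic])` rejects the plan because the hard critic fails, whereas an over-long but valid plan is accepted with a soft critique attached.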

Multi-faceted Roles of LLMs

  • Candidate Plan Generation: Generating plausible plan candidates based on problem specifications and iterative critiques.

  • Reformulation: Utilizing their syntax conversion strengths to adapt plan candidates into forms digestible by various critics.

  • Specification Refinement and Model Acquisition: Assisting in refining problem specifications and acquiring domain models through interaction with human experts.
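
The interaction between candidate generation and critiquing can be sketched as a generate-and-test loop with back-prompting. This is a minimal sketch under stated assumptions, not the authors' code: `llm_modulo_loop`, `stub_llm`, and `verify` are hypothetical names, the LLM is replaced by a canned stub, and the verifier checks a single toy rule.

```python
from typing import Callable, List, Optional

Plan = List[str]  # toy encoding: "pickup A" or "stack A B"


def verify(plan: Plan) -> List[str]:
    """Toy model-based verifier: a block must be held before it is stacked."""
    errors = []
    holding = None
    for step in plan:
        parts = step.split()
        if parts[0] == "pickup":
            holding = parts[1]
        elif parts[0] == "stack":
            if holding != parts[1]:
                errors.append(f"'{step}' is not executable: {parts[1]} is not held")
            holding = None
    return errors


def stub_llm(feedback: List[str]) -> Plan:
    """Stand-in for an LLM call: returns a better candidate once it has feedback."""
    candidates = [["stack A B"], ["pickup A", "stack A B"]]
    return candidates[min(len(feedback), len(candidates) - 1)]


def llm_modulo_loop(generate: Callable[[List[str]], Plan],
                    check: Callable[[Plan], List[str]],
                    max_rounds: int = 5) -> Optional[Plan]:
    """Ask for candidates until the verifier finds no errors or rounds run out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(feedback)
        errors = check(candidate)
        if not errors:
            return candidate      # only verified plans leave the loop
        feedback.extend(errors)   # critiques become part of the next prompt
    return None                   # no verified plan within the budget
```

In this sketch the loop returns a plan only after the external verifier signs off, so the correctness of the output rests on the verifier rather than on the generator.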

Synthetic Data for Fine-tuning

  • The correct plans vetted by the external critics can serve as a reliable corpus for LLM fine-tuning, enhancing the model's future performance on similar tasks.
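
One way to picture this is a filter that keeps only verified problem/plan pairs and serializes them as training records. The sketch below is an assumption-laden illustration: `build_corpus`, the `is_valid` callback, and the `prompt`/`completion` field names are hypothetical and not taken from the paper or from any particular fine-tuning API.

```python
import json
from typing import Callable, Iterable, List, Tuple

Plan = List[str]


def build_corpus(samples: Iterable[Tuple[str, Plan]],
                 is_valid: Callable[[str, Plan], bool]) -> List[str]:
    """Keep only plans accepted by the external critic, one JSON record per line."""
    records = []
    for problem, plan in samples:
        if is_valid(problem, plan):
            record = {"prompt": problem, "completion": "\n".join(plan)}
            records.append(json.dumps(record))
    return records


def toy_validator(problem: str, plan: Plan) -> bool:
    """Placeholder critic: accepts plans that start with a pickup action."""
    return bool(plan) and plan[0].startswith("pickup")
```

A real pipeline would substitute the framework's hard critics for `toy_validator`, so that only plans with a correctness check behind them reach the fine-tuning set.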

Implications and Future Directions

The LLM-Modulo Framework proposes a pragmatic approach towards leveraging LLMs in planning tasks, shifting the narrative from their solitary functionality to a collaborative, integrative use with symbolic components. It opens avenues for:

  • Expanding the scope of model-based planning: Integrating LLMs in planning routines can address the expressiveness and search-complexity limitations traditional planners face, offering a path towards more flexible and broadly applicable planning solutions.

  • Enhanced domain model acquisition: Facilitating easier extraction and refinement of planning models from unstructured knowledge sources.

Conclusion

The exploration into the LLM-Modulo Framework underscores a paradigm shift in utilizing LLMs for planning tasks. Instead of viewing these models as stand-alone planners or verifiers, the framework advocates for their integration within a neuro-symbolic architecture that pairs their generative capabilities with the precision of external verifiers. This balanced approach not only enhances the functionality and applicability of LLMs in complex planning scenarios but also sets a precedent for future research in the field of artificial intelligence and automated planning systems.
