Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning (2305.14909v2)

Published 24 May 2023 in cs.AI

Abstract: There is a growing interest in applying pre-trained LLMs to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back to the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work. On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. Resources, including the source code, are released at: https://guansuns.github.io/pages/LLM-dm.

Leveraging Pre-trained LLMs for Model-based Task Planning with PDDL

The integration of LLMs into AI systems opens new frontiers in task planning by leveraging their extensive pre-trained knowledge. In their paper, Lin Guan, Karthik Valmeekam, Sarath Sreedharan, and Subbarao Kambhampati introduce a framework in which LLMs such as GPT-4 generate explicit world models codified in the Planning Domain Definition Language (PDDL), enabling task planning with sound, domain-independent classical planners. This approach capitalizes on the strength of LLMs at extracting symbolic representations while sidestepping the limited correctness and executability of plans generated directly by LLMs.

Framework Overview

The authors propose a two-step framework: first, LLMs construct PDDL models from natural-language task and action descriptions; second, these models are handed to classical planners to devise feasible plans. This contrasts with paradigms that rely on LLMs as planners directly, which often yield suboptimal or incorrect plans that overlook physical constraints or long-horizon dependencies.
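
To make the division of labor concrete, the sketch below wires the two steps together in Python. The `query_llm` helper is hypothetical, and Fast Downward stands in for any sound, domain-independent classical planner; both are illustrative assumptions rather than the paper's actual tooling.

```python
import subprocess
from pathlib import Path


def query_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM (e.g., GPT-4) and
    return the completion text. Any chat-completion API can back this."""
    raise NotImplementedError


def construct_domain_model(action_descriptions: str) -> str:
    # Step 1: ask the LLM to translate natural-language action
    # descriptions into a PDDL domain (predicates + action schemas).
    prompt = (
        "Translate the following action descriptions into a complete "
        "PDDL domain definition:\n" + action_descriptions
    )
    return query_llm(prompt)


def solve(domain_pddl: str, problem_pddl: str) -> str:
    # Step 2: hand the (corrected) PDDL model to a classical planner.
    # Fast Downward is invoked here only as one example of such a planner.
    Path("domain.pddl").write_text(domain_pddl)
    Path("problem.pddl").write_text(problem_pddl)
    subprocess.run(
        ["fast-downward.py", "domain.pddl", "problem.pddl",
         "--search", "astar(lmcut())"],
        check=True,
    )
    return Path("sas_plan").read_text()  # Fast Downward's default plan file
```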

Central to the methodology is the extraction of symbolic action models in PDDL. LLMs generate PDDL action schemas from descriptive inputs, and these schemas are iteratively refined through natural-language feedback from both PDDL validators and human users. Concentrating human effort on validating the model up front, rather than inspecting and correcting every generated plan, minimizes human involvement in re-planning and makes correcting the domain model far more efficient.
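
A minimal sketch of that refinement loop follows, reusing the hypothetical `query_llm` stub above; `collect_feedback` is likewise an assumed placeholder for whichever validator or human reviewer supplies corrections.

```python
from typing import Optional


def collect_feedback(domain_pddl: str) -> Optional[str]:
    """Hypothetical feedback source: return corrective feedback as plain
    text (from a PDDL validator or a human reviewer), or None if the
    model looks correct."""
    raise NotImplementedError


def refine_domain_model(domain_pddl: str, max_rounds: int = 5) -> str:
    # Iteratively repair the LLM-generated model. The LLM acts as the
    # interface: it receives natural-language feedback and emits a
    # revised PDDL model, so reviewers never have to edit PDDL by hand.
    for _ in range(max_rounds):
        feedback = collect_feedback(domain_pddl)
        if feedback is None:  # no corrections left; model accepted
            return domain_pddl
        prompt = (
            "Here is a PDDL domain model:\n" + domain_pddl
            + "\nA reviewer gave this corrective feedback:\n" + feedback
            + "\nRevise the PDDL model to address the feedback."
        )
        domain_pddl = query_llm(prompt)
    return domain_pddl
```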

Empirical Evaluation

The efficacy of the framework was validated empirically across diverse domains, including two IPC benchmark domains and a complex household domain featuring constraints akin to real-world robotic applications. Notably, GPT-4 produced high-quality PDDL models for over 40 actions, far surpassing the noisy, error-prone outputs of GPT-3.5.

Quantitative results underscore the robustness of the approach: in the household domain, GPT-4 produced models with significantly fewer errors than GPT-3.5, and the corrected models were then used to successfully solve 48 challenging planning tasks with minimal manual correction. Most remaining errors were resolved through human-readable corrective feedback once syntax errors had been caught automatically with tools such as the VAL system.
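
As an illustration of the automated half of that loop, a syntax check against a VAL-style parser might be wired up roughly as follows; the `Parser` binary name and the shape of its output are assumptions about a local VAL build, not the paper's exact setup.

```python
import subprocess
from typing import Optional


def syntax_errors(domain_path: str, problem_path: str) -> Optional[str]:
    # Run the VAL toolkit's PDDL parser over the generated files and
    # return its diagnostics, or None when parsing succeeds. Adapt the
    # binary name and error detection to whichever validator is installed.
    result = subprocess.run(
        ["Parser", domain_path, problem_path],
        capture_output=True, text=True,
    )
    if result.returncode != 0 or "Error" in result.stdout:
        return result.stdout + result.stderr
    return None
```

Whatever diagnostics come back can be handed to the LLM (or shown to a user) as the feedback string in the refinement loop sketched earlier.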

Implications and Future Directions

Practically, embedding LLM-derived world models into planning systems exploits the knowledge acquired during pre-training to automate the modeling step, significantly reducing the dependency on domain experts throughout the planning lifecycle. This points toward hybrid systems in which symbolic models extracted by LLMs interface seamlessly with domain-independent planners, promising improved reliability and efficiency.

Theoretically, the paper suggests that while LLMs are proficient at generating domain models, the combinatorial complexity of planning still demands classical planners for solution generation. LLMs are therefore best positioned as facilitators of domain model extraction and validation rather than as standalone planners.

Future work might scale this framework to more complex and partially observable environments, extending it to handle the incomplete information characteristic of real-world scenarios. Enhancing LLMs to autonomously resolve logical inconsistencies in generated models and to incorporate feedback more efficiently remains a pivotal research challenge.

Overall, the research demonstrates a promising direction for augmenting artificial intelligence with LLMs, providing new insights and tools for AI planning while charting a course for more transparent, reliable, and efficient autonomous systems.

Authors (4)
  1. Lin Guan (25 papers)
  2. Karthik Valmeekam (17 papers)
  3. Sarath Sreedharan (41 papers)
  4. Subbarao Kambhampati (126 papers)
Citations (110)