Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents (2302.01560v3)

Published 3 Feb 2023 in cs.AI

Abstract: We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on LLMs. DEPS facilitates better error correction of the initial LLM-generated plan by integrating a description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, a trainable module that ranks parallel candidate sub-goals based on the estimated steps to completion, consequently refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performance. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.

Interactive Planning with LLMs for Open-World Multi-Task Agents

The paper "Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents" investigates the deployment of LLMs in addressing task planning challenges for multi-task agents operating in open-world environments. The authors introduce the ``\underline{D}escribe, \underline{E}xplain, \underline{P}lan and \underline{S}elect" (DEPS) framework, which harnesses LLMs for interactive planning to solve complex, long-horizon tasks in environments like Minecraft. The research identifies two primary challenges in such settings: the inherent complexity of task execution in open worlds necessitating precise, iterative planning; and the inefficiency stemming from fixed sequencing of sub-goals without considering the agent's current proximities or capabilities.

Methodology

DEPS extends typical LLM-based planning with mechanisms for iterative feedback and dynamic goal selection; a minimal sketch of the resulting loop follows the list below. The framework includes:

  1. Descriptor: Captures the current state and execution failures to inform subsequent planning.
  2. Explainer: Aids the LLM in understanding the failure points within a plan, providing reasoning and insights for plan adjustments.
  3. Planner: Reassesses the plan based on feedback, allowing it to adapt to changing conditions in real-time.
  4. Goal Selector: A trainable module that ranks parallel candidate sub-goals by the predicted number of steps (horizon) needed to complete each from the current state, choosing the most feasible one and thereby making the task sequence more efficient.
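
To make the loop concrete, below is a minimal Python sketch of how these four components might be wired together. It is an illustration under assumed interfaces rather than the paper's implementation: the llm, execute, observe, and horizon callables, the one-sub-goal-per-line plan format, and the '|' convention for marking parallel candidates are all assumptions introduced here; the authors' actual code is in the linked repository.

```python
from typing import Callable, List, Tuple


def deps_episode(
    task: str,
    llm: Callable[[str], str],                   # text-in, text-out LLM call (hypothetical interface)
    execute: Callable[[str], Tuple[bool, str]],  # runs one sub-goal, returns (success, feedback)
    observe: Callable[[], str],                  # text summary of the agent's current state
    horizon: Callable[[str, str], float],        # selector: predicted steps to finish a sub-goal from a state
    max_rounds: int = 5,
) -> bool:
    """One episode of a DEPS-style describe/explain/plan/select loop (sketch only)."""

    def parse(plan_text: str) -> List[str]:
        # Assumed plan format: one step per line; '|' separates interchangeable
        # (parallel) candidate sub-goals within a step.
        return [line.strip("-* ").strip() for line in plan_text.splitlines() if line.strip()]

    plan = parse(llm(
        f"Task: {task}\nList the sub-goals needed to complete it, one per line. "
        "Put interchangeable alternatives on the same line, separated by '|'."
    ))

    for _ in range(max_rounds):
        failure = None
        for step in plan:
            state = observe()
            # Select: among parallel candidates, attempt the sub-goal the selector
            # predicts is cheapest to complete from the current state.
            candidates = [c.strip() for c in step.split("|")]
            goal = min(candidates, key=lambda g: horizon(g, state))
            ok, feedback = execute(goal)
            if not ok:
                failure = (goal, feedback)
                break
        if failure is None:
            return True  # every plan step succeeded

        failed_goal, feedback = failure
        # Describe: summarize the failed sub-goal, controller feedback, and current state.
        description = (
            f"Sub-goal '{failed_goal}' failed. Controller feedback: {feedback}. "
            f"Current state: {observe()}"
        )
        # Explain: ask the LLM to diagnose why the plan broke down at this point.
        explanation = llm(f"{description}\nExplain the most likely cause of this failure.")
        # Plan: request a corrected plan conditioned on the explanation.
        plan = parse(llm(
            f"Task: {task}\nPrevious plan:\n" + "\n".join(plan) +
            f"\nFailure explanation: {explanation}\n"
            "Write a corrected plan, one sub-goal per line."
        ))
    return False
```

In the paper, the selector is a learned module trained to predict how many steps a sub-goal will take from the current state; in this sketch any callable with the same signature (for example, a simple heuristic) can stand in for it.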

Experimental Results

The DEPS framework is evaluated across a wide range of tasks in Minecraft, achieving significant improvements over existing LLM-based planners. Specifically, it demonstrates robust capability on more than 70 tasks, with success rates nearly doubling those of baseline methods. Goal selection is particularly effective at improving task efficiency, a finding validated through ablation studies that isolate the impact of each module in the DEPS pipeline. The paper also reports promising results on the challenging ObtainDiamond grand challenge, a notable milestone for planning-based agents in open-world domains.

Implications and Future Work

The implications of this research extend to both practical and theoretical realms. Practically, DEPS provides a template for using LLMs not just as static planners but as dynamic, interactive systems capable of adjusting strategies based on real-time feedback; this could drastically enhance the adaptability and robustness of AI agents in complex environments. Theoretically, the work challenges prevailing assumptions about the fixed nature of task plans in AI, suggesting new avenues for research that emphasize adaptability and state-awareness in autonomous agents.

Looking forward, future developments might involve integrating DEPS with more sophisticated controllers or learning-based policies that could autonomously improve not only planning but execution as well. Moreover, the exploration of DEPS-like frameworks in other modalities and environments could further underline the versatility and effectiveness of LLM-based interactive planning. Such strides could pave the way towards generalized AI capable of sophisticated multi-task reasoning in varied and unstructured domains.

Authors (6)
  1. Zihao Wang
  2. Shaofei Cai
  3. Guanzhou Chen
  4. Anji Liu
  5. Xiaojian Ma
  6. Yitao Liang
Citations (266)