
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models (2212.04088v3)

Published 8 Dec 2022 in cs.AI, cs.CL, cs.CV, cs.LG, and cs.RO

Abstract: This study focuses on using LLMs as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of LLMs to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: Despite using less than 0.5% of paired training data, LLM-Planner achieves competitive performance with recent baselines that are trained using the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with LLMs

The paper "LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with LLMs" introduces a novel approach to enhance the capabilities of embodied agents by leveraging LLMs such as GPT-3. The core motivation of this research is to address the poor data efficiency of existing methods and to improve adaptability in complex, partially observable environments. The authors propose LLM-Planner, which performs few-shot grounded planning for embodied agents, thereby reducing the dependence on the large paired datasets typically required to train such agents.

Methodology

The paper presents the LLM-Planner as a generative model distinct from traditional methods that often require exhaustive lists of admissible actions and rely on ranking mechanisms. By directly generating high-level plans, LLM-Planner circumvents the exhaustive enumeration of skills, enhancing practicality in environments with large and diverse object types. The approach consists of several key components:

  1. Few-Shot High-Level Planning: LLM-Planner employs in-context learning to use LLMs for generating high-level plans with minimal data. It leverages dynamic in-context example retrieval via a k-nearest-neighbor (kNN) mechanism to enhance the relevance of the chosen examples based on their similarity to the task at hand.
  2. Grounding and Dynamic Re-Planning: The model introduces a novel grounded re-planning mechanism that allows the agent to dynamically alter its plan based on new environmental perceptions. When the agent encounters obstacles or inefficiencies, it can revise its plan by incorporating the objects it has observed, thereby grounding its actions in its current context.
  3. Integration with Existing Models: The LLM-Planner is designed to be integrated with existing hierarchical models like HLSM, utilizing pre-trained low-level planners and perception modules to convert high-level instructions into executable actions.
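The kNN example retrieval and prompt assembly in step 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper retrieves in-context examples with a frozen BERT-based similarity, whereas a bag-of-words cosine stands in here, and `build_prompt` with its field names is a hypothetical prompt format.

```python
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (a stand-in for the
    learned sentence-embedding similarity used in the paper)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_examples(instruction, example_pool, k=2):
    """kNN retrieval: pick the k (instruction, plan) pairs from the
    training pool most similar to the current instruction."""
    return sorted(example_pool,
                  key=lambda ex: similarity(instruction, ex[0]),
                  reverse=True)[:k]

def build_prompt(instruction, examples, completed=(), observed=()):
    """Assemble the planning prompt; 'observed' is what grounds
    re-planning in objects the agent has actually seen."""
    parts = []
    for ex_instr, ex_plan in examples:
        parts.append(f"Task: {ex_instr}\nPlan: {ex_plan}")
    parts.append(f"Task: {instruction}")
    if completed:
        parts.append("Completed: " + ", ".join(completed))
    if observed:
        parts.append("Visible objects: " + ", ".join(observed))
    parts.append("Plan:")
    return "\n".join(parts)
```

The prompt string would then be sent to the LLM, whose completion is parsed into a sequence of high-level subgoals.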
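The grounded re-planning in step 2 and the integration in step 3 can be sketched together as a simple control loop. All function names here (`plan_fn`, `step_fn`, `perceive_fn`) are hypothetical placeholders for the LLM planner, the pre-trained low-level controller, and the perception module, respectively.

```python
def run_episode(instruction, plan_fn, step_fn, perceive_fn, max_replans=3):
    """Grounded re-planning loop: when a subgoal fails, regenerate the
    remaining plan conditioned on the objects observed so far.

    plan_fn(instruction, completed, observed) -> list of subgoals
    step_fn(subgoal) -> True if the low-level controller achieved it
    perceive_fn() -> set of object names currently visible
    """
    completed, observed = [], set()
    plan = plan_fn(instruction, completed, observed)
    replans = 0
    while plan:
        subgoal = plan.pop(0)
        observed |= perceive_fn()           # ground the planner in perception
        if step_fn(subgoal):
            completed.append(subgoal)
        elif replans < max_replans:
            replans += 1                    # re-plan with updated grounding
            plan = plan_fn(instruction, completed, observed)
        else:
            return False, completed
    return True, completed
```

The key design choice is that re-planning is triggered by execution feedback rather than on a fixed schedule, so LLM calls are spent only when the current plan stops making progress.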

Experimental Evaluation

The efficacy of LLM-Planner is validated on the ALFRED benchmark, which features diverse and complex tasks within a simulated household environment. The results demonstrate that LLM-Planner achieves competitive performance compared to the full data-trained baselines, even though it uses less than 0.5% of the training data. Notably, the grounded re-planning mechanism significantly improves task success rates, illustrating the impact of dynamic adaptation based on environmental feedback.

Implications and Future Directions

The implications of this paper are twofold. Practically, LLM-Planner offers a more data-efficient solution for training embodied agents, reducing the dependency on large datasets while maintaining comparable performance. Theoretically, it opens up new avenues for exploring how LLMs can be leveraged in embodied tasks, particularly in enhancing adaptability and grounding in dynamic environments.

Future research could delve into optimizing prompt designs specifically for planning tasks, exploring other types of LLMs such as those fine-tuned on code (e.g., Codex), and potentially incorporating more sophisticated grounding methods that account for deeper nuances in environment-object interactions. Furthermore, refining the object detection components and low-level controllers could alleviate some of the observed performance bottlenecks, resulting in more robust and reliable agents.

In summary, the "LLM-Planner" paper showcases a promising method for improving the efficiency and adaptability of embodied AI through the use of LLMs for few-shot grounded planning. This work points towards a future where embodied agents can operate more effectively in real-world scenarios, learning from minimal examples and dynamically adjusting to their ever-changing environments.

Authors (6)
  1. Chan Hee Song (10 papers)
  2. Jiaman Wu (11 papers)
  3. Clayton Washington (1 paper)
  4. Brian M. Sadler (63 papers)
  5. Wei-Lun Chao (92 papers)
  6. Yu Su (138 papers)
Citations (292)