
Interactive Task Planning with Language Models (2310.10645v1)

Published 16 Oct 2023 in cs.RO, cs.AI, cs.CL, and cs.HC

Abstract: An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution. However, most traditional methods require predefined module design, which makes it hard to generalize to different goals. Recent LLM-based approaches can allow for more open-ended planning but often require heavy prompt engineering or domain-specific pretrained models. To tackle this, we propose a simple framework that achieves interactive task planning with LLMs. Our system incorporates both high-level planning and low-level function execution via language. We verify the robustness of our system in generating novel high-level instructions for unseen objectives and its ease of adaptation to different tasks by merely substituting the task guidelines, without the need for additional complex prompt engineering. Furthermore, when the user sends a new request, our system is able to replan accordingly with precision based on the new request, task guidelines and previously executed steps. More details are available at https://wuphilipp.github.io/itp_site and https://youtu.be/TrKLuyv26_g.

Interactive Task Planning with LLMs

The paper "Interactive Task Planning with Language Models" presents an approach to robot task planning and execution built on LLMs. Its primary aim is a framework that supports long-horizon task planning while adapting in real time to new objectives or tasks through language-based interaction. The proposed system, Interactive Task Planning (ITP), integrates high-level planning with low-level execution and responds flexibly to user commands and feedback.

Summary of the Approach

The core contribution of the work lies in embedding LLMs into robotic systems to enable interactive task planning. Unlike traditional robotic planning frameworks that rely on predefined modules, the use of LLMs provides a flexible solution capable of handling diverse tasks without extensive prompt engineering or training on specific domains. The framework operates using two distinct components: a high-level planner and a low-level execution module. The high-level planner creates step-by-step plans based on user requests, while the low-level execution module implements these plans, interacting with the robot's skill set via a functional API.
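The split between the two components can be pictured with a minimal sketch. Everything here is an assumption made for illustration, not the authors' code: the skill names `grasp` and `pour`, the `SKILLS` registry, and the fixed plan returned by `high_level_plan` are hypothetical, and the string parsing in `execute_step` merely stands in for whatever mechanism maps a step onto the robot's functional API.

```python
from typing import Callable, Dict, List


# Hypothetical skill API: each skill is a plain function the robot exposes.
def grasp(obj: str) -> str:
    return f"grasped {obj}"


def pour(obj: str) -> str:
    return f"poured {obj}"


SKILLS: Dict[str, Callable[[str], str]] = {"grasp": grasp, "pour": pour}


def high_level_plan(request: str) -> List[str]:
    """Stand-in for the high-level planner: returns step-by-step instructions."""
    # A real system would query a language model here; this stub is fixed.
    return ["grasp cup", "pour milk"]


def execute_step(step: str) -> str:
    """Stand-in for the low-level module: map one step onto a skill call."""
    skill_name, _, argument = step.partition(" ")
    if skill_name not in SKILLS:
        raise ValueError(f"no skill named {skill_name!r}")
    return SKILLS[skill_name](argument)


def run(request: str) -> List[str]:
    """Plan once, then execute every step in order."""
    return [execute_step(step) for step in high_level_plan(request)]
```

Keeping the skills behind a small registry means the planner only ever produces text, while the set of callable robot functions can change without touching the planning side.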

A central aspect of ITP's approach is its reliance on GPT-4 as the LLM backbone, engaging the model in both planning and action phases. The high-level planner generates a sequence of tasks based on user input, task guidelines, and previously completed steps. Meanwhile, the system's execution module translates these steps into actionable robot commands. The paper emphasizes the ability to adjust these plans dynamically in response to new user feedback or requests, demonstrating the system's interactive nature.
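As a rough illustration of how the three planner inputs named above (user input, task guidelines, previously completed steps) might be combined, the sketch below assembles them into a single prompt and parses the reply into steps. The prompt wording, the function names `build_planner_prompt` and `plan`, and the `llm` callable are all hypothetical; the paper's actual prompts are not reproduced here, and any text-in, text-out model client could be passed as `llm`.

```python
from typing import Callable, List


def build_planner_prompt(guidelines: str, request: str, completed: List[str]) -> str:
    """Combine guidelines, executed steps, and the user request into one prompt."""
    done = "\n".join(f"- {step}" for step in completed) or "- (none)"
    return (
        f"Task guidelines:\n{guidelines}\n\n"
        f"Steps already executed:\n{done}\n\n"
        f"User request:\n{request}\n\n"
        "List the remaining steps, one per line."
    )


def plan(
    llm: Callable[[str], str],
    guidelines: str,
    request: str,
    completed: List[str],
) -> List[str]:
    """Ask the model for a plan and split its reply into non-empty lines."""
    reply = llm(build_planner_prompt(guidelines, request, completed))
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

Under this layout, adapting to a different task would amount to passing a different `guidelines` string, which mirrors the abstract's claim that only the task guidelines need to be substituted.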

Key Results

The paper details the implementation of the ITP system in a real-world scenario where a robot is tasked with making various types of drinks. The evaluation highlights ITP's ability to generalize from a limited set of predefined recipes to new, unforeseen tasks. The high-level plans generated by ITP remain robust across a variety of user prompts, ranging from simple requests (e.g., making milk) to more complicated ones (e.g., creating novel combinations of drinks).

An additional strength of the ITP framework is its capacity for replanning. The system efficiently adjusts ongoing tasks based on updated user input, recalibrating its high-level plan to incorporate newly specified objectives. This adaptive feature is key to enhancing the interactive nature of the robot, aligning its actions consistently with user desires.
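The replanning behaviour can be sketched as a loop that discards the remaining plan and asks the planner again whenever a new request arrives, passing along the steps already executed. This is a simplified model under stated assumptions: the `updates` dictionary (keyed by the number of completed steps) is a hypothetical stand-in for asynchronous user input, `toy_planner` replaces the LLM with a lookup table, and "executing" a step here just records it.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Planner = Callable[[str, List[str]], List[str]]


def run_with_replanning(
    planner: Planner, request: str, updates: Dict[int, str]
) -> List[str]:
    """Execute a plan, replanning when a new request arrives mid-execution.

    `updates` maps "number of steps completed so far" to a new user request.
    """
    updates = dict(updates)  # copy so the caller's dict is not mutated
    completed: List[str] = []
    pending: Deque[str] = deque(planner(request, list(completed)))
    while pending:
        if len(completed) in updates:
            # New request: drop the old plan and replan from the current state.
            request = updates.pop(len(completed))
            pending = deque(planner(request, list(completed)))
            continue
        completed.append(pending.popleft())  # "execute" the next step
    return completed


def toy_planner(request: str, completed: List[str]) -> List[str]:
    """Hypothetical planner: look up a recipe and skip steps already done."""
    recipes = {
        "milk": ["grasp cup", "pour milk"],
        "milk tea": ["grasp cup", "pour milk", "pour tea"],
    }
    return [step for step in recipes[request] if step not in completed]
```

For example, `run_with_replanning(toy_planner, "milk", {1: "milk tea"})` starts on the milk plan, switches to milk tea after the first step, and does not repeat the grasp that was already executed.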

Implications and Future Directions

The implications of using LLMs like GPT-4 in robot task planning are significant, opening pathways for more interactive and adaptive robotic systems. By lowering the barrier to deploying complex task planning through general language processing capabilities, ITP presents a promising direction for future research. Because adapting the system to a different task requires only substituting the task guidelines, the amount of predefined instruction stays small, which simplifies the interface for potential users.

Looking forward, several areas for advancement are noted. Improving the precision of low-level skills can enable robots to handle more complex tasks reliably. Likewise, integrating enhanced sensory inputs, such as 3D vision, could refine the robot's understanding of its environment, leading to more effective task execution.

In conclusion, the ITP framework illustrates how LLMs can make robotic systems more interactive and better at handling varied tasks. As a foundation for future iterations, this research invites a reevaluation of task planning in robotics, with LLMs supplying the adaptability and interaction that predefined modules lack.

Authors (4)
  1. Boyi Li (39 papers)
  2. Philipp Wu (9 papers)
  3. Pieter Abbeel (372 papers)
  4. Jitendra Malik (210 papers)
Citations (23)