
LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

Published 29 Nov 2023 in cs.RO and cs.AI | arXiv:2311.17406v2

Abstract: This work addresses the problem of long-horizon task planning with LLMs in an open-world household environment. Existing works either fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which do not generalize. We propose an open state representation that continuously expands and updates object attributes using the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of an object's attributes and changes, enabling a robust retrospective summary of the sequence of actions leading to the current state. This yields a continuously updated world model that enhances context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. (Video demonstration: https://youtu.be/QkN-8pxV3Mo)


Summary

  • The paper introduces LLM-State, a novel dual-faceted state representation model that continuously updates object states for enhanced task planning.
  • It integrates structured object entries with unstructured historical summaries to effectively monitor and adapt to dynamic environments.
  • Experiments demonstrate that LLM-State significantly outperforms baselines in complex household tasks, improving robotic decision-making.

Introduction

In robotics, a common challenge is planning and executing tasks in dynamic, unpredictable environments such as a typical household. Task planning becomes even more complex when robots must operate over long time horizons and interact with many objects, each with changing attributes. LLMs have been leveraged to help robots navigate these tasks by providing common-sense reasoning and high-level decision-making support. However, conventional uses of LLMs in task planning often struggle to keep track of the evolving state of objects and fail to relate key task attributes effectively.

Methodology

To address these challenges, the paper proposes a novel state representation model, termed "LLM-State." The model uses an LLM to continuously monitor and update the environment's state, particularly objects and their attributes, in an open-world household setting. LLM-State distinguishes itself by combining structured object entries with an unstructured summary of historical data. The structured object entries provide detailed, per-object records relevant to the task, which the LLM updates dynamically as the environment changes. Meanwhile, the unstructured entry supplies contextual information and historical summaries, helping the LLM understand how the environment has evolved over time. This dual-faceted representation is key to enhancing the robot's decision-making in complex, long-horizon planning tasks.
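The dual-faceted representation described above can be sketched as a simple data structure: a dictionary of structured object entries with open-ended attribute sets, alongside a free-text history summary. This is a minimal illustration under assumed names (`ObjectEntry`, `LLMState`, `to_prompt` are hypothetical), not the paper's actual implementation; in the real system, the LLM itself proposes the attribute updates and retrospective summaries.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectEntry:
    """Structured entry: one tracked object with an open-ended attribute set."""
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. {"location": "table"}

@dataclass
class LLMState:
    """Dual-faceted state: structured object entries plus an unstructured summary."""
    objects: dict = field(default_factory=dict)  # object name -> ObjectEntry
    history_summary: str = ""                    # free-text retrospective summary

    def update_object(self, name, **attrs):
        # Create the entry on first mention, then add or overwrite attributes;
        # attributes need not be predefined, matching the "open" representation.
        entry = self.objects.setdefault(name, ObjectEntry(name))
        entry.attributes.update(attrs)

    def append_history(self, event: str):
        self.history_summary += event + "\n"

    def to_prompt(self) -> str:
        # Serialize both facets into a textual context for the next LLM query.
        lines = [f"{o.name}: {o.attributes}" for o in self.objects.values()]
        return "Objects:\n" + "\n".join(lines) + "\nHistory:\n" + self.history_summary

state = LLMState()
state.update_object("cup", location="table", filled=False)
state.update_object("cup", filled=True)  # attribute revised after an action
state.append_history("Robot filled the cup at the sink.")
```

Keeping the structured and unstructured facets separate lets the planner query precise attribute values while still reasoning over the narrative of past actions.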

Evaluations and Results

The approach was evaluated in both simulated and real-world settings. The experiments spanned household tasks of varying complexity and showed that LLM-State significantly outperforms baseline models, especially on long-horizon tasks. The analysis further showed that the structured object entries and the unstructured historical summary each play a critical role in successful task planning in dynamic environments. Explicit tracking of object states, including previously undefined attributes, is what enables robust decision-making and the successful execution of multi-step tasks.
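To illustrate why explicit state tracking matters for multi-step tasks, the sketch below runs a closed observe-plan-act loop in which each action is chosen from the current textual state description. The `llm_plan` function is a hand-written stub standing in for a real LLM call, and the task itself is invented for illustration; it is not drawn from the paper's experiments.

```python
def llm_plan(state_desc: str) -> str:
    """Hypothetical planner stub: maps a textual state description to the
    next action, in place of an actual LLM query."""
    if "cup is dirty" in state_desc:
        return "wash cup"
    if "cup is clean" in state_desc and "cup on shelf" not in state_desc:
        return "place cup on shelf"
    return "done"

def run_episode(max_steps: int = 5) -> list:
    # Tracked state: without updating "cup" after washing, the planner
    # would repeat "wash cup" forever.
    state = {"cup": "dirty", "cup_location": "sink"}
    log = []
    for _ in range(max_steps):
        desc = f"cup is {state['cup']}; cup_location={state['cup_location']}"
        if state["cup_location"] == "shelf":
            desc += "; cup on shelf"
        action = llm_plan(desc)
        log.append(action)
        # Update the tracked state to reflect each action's effect.
        if action == "wash cup":
            state["cup"] = "clean"
        elif action == "place cup on shelf":
            state["cup_location"] = "shelf"
        elif action == "done":
            break
    return log

print(run_episode())
```

Because the state is updated after every action, the loop terminates with a coherent plan; dropping the updates would leave the planner acting on stale observations, the failure mode the paper attributes to baselines without explicit state tracking.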

Conclusion

The LLM-State model represents a significant advance in open-world task planning, where adaptability and real-time contextual understanding are paramount. By automating state representation and updates, and maintaining a comprehensive record of actions and changes, LLM-State enables robots to plan and execute tasks over longer horizons in constantly changing environments. Future work could extend the model to account for relationships between objects, further improving the task-planning capabilities of LLM-based robotic systems.

Authors (3)
