
LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

Published 29 Nov 2023 in cs.RO and cs.AI | arXiv:2311.17406v2

Abstract: This work addresses the problem of long-horizon task planning with LLMs in an open-world household environment. Existing works either fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which do not generalize. We propose an open state representation that continuously expands and updates object attributes using the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of an object's attributes and changes, enabling a robust retrospective summary of the sequence of actions leading to the current state. This yields a continuously updated world model that enhances context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. (Video demonstration: https://youtu.be/QkN-8pxV3Mo)


Summary

  • The paper introduces LLM-State, a novel dual-faceted state representation model that continuously updates object states for enhanced task planning.
  • It integrates structured object entries with unstructured historical summaries to effectively monitor and adapt to dynamic environments.
  • Experiments demonstrate that LLM-State significantly outperforms baselines in complex household tasks, improving robotic decision-making.

Introduction

In robotics, a common challenge is planning and executing tasks in dynamic, unpredictable environments such as a typical household. Task planning becomes even more complex when robots must operate over long time horizons and interact with many objects, each with changing attributes. LLMs have been leveraged to help robots navigate these tasks by providing common-sense reasoning and high-level decision-making support. However, conventional uses of LLMs in task planning often struggle to keep track of the evolving state of objects and fail to relate key task attributes effectively.

Methodology

To address these challenges, the paper proposes a novel state representation model, termed "LLM-State." The model uses an LLM to continuously monitor and update the environment's state, particularly objects and their attributes, in an open-world household setting. LLM-State distinguishes itself by combining structured object entries with an unstructured summary of historical data. The structured object entries provide detailed, per-object records relevant to the task, which the LLM updates dynamically as the environment changes. Meanwhile, the unstructured entry supplies contextual information and historical summaries, helping the LLM understand how the environment has evolved over time. This dual-faceted representation is key to enhancing the robot's decision-making in complex, long-horizon planning tasks.
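The dual-faceted representation described above can be sketched as a simple data structure: a dictionary of structured object entries with open-ended attribute sets, alongside a free-text history summary. This is a minimal illustration under assumed names (`ObjectEntry`, `LLMState`, `to_prompt` are hypothetical), not the paper's actual implementation; in the real system, the LLM itself proposes the attribute updates and retrospective summaries.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectEntry:
    """Structured entry: one tracked object with an open-ended attribute set."""
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. {"location": "table"}

@dataclass
class LLMState:
    """Dual-faceted state: structured object entries plus an unstructured summary."""
    objects: dict = field(default_factory=dict)  # object name -> ObjectEntry
    history_summary: str = ""                    # free-text retrospective summary

    def update_object(self, name, **attrs):
        # Create the entry on first mention, then add or overwrite attributes;
        # attributes need not be predefined, matching the "open" representation.
        entry = self.objects.setdefault(name, ObjectEntry(name))
        entry.attributes.update(attrs)

    def append_history(self, event: str):
        self.history_summary += event + "\n"

    def to_prompt(self) -> str:
        # Serialize both facets into a textual context for the next LLM query.
        lines = [f"{o.name}: {o.attributes}" for o in self.objects.values()]
        return "Objects:\n" + "\n".join(lines) + "\nHistory:\n" + self.history_summary

state = LLMState()
state.update_object("cup", location="table", filled=False)
state.update_object("cup", filled=True)  # attribute revised after an action
state.append_history("Robot filled the cup at the sink.")
```

Keeping the structured and unstructured facets separate lets the planner query precise attribute values while still reasoning over the narrative of past actions.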

Evaluations and Results

The approach was evaluated in both simulated and real-world settings. The experiments spanned household tasks of varying complexity and showed that LLM-State significantly outperforms baseline models, especially on long-horizon tasks. The analysis further showed that the structured object entries and the unstructured historical summary each play a critical role in successful task planning in dynamic environments. Explicit tracking of object states, including previously undefined attributes, is what enables robust decision-making and the successful execution of multi-step tasks.
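To illustrate why explicit state tracking matters for multi-step tasks, the sketch below runs a closed observe-plan-act loop in which each action is chosen from the current textual state description. The `llm_plan` function is a hand-written stub standing in for a real LLM call, and the task itself is invented for illustration; it is not drawn from the paper's experiments.

```python
def llm_plan(state_desc: str) -> str:
    """Hypothetical planner stub: maps a textual state description to the
    next action, in place of an actual LLM query."""
    if "cup is dirty" in state_desc:
        return "wash cup"
    if "cup is clean" in state_desc and "cup on shelf" not in state_desc:
        return "place cup on shelf"
    return "done"

def run_episode(max_steps: int = 5) -> list:
    # Tracked state: without updating "cup" after washing, the planner
    # would repeat "wash cup" forever.
    state = {"cup": "dirty", "cup_location": "sink"}
    log = []
    for _ in range(max_steps):
        desc = f"cup is {state['cup']}; cup_location={state['cup_location']}"
        if state["cup_location"] == "shelf":
            desc += "; cup on shelf"
        action = llm_plan(desc)
        log.append(action)
        # Update the tracked state to reflect each action's effect.
        if action == "wash cup":
            state["cup"] = "clean"
        elif action == "place cup on shelf":
            state["cup_location"] = "shelf"
        elif action == "done":
            break
    return log

print(run_episode())
```

Because the state is updated after every action, the loop terminates with a coherent plan; dropping the updates would leave the planner acting on stale observations, the failure mode the paper attributes to baselines without explicit state tracking.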

Conclusion

The LLM-State model represents a significant advance in open-world task planning, where adaptability and real-time contextual understanding are paramount. By automating state representation and updates, and maintaining a comprehensive record of actions and changes, LLM-State enables robots to plan and execute tasks over longer horizons in constantly changing environments. Future work could extend the model to account for relationships between objects, further improving the task-planning capabilities of LLM-based robotic systems.

Authors (3)
