LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots (2405.15646v1)

Published 24 May 2024 in cs.RO

Abstract: The development of a general purpose service robot for daily life necessitates the robot's ability to deploy a myriad of fundamental behaviors judiciously. Recent advances in training LLMs make it possible to generate action sequences directly from an instruction in natural language, with no additional domain information. However, while the outputs of LLMs are semantically correct, the generated task plans may not map accurately to acceptable actions and may contain various linguistic ambiguities. LLM hallucinations pose another challenge for robot task planning, producing content that is inconsistent with real-world facts or user inputs. In this paper, we propose a task planning method based on a constrained LLM prompt scheme, which can generate an executable action sequence from a command. An exceptional handling module is further proposed to deal with the LLM hallucination problem; it ensures that the LLM-generated results are admissible in the current environment. We evaluate our method on commands generated by the RoboCup@Home Command Generator, observing that the robot demonstrates exceptional performance in both comprehending instructions and executing tasks.
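
The abstract describes two technical ingredients without giving implementation details: a prompt scheme constrained to the robot's primitive actions, and an exception handling pass that checks whether the generated plan is admissible in the current environment. The sketch below is a minimal illustration of that idea, not the authors' implementation; the action vocabulary, the "action(argument)" plan format, and the names ALLOWED_ACTIONS, parse_plan, and check_admissible are assumptions made here for demonstration, and the raw plan string stands in for an actual LLM response.

```python
# Minimal sketch (assumptions, not the paper's code): constrain the LLM to a
# fixed action vocabulary via the prompt, then run an admissibility check that
# rejects steps referring to actions, objects, or locations the robot does not
# know about (one way hallucinated plan content can be caught before execution).
from dataclasses import dataclass

# Illustrative primitive behaviors; the paper's actual action set may differ.
ALLOWED_ACTIONS = {"goto", "find", "pick", "place", "follow", "answer"}

PROMPT_TEMPLATE = (
    "You are a service-robot task planner. Translate the command into a plan.\n"
    "Use ONLY these actions: {actions}.\n"
    "Output one 'action(argument)' step per line and nothing else.\n"
    "Known locations: {locations}\nKnown objects: {objects}\n"
    "Command: {command}\nPlan:"
)

@dataclass
class Step:
    action: str
    argument: str

def parse_plan(raw: str) -> list[Step]:
    """Parse lines of the form 'action(argument)' returned by the LLM."""
    steps = []
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if "(" not in line or not line.endswith(")"):
            raise ValueError(f"Unparseable step: {line!r}")
        action, arg = line[:-1].split("(", 1)
        steps.append(Step(action.strip().lower(), arg.strip().lower()))
    return steps

def check_admissible(steps: list[Step], locations: set[str], objects: set[str]) -> list[str]:
    """Exception-handling pass: report steps that use unknown actions or entities."""
    known = {name.lower() for name in locations | objects}
    problems = []
    for i, step in enumerate(steps):
        if step.action not in ALLOWED_ACTIONS:
            problems.append(f"step {i}: unknown action '{step.action}'")
        elif step.argument not in known:
            problems.append(f"step {i}: '{step.argument}' is not in the environment")
    return problems

if __name__ == "__main__":
    locations = {"kitchen", "living room"}
    objects = {"apple", "cup"}
    command = "bring me the apple from the kitchen"
    prompt = PROMPT_TEMPLATE.format(
        actions=", ".join(sorted(ALLOWED_ACTIONS)),
        locations=", ".join(sorted(locations)),
        objects=", ".join(sorted(objects)),
        command=command,
    )
    # The call to an actual LLM is omitted; raw_plan stands in for its response.
    raw_plan = "goto(kitchen)\nfind(apple)\npick(apple)\ngoto(living room)\nplace(apple)"
    plan = parse_plan(raw_plan)
    issues = check_admissible(plan, locations, objects)
    if issues:
        # A full pipeline would re-prompt the LLM or ask the user to clarify;
        # here the inadmissible steps are simply reported.
        print("Plan rejected:", issues)
    else:
        print("Executable plan:", [(s.action, s.argument) for s in plan])
```

In this toy run the plan passes the check; replacing "apple" with an object absent from the environment would cause check_admissible to flag the step, which is the kind of hallucinated content the paper's module is meant to catch before execution.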

References (25)
  1. RoboCup@Home Command Generator. https://github.com/kyordhel/GPSRCmdGen. Accessed 1-January-2024.
  2. Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.
  3. Meta AI. Llama. https://llama.meta.com/. Accessed 5-January-2024.
  4. Weakly supervised learning of semantic parsers for mapping instructions to actions. Transactions of the Association for Computational Linguistics, 1:49–62, 2013.
  5. Baidu. Ernie-bot 4.0. https://yiyan.baidu.com/welcome. Accessed 5-January-2024.
  6. The SMACH high-level executive [ROS News]. IEEE Robotics & Automation Magazine, 17(4):18–20, 2010.
  7. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
  8. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
  9. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.
  10. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147. PMLR, 2022.
  11. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022.
  12. iFLYTEK. iFLYTEK Spark. https://xinghuo.xfyun.cn/. Accessed 5-January-2024.
  13. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38, 2023.
  14. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems, 35:31199–31212, 2022.
  15. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019.
  16. Environment-driven lexicon induction for high-level instructions. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 992–1002, 2015.
  17. Tell me dave: Context-sensitive grounding of natural language to manipulation instructions. The International Journal of Robotics Research, 35(1-3):281–300, 2016.
  18. Foundation model based open vocabulary task planning and executive system for general purpose service robots. arXiv preprint arXiv:2308.03357, 2023.
  19. VirtualHome: Simulating household activities via programs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8494–8502, 2018.
  20. ALFRED: A benchmark for interpreting grounded instructions for everyday tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10740–10749, 2020.
  21. ProgPrompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11523–11530. IEEE, 2023.
  22. BEHAVIOR: Benchmark for everyday household activities in virtual, interactive, and ecological environments. In Conference on Robot Learning, pages 477–490. PMLR, 2022.
  23. Hobo Technology. Tigerbot. https://tigerbot.com/. Accessed 5-January-2024.
  24. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7464–7475, 2023.
  25. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
Authors (6)
  1. Ruoyu Wang (95 papers)
  2. Zhipeng Yang (26 papers)
  3. Zinan Zhao (6 papers)
  4. Xinyan Tong (2 papers)
  5. Zhi Hong (14 papers)
  6. Kun Qian (87 papers)
Citations (3)