EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Published 30 Oct 2024 in cs.RO, cs.AI, and cs.MA (arXiv:2410.22662v2)

Abstract: Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone. Current large-language-model-based multi-agent systems (LLM-based MAS) have shown success in areas like software development and operating systems, but applying these systems to robot control presents unique challenges. In particular, the capabilities of each agent in a multi-robot system are inherently tied to the physical composition of the robots, rather than to predefined roles. To address this issue, we introduce a novel multi-agent framework designed to enable effective collaboration among heterogeneous robots with varying embodiments and capabilities, along with a new benchmark named Habitat-MAS. One of our key designs is Robot Resume: instead of adopting human-designed role play, we propose a self-prompted approach in which agents comprehend robot URDF files and call robot kinematics tools to generate descriptions of their physical capabilities, which then guide their behavior in task planning and action execution. The Habitat-MAS benchmark is designed to assess how a multi-agent framework handles tasks that require embodiment-aware reasoning, covering 1) manipulation, 2) perception, 3) navigation, and 4) comprehensive multi-floor object rearrangement. The experimental results indicate that the Robot Resume and the hierarchical design of our multi-agent system are essential for the effective operation of the heterogeneous multi-robot system within this intricate problem context.
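The Robot Resume idea in the abstract (read a URDF, derive a capability description, hand it to the agent) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy URDF, the function names `build_resume` and `resume_to_prompt`, and the heuristics (a `continuous` joint implies a wheeled base; reach is bounded by summing joint offsets plus prismatic travel) are all assumptions made for illustration, and a plain XML parse stands in for the robot kinematics tools the abstract mentions.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Toy URDF invented for this sketch; it does not come from the paper or Habitat-MAS.
TOY_URDF = """
<robot name="toy_mobile_arm">
  <link name="base"/>
  <link name="left_wheel"/>
  <link name="upper_arm"/>
  <link name="gripper"/>
  <joint name="left_wheel_joint" type="continuous">
    <parent link="base"/><child link="left_wheel"/>
  </joint>
  <joint name="shoulder" type="revolute">
    <parent link="base"/><child link="upper_arm"/>
    <origin xyz="0 0 0.5"/>
    <limit lower="-1.57" upper="1.57"/>
  </joint>
  <joint name="wrist" type="prismatic">
    <parent link="upper_arm"/><child link="gripper"/>
    <origin xyz="0.4 0 0"/>
    <limit lower="0.0" upper="0.2"/>
  </joint>
</robot>
"""


def _offset_length(joint):
    """Euclidean length of the joint's <origin xyz="..."> offset (0 if absent)."""
    origin = joint.find("origin")
    if origin is None:
        return 0.0
    xyz = [float(v) for v in origin.get("xyz", "0 0 0").split()]
    return math.sqrt(sum(v * v for v in xyz))


def build_resume(urdf_text):
    """Summarize a URDF string into a small capability dictionary (hypothetical format)."""
    root = ET.fromstring(urdf_text.strip())
    joints = root.findall("joint")
    joint_types = Counter(j.get("type") for j in joints)

    # Assumption: revolute/prismatic joints form the arm; continuous joints are wheels.
    arm_joints = [j for j in joints if j.get("type") in ("revolute", "prismatic")]

    # Crude upper bound on reach: sum of arm joint offsets plus prismatic travel.
    reach = sum(_offset_length(j) for j in arm_joints)
    for j in arm_joints:
        limit = j.find("limit")
        if j.get("type") == "prismatic" and limit is not None:
            reach += float(limit.get("upper", "0"))

    return {
        "name": root.get("name"),
        "num_links": len(root.findall("link")),
        "joint_types": dict(joint_types),
        "mobile_base": joint_types["continuous"] > 0,
        "reach_upper_bound_m": round(reach, 3),
    }


def resume_to_prompt(resume):
    """Render the capability dictionary as text that could be placed in an agent prompt."""
    mobility = "has a wheeled base" if resume["mobile_base"] else "has no wheeled base"
    return (
        f"Robot '{resume['name']}' has {resume['num_links']} links, {mobility}, "
        f"and can reach at most about {resume['reach_upper_bound_m']} m from its arm mount."
    )


if __name__ == "__main__":
    print(resume_to_prompt(build_resume(TOY_URDF)))
```

The point of the sketch is the data flow rather than the heuristics: capabilities are derived from the robot description itself, so two robots with different URDFs receive different prompts without any hand-written role text.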

