Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots (2310.13724v1)

Published 19 Oct 2023 in cs.HC, cs.AI, cs.CV, cs.GR, cs.MA, and cs.RO

Abstract: We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in-depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities.


Summary

  • The paper introduces Habitat 3.0, a simulation platform that accurately models humanoid avatars and enables human-in-the-loop evaluations for collaborative home tasks.
  • The paper demonstrates a novel integration of realistic humanoid simulation and interactive interfaces, facilitating precise assessment of social navigation and rearrangement tasks.
  • The paper’s experiments reveal that learned robot policies adapt effectively to dynamic environments, paving the way for significant advancements in embodied AI.

Habitat 3.0: An Advanced Platform for Human-Robot Collaboration in Simulation

This paper introduces Habitat 3.0, a simulation platform designed for in-depth study of collaborative human-robot tasks within home environments. The platform advances the state of the art along three primary dimensions: accurate humanoid simulation, human-in-the-loop infrastructure, and collaborative task evaluation. It is specifically designed to address longstanding challenges in modeling complex deformable bodies with diverse appearances and motions, while maintaining high simulation speeds.

Habitat 3.0 is characterized by its innovative humanoid simulation capabilities, which include articulated skeletons with rotational joints and high-fidelity rendering through surface skin mesh integration. The platform employs parameterized body models via SMPL-X to generate realistic body shapes and poses, enhancing humanoid appearance with a diverse library of avatars. Moreover, the platform incorporates a motion and behavior generation policy that supports programmatic control of avatars for navigation and object interaction.
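The programmatic avatar control described above can be sketched in miniature. This is an illustrative sketch only: the class and method names below are hypothetical, not the actual Habitat 3.0 API. It mimics the described control model, namely an articulated avatar with rotational joints that is driven programmatically toward a navigation target.

```python
from dataclasses import dataclass, field
import math

@dataclass
class HumanoidAvatar:
    """Hypothetical stand-in for a Habitat-style controllable avatar."""
    x: float = 0.0
    y: float = 0.0
    speed: float = 0.5  # meters advanced per simulation step (assumed value)
    # Placeholder for the rotational-joint pose vector of the articulated skeleton.
    joint_pose: list = field(default_factory=lambda: [0.0] * 54)

    def walk_toward(self, tx: float, ty: float) -> bool:
        """Advance one step toward (tx, ty); return True once the target is reached."""
        dx, dy = tx - self.x, ty - self.y
        dist = math.hypot(dx, dy)
        if dist < self.speed:
            # Close enough: snap onto the target on the final step.
            self.x, self.y = tx, ty
            return True
        # Otherwise take a fixed-length step along the straight line to the target.
        self.x += self.speed * dx / dist
        self.y += self.speed * dy / dist
        return False

avatar = HumanoidAvatar()
steps = 0
while not avatar.walk_toward(3.0, 4.0):
    steps += 1
# 5 m of straight-line distance covered at 0.5 m per step.
```

In the real platform, each step would additionally drive the skeleton's joint rotations from a motion-generation policy so the avatar walks plausibly rather than gliding; the sketch keeps only the navigation logic.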

The human-in-the-loop infrastructure uniquely facilitates real human interaction with simulated robots using mouse/keyboard interfaces or VR technology. This feature is crucial for evaluating robot policies with direct human input, thereby mimicking real-world collaborative dynamics that are essential for practical deployments in home environments.
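A mouse/keyboard interface of this kind ultimately reduces to mapping key presses onto simulator actions. The binding table and action strings below are hypothetical, chosen for illustration rather than taken from Habitat 3.0's actual interface:

```python
# Hypothetical key-to-action bindings for a human-in-the-loop interface.
KEY_BINDINGS = {
    "w": ("move", (0.0, 0.25)),   # step forward 0.25 m
    "s": ("move", (0.0, -0.25)),  # step backward
    "a": ("turn", -10.0),         # rotate left 10 degrees
    "d": ("turn", 10.0),          # rotate right
    "g": ("grasp", None),         # attempt to pick up the nearest object
}

def keys_to_actions(pressed):
    """Translate a sequence of key presses into simulator actions,
    silently ignoring unbound keys."""
    return [KEY_BINDINGS[k] for k in pressed if k in KEY_BINDINGS]

# "x" has no binding, so only four actions are produced.
actions = keys_to_actions(["w", "w", "d", "x", "g"])
```

A VR interface would replace the discrete key table with continuous controller poses, but the downstream interface to the simulated robot's policy-evaluation loop is the same: a stream of human-issued actions.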

Central to Habitat 3.0 are the Social Navigation and Social Rearrangement tasks, which evaluate collaborative dynamics between humanoid avatars and robots. Social Navigation assesses a robot's ability to locate and then safely follow humanoid avatars within unfamiliar environments. Social Rearrangement focuses on task division and cooperation during object rearrangement. Within these tasks, robots are trained using both learned and heuristic policies, yielding insights into the efficiency of human-robot teamwork.
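To make the Social Navigation objective concrete, a simplified evaluation metric can be sketched. "Following rate" below, the fraction of timesteps the robot spends within a fixed radius of the humanoid, is an invented stand-in for the paper's actual metrics, which this sketch does not reproduce:

```python
import math

def following_rate(robot_traj, human_traj, radius=2.0):
    """Fraction of timesteps where the robot is within `radius` meters
    of the humanoid it is asked to follow. Trajectories are lists of
    (x, y) positions sampled at the same timesteps."""
    assert len(robot_traj) == len(human_traj)
    close = sum(
        1
        for (rx, ry), (hx, hy) in zip(robot_traj, human_traj)
        if math.hypot(rx - hx, ry - hy) <= radius
    )
    return close / len(robot_traj)

# Toy trajectories: the robot keeps pace for three steps, then the
# humanoid pulls away on the last one.
robot = [(0, 0), (1, 0), (2, 0), (3, 0)]
human = [(1, 0), (2, 0), (3, 0), (6, 0)]
rate = following_rate(robot, human)  # close on 3 of 4 steps -> 0.75
```

A real social-navigation metric would also penalize collisions and personal-space violations rather than rewarding proximity alone; the sketch captures only the "find and follow" core of the task.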

The experiments conducted within Habitat 3.0 show that learned robot policies complete tasks efficiently in collaborative settings involving unseen humanoids and human partners exhibiting previously unseen behaviors. Emergent behaviors were observed, such as robots yielding space to humanoid agents—a pivotal capability for effective collaboration in constrained environments.

The results highlight the ability of automated evaluation with humanoids to predict the relative ordering of policies when they are evaluated with real human collaborators. While the heuristic expert achieved notable success through privileged access to environment maps, the end-to-end reinforcement learning (RL) policy showcased its ability to adapt through learned behaviors, such as backing up when humanoids are nearby.
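The kind of check this finding suggests can be sketched directly: does the automated evaluation against humanoids preserve the relative ordering of policies seen with real human partners? The policy names and scores below are invented for illustration; exact-rank agreement is one simple criterion (rank correlation would be the graded alternative):

```python
def rank(scores):
    """Map each policy name to its rank (0 = best) under the given scores."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {policy: i for i, policy in enumerate(order)}

def same_ordering(auto_scores, human_scores):
    """True if the automated and human-in-the-loop evaluations rank the
    candidate policies identically."""
    return rank(auto_scores) == rank(human_scores)

# Hypothetical task-success scores; the absolute numbers differ between
# evaluation modes, but the relative ordering is what matters.
auto = {"heuristic": 0.82, "end_to_end_rl": 0.74, "random": 0.21}
hitl = {"heuristic": 0.68, "end_to_end_rl": 0.60, "random": 0.15}
agree = same_ordering(auto, hitl)  # True: both evaluations rank the policies identically
```

This is why the result matters in practice: if orderings agree, expensive human-in-the-loop studies can be reserved for final validation while cheap automated evaluation drives day-to-day policy development.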

A comprehensive set of human-in-the-loop evaluations with real humans underscores Habitat 3.0's utility in efficiently simulating realistic human-robot interactions. Despite promising results, the work acknowledges certain limitations, such as a restricted action space and the challenge of aligning observation spaces between simulation and real-world interaction. Future directions may extend the tasks beyond strictly non-communicative zero-shot coordination, integrating dialogue systems to further improve collaborative efficiency.

Habitat 3.0 paves the way for significant advancements in embodied AI, promising insights and breakthroughs in real-world applications where robots coexist and collaborate within dynamic human environments. As the platform continues to evolve, it holds the potential to enrich the discourse on embodied AI, offering robust tools for developing the next generation of assistive robotic technologies.
