Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots (2310.13724v1)
Abstract: We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities.