SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World (2312.02976v2)

Published 5 Dec 2023 in cs.RO, cs.AI, and cs.CV

Abstract: Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely expensive. In this work, we show that imitating shortest-path planners in simulation produces agents that, given a language instruction, can proficiently navigate, explore, and manipulate objects in both simulation and in the real world using only RGB sensors (no depth map or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based, SPOC architecture, powerful visual encoders paired with extensive image augmentation, and the dramatic scale and diversity of our training data: millions of frames of shortest-path-expert trajectories collected inside approximately 200,000 procedurally generated houses containing 40,000 unique 3D assets. Our models, data, training code, and newly proposed 10-task benchmarking suite CHORES are available in https://spoc-robot.github.io.

Introduction to Embodied Agents

Embodied agents are AI models that perceive and act in simulated or real environments. They are typically used for tasks that require interaction with physical spaces, such as navigation and object manipulation. Training them usually relies on either Reinforcement Learning (RL) with dense rewards, which requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks, or Imitation Learning (IL) from human demonstrations, which is effective but extremely expensive to collect at scale. This paper offers an alternative: imitating shortest-path planners in simulation.

Shortest Path Imitation for Effective Agents

The paper explores the surprising effectiveness of agents trained to imitate shortest-path planners in simulation. The researchers present SPOC (Shortest Path Oracle Clone), which, given a language instruction, can navigate, explore, and manipulate objects using RGB sensor input alone, without depth maps or GPS coordinates. The authors attribute this result to three ingredients: an end-to-end, transformer-based architecture, powerful visual encoders paired with extensive image augmentation, and the scale and diversity of the training data.
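To make the idea of a shortest-path expert concrete, here is a minimal sketch on a toy occupancy grid. It is not the authors' planner or training code: the grid world, the four-action set in MOVES, and the function names are all hypothetical and chosen only for illustration. A breadth-first search finds a shortest path, and each step is turned into a (state, expert action) pair of the kind an imitation learner would be trained to reproduce. In the paper the agent sees RGB frames rather than grid coordinates.

```python
from collections import deque

# Hypothetical discrete action set for a toy grid agent (not from the paper).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def shortest_path_actions(grid, start, goal):
    """Breadth-first search over free cells (value 0).

    Returns the list of action names along a shortest path from start
    to goal, or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # cell -> (previous cell, action taken), None for start
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for name, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent:
                parent[nxt] = (cell, name)
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk back from the goal to the start, then reverse.
    actions = []
    cell = goal
    while parent[cell] is not None:
        cell, name = parent[cell]
        actions.append(name)
    actions.reverse()
    return actions


def expert_demonstration(grid, start, goal):
    """Return (position, expert_action) pairs usable as imitation targets."""
    actions = shortest_path_actions(grid, start, goal)
    if actions is None:
        return []
    pairs = []
    pos = start
    for action in actions:
        pairs.append((pos, action))
        dr, dc = MOVES[action]
        pos = (pos[0] + dr, pos[1] + dc)
    return pairs


# 0 = free cell, 1 = wall. The only route from the top-left to the
# bottom-left corner goes around the wall through the right column.
GRID = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]

demo = expert_demonstration(GRID, start=(0, 0), goal=(2, 0))
for position, action in demo:
    print(position, action)
```

Each printed pair is one supervised training example: the learner is shown the state and asked to predict the expert's action. The expert here has privileged access to the full map, which the learner would not have at test time.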

Diverse Training Environments and Data

A critical factor in SPOC's performance is the diversity and scale of the training data. Training used millions of frames of shortest-path expert trajectories collected inside approximately 200,000 procedurally generated houses containing 40,000 unique 3D assets. This breadth of data is what allows the trained agent to generalize across varied environments.

Evaluation and Real-World Transfer

SPOC achieves high success rates in both simulation and the real world without additional adaptation. Remarkably, although it is trained only to imitate shortest-path navigation to target objects, SPOC exhibits more sophisticated behavior: exploration, obstacle avoidance, and backtracking, none of which is explicitly taught during training. This suggests that the learned policy generalizes beyond the expert's demonstrations to novel scenarios.

Conclusion

The findings of this paper showcase the potential of IL in simulation for developing capable robots that operate in real-world settings. Because the results indicate that the scale and diversity of simulation data are crucial, the paper suggests that scaling this approach further is a viable way to tackle more challenging and diverse tasks in embodied AI.

Authors (14)

Kiana Ehsani
Tanmay Gupta
Rose Hendrix
Jordi Salvador
Luca Weihs
Kuo-Hao Zeng
Kunal Pratap Singh
Yejin Kim
Winson Han
Alvaro Herrasti
Ranjay Krishna
Dustin Schwenk
Eli VanderBilt
Aniruddha Kembhavi