ProcTHOR: Large-Scale Embodied AI Using Procedural Generation (2206.06994v1)

Published 14 Jun 2022 in cs.AI, cs.CV, and cs.RO

Abstract: Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.


Overview of "ProcTHOR: Large-Scale Embodied AI Using Procedural Generation"

This paper introduces ProcTHOR, a platform that uses procedural generation to create large-scale, diverse virtual environments for Embodied AI. Developed by researchers at the Allen Institute for AI, ProcTHOR generates interactive, customizable, and performant environments in numbers far beyond what handcrafted or 3D-scanned scene collections provide. Its goal is to improve the training and evaluation of embodied agents on tasks that require navigation, interaction, and manipulation.

The core contributions of this work are:

Scalable Environment Generation: ProcTHOR uses procedural generation to create environments with varied floor plans, asset placements, materials, and lighting. This allows on-demand sampling of an essentially unbounded number of unique training environments.
Comprehensive Asset Library: The platform draws on a library of 108 object types with over 1,600 interactable instances, which increases the diversity and realism of the generated scenes.
State-of-the-art Performance: Agents trained using ProcTHOR achieve state-of-the-art results across several Embodied AI benchmarks. Notably, they also show strong zero-shot performance, often outperforming prior methods without any fine-tuning on the downstream benchmark.
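The scalable generation idea above can be illustrated with a minimal Python sketch. This is not ProcTHOR's API or its actual generation pipeline: the names `RoomSpec`, `sample_house`, and the small lists of room types, materials, and objects are all hypothetical, chosen only to show how a seeded sampler turns one integer into a reproducible scene description, so that the number of distinct scenes is limited only by the number of seeds.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical vocabularies for illustration; ProcTHOR's real asset
# library (108 object types) is far larger.
ROOM_TYPES = ["Kitchen", "LivingRoom", "Bedroom", "Bathroom"]
FLOOR_MATERIALS = ["wood", "tile", "carpet"]
OBJECT_TYPES = ["Chair", "Table", "Lamp", "Sofa", "Bed", "Fridge"]


@dataclass
class RoomSpec:
    """A toy description of one room (hypothetical schema)."""
    room_type: str
    width_m: float
    depth_m: float
    floor_material: str
    light_intensity: float
    objects: List[str]


def sample_house(seed: int, max_rooms: int = 4) -> List[RoomSpec]:
    """Sample a reproducible house description from an integer seed."""
    rng = random.Random(seed)  # private RNG: same seed, same house
    rooms = []
    for _ in range(rng.randint(1, max_rooms)):
        rooms.append(
            RoomSpec(
                room_type=rng.choice(ROOM_TYPES),
                width_m=round(rng.uniform(2.0, 6.0), 2),
                depth_m=round(rng.uniform(2.0, 6.0), 2),
                floor_material=rng.choice(FLOOR_MATERIALS),
                light_intensity=round(rng.uniform(0.5, 1.5), 2),
                objects=rng.sample(OBJECT_TYPES, k=rng.randint(1, 4)),
            )
        )
    return rooms


if __name__ == "__main__":
    for room in sample_house(seed=42):
        print(room)
```

Seeding a private `random.Random` instance keeps each house reproducible from its seed alone, which is what makes an arbitrarily large dataset cheap to store and regenerate in a sketch like this.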

Key Numerical Findings

The paper demonstrates ProcTHOR with a sample of 10,000 procedurally generated houses used to train embodied agents.
Agents trained on ProcTHOR using only RGB images achieve state-of-the-art results on 6 benchmarks spanning navigation, rearrangement, and arm manipulation. For instance, in the RoboTHOR ObjectNav Challenge, models pre-trained on ProcTHOR's environments improved SPL (Success weighted by Path Length) by 8.8 points over previous state-of-the-art systems that trained on RoboTHOR scenes.
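For readers unfamiliar with SPL, the sketch below computes it using the standard definition from the embodied navigation literature: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is a binary success flag, l_i is the shortest-path length to the goal, and p_i is the length of the path the agent actually took. The `Episode` class and `spl` function are illustrative names, not code from the paper, and scoring a successful zero-length episode as 1.0 is an assumption made here to avoid division by zero.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One evaluation episode (illustrative container, not from the paper)."""
    success: bool
    shortest_path_length: float  # l_i: optimal path length to the goal
    agent_path_length: float     # p_i: length of the path the agent took


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if not ep.success:
            continue  # failed episodes contribute zero
        denom = max(ep.agent_path_length, ep.shortest_path_length)
        # Assumption: a successful zero-length episode counts as 1.0.
        total += ep.shortest_path_length / denom if denom > 0 else 1.0
    return total / len(episodes)


if __name__ == "__main__":
    episodes = [
        Episode(success=True, shortest_path_length=5.0, agent_path_length=10.0),
        Episode(success=True, shortest_path_length=4.0, agent_path_length=4.0),
        Episode(success=False, shortest_path_length=3.0, agent_path_length=20.0),
    ]
    print(f"SPL = {spl(episodes):.3f}")
```

SPL lies in [0, 1] and is often reported as a percentage, so a gain of several points means agents both succeed more often and take paths closer to the optimum.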

Implications and Future Directions

ProcTHOR's procedural generation approach addresses a key limitation of current Embodied AI platforms, which typically offer limited scene diversity and scale. By making it possible to train on a very large number of distinct environments, ProcTHOR helps reduce overfitting and improve the generalization of embodied agents.

The introduction of ProcTHOR suggests several promising research directions:

Data-Efficient Learning: With a plethora of diverse environments, researchers can better explore and develop data-efficient learning paradigms, potentially reducing the need for large amounts of labeled data.
Zero-Shot and Transfer Learning: The strong zero-shot performance observed in this paper indicates that ProcTHOR can be a valuable tool for research into transfer learning and generalization across varied tasks and environments.
Further Scaling and Complexity: Future work could explore augmenting ProcTHOR with dynamic and multi-agent scenarios, further increasing complexity to bridge the gap between simulation and real-world applications.

In conclusion, ProcTHOR represents a significant advance in Embodied AI, providing an extensive and flexible platform for training and evaluating agents. The approach opens avenues for future research aimed at more general and adaptable AI systems.


Authors (11)

Matt Deitke (11 papers)
Eli VanderBilt (10 papers)
Alvaro Herrasti (11 papers)
Luca Weihs (46 papers)
Jordi Salvador (15 papers)
Kiana Ehsani (31 papers)
Winson Han (11 papers)
Eric Kolve (13 papers)
Ali Farhadi (138 papers)
Aniruddha Kembhavi (79 papers)
Roozbeh Mottaghi (66 papers)

Citations (183)
