- The paper introduces a dataset of 1000 photorealistic 3D indoor spaces that significantly expands navigable areas for embodied AI research.
- The paper demonstrates a 34-91% reduction in reconstruction defects, ensuring highly complete and coherent training environments.
- The paper shows up to 85% higher visual fidelity than previous datasets, enhancing AI agent performance and real-world generalization.
Habitat-Matterport 3D Dataset (HM3D): Advancing Embodied AI Research
The Habitat-Matterport 3D Dataset (HM3D) is a notable contribution to Embodied AI research: a collection of 1,000 large-scale, photorealistic 3D reconstructions of indoor environments. The dataset is curated to improve substantially on prior datasets in physical scale, reconstruction completeness, and visual fidelity, providing more extensive and realistic environments for training and evaluating embodied AI agents.
Key Contributions and Comparisons
- Scale and Complexity: HM3D provides a navigable area approximately 1.4 times that of the previous largest dataset, Gibson. Spanning over 10,600 rooms across 1,920 building floors, with roughly 112,500 square meters of navigable space, it offers the structural complexity needed for diverse embodied AI tasks such as long-horizon navigation.
- Reconstruction Completeness: HM3D minimizes reconstruction artifacts, showing a 34 to 91% reduction in defects related to incomplete surfaces compared to datasets such as Gibson and MP3D. Fewer artifacts like visible holes and cracks mean a more coherent training and evaluation space for AI agents.
- Visual Fidelity: Rendered images from HM3D exhibit 20 to 85% higher visual fidelity compared to traditional 3D datasets such as Replica and ScanNet. This level of realism enhances the training efficacy of embodied AI agents, potentially improving their generalization to real-world deployments.
Quantitative and Qualitative Evaluations
Quantitative analyses support HM3D's advantages in visual quality and completeness. Images rendered from HM3D achieve better (lower) FID and KID scores against real-world panoramas than renders from Gibson and MP3D, underscoring their visual fidelity. Moreover, PointGoal navigation agents trained on HM3D achieve the highest success rates and SPL scores in both within-dataset and cross-dataset evaluations; notably, HM3D-trained agents recorded a 100% success rate on the Gibson test split, highlighting the robustness and transferability of skills learned in HM3D environments.
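For readers unfamiliar with SPL (Success weighted by Path Length, the standard PointGoal navigation metric from Anderson et al.), it can be computed in a few lines. The sketch below is an illustrative implementation of the published formula, not code from the HM3D paper; the function name and argument names are my own:

```python
def spl(successes, shortest_paths, agent_paths):
    """Success weighted by (normalized inverse) Path Length.

    successes      -- per-episode binary success indicators (1 = reached goal)
    shortest_paths -- geodesic shortest-path length from start to goal
    agent_paths    -- path length the agent actually traveled
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_paths, agent_paths):
        # A failed episode contributes 0; a successful one contributes
        # the ratio of optimal to actual path length (capped at 1).
        total += s * (l / max(p, l))
    return total / len(successes)


# Example: one optimal success, one success at twice the optimal
# length, one failure -> (1.0 + 0.5 + 0.0) / 3 = 0.5
print(spl([1, 1, 0], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0]))
```

Because SPL discounts successes by path efficiency, it rewards agents that both reach the goal and do so near-optimally, which is why it is reported alongside raw success rate.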
Implications and Future Directions
The implications of HM3D are significant for both practical applications and theoretical research in AI. Practically, the dataset supplies more realistic and varied environments, potentially improving the versatility and real-world applicability of AI navigation systems. Theoretically, its scale and quality let researchers study complex embodied AI tasks, such as multi-room navigation and dynamic object interaction, within a highly controlled setting.
In future work, adding semantic and dynamic attributes to HM3D could open avenues for more sophisticated AI tasks, including object recognition and manipulation. As AI continues to integrate more seamlessly into real-world applications, such enhancements could bridge existing gaps between simulation and real-life operation, steering the field towards new breakthroughs in AI agent capabilities and applications.
Overall, HM3D sets a new standard for 3D datasets in AI, offering a rich resource for advancing the frontier of embodied intelligence in artificial agents.