- The paper introduces a photorealistic dataset that combines HM3D indoor scenes with detailed ABO object models to advance multi-view 3D reconstruction.
- It details a comprehensive pipeline generating high-quality mesh models, ground truth depth maps, and object masks for robust supervised learning.
- Benchmarks on camera pose estimation and few-view 3D reconstruction show that reconstruction quality depends strongly on the accuracy of the estimated camera poses, reflecting realistic AR/VR conditions.
Comprehensive Analysis of HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction
The paper "HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D Reconstruction" addresses a prominent challenge in computer vision: the reconstruction of 3D objects. This task is significant for augmented reality (AR) and virtual reality (VR) applications. The authors present a dataset that aims to bridge the gap between the synthetic environments often used in training deep learning models and realistic scenarios encountered in practical applications.
Dataset Composition and Assets
The HM3D-ABO dataset is constructed from two high-quality 3D asset sources: Habitat-Matterport 3D (HM3D) for realistic indoor scenes and the Amazon-Berkeley Objects (ABO) dataset for detailed object models. This combination provides a rich source of multi-view RGB observations. For each scene-object configuration, the dataset also includes watertight mesh models, ground-truth depth maps, and object masks, making it suitable for supervised learning. The watertight meshes are particularly valuable because they permit well-defined inside/outside queries, which many reconstruction methods rely on for supervision.
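To make the supervision signal concrete, here is a minimal loading sketch for one scene-object configuration. The directory layout, file names, and pose/depth conventions are illustrative assumptions, not the dataset's documented structure.

```python
# Hypothetical loader for one HM3D-ABO scene-object configuration.
# File names and layout below are assumptions for illustration only.
from pathlib import Path
import numpy as np
from PIL import Image
import trimesh  # for the watertight ground-truth object mesh

def load_configuration(config_dir: str):
    """Load multi-view RGB, depth, masks, camera poses, and the GT mesh."""
    root = Path(config_dir)
    views = []
    for rgb_path in sorted(root.glob("rgb/*.png")):                      # assumed layout
        idx = rgb_path.stem
        rgb = np.asarray(Image.open(rgb_path))                           # H x W x 3
        depth = np.load(root / "depth" / f"{idx}.npy")                   # H x W, metric depth (assumed)
        mask = np.asarray(Image.open(root / "mask" / f"{idx}.png")) > 0  # boolean object mask
        pose = np.loadtxt(root / "pose" / f"{idx}.txt")                  # 4x4 camera-to-world (assumed)
        views.append({"rgb": rgb, "depth": depth, "mask": mask, "pose": pose})
    mesh = trimesh.load(root / "object.obj", force="mesh")               # watertight GT mesh
    return views, mesh
```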
Dataset Characteristics and Generation
Model selection from the HM3D and ABO datasets was guided by alignment with real-world scenarios, so that both the scenes and the objects are highly realistic. The authors implemented a careful placement pipeline that positions objects plausibly within scenes, avoiding physical collisions and ensuring that objects rest on the ground rather than floating in mid-air. In addition, each rendered view of a scene-object configuration is checked for visibility: frames in which the object's 2D bounding box covers too small a fraction of the image area are filtered out, as sketched below.
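A minimal sketch of such a visibility filter follows; the mask-based bounding-box computation and the 5% threshold are illustrative assumptions rather than the paper's exact values.

```python
import numpy as np

def passes_visibility_check(object_mask: np.ndarray, min_ratio: float = 0.05) -> bool:
    """Keep a rendered view only if the object's 2D bounding box covers
    at least `min_ratio` of the image area. The threshold is illustrative."""
    ys, xs = np.nonzero(object_mask)
    if ys.size == 0:                       # object not visible at all
        return False
    bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    image_area = object_mask.shape[0] * object_mask.shape[1]
    return bbox_area / image_area >= min_ratio
```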
The dataset contains 1,966 objects placed across 500 indoor environments, yielding 3,196 scene-object configurations, all rendered with realistic lighting and textures using Blender's physically based rendering engine.
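As an illustration of the rendering step, the following Blender-Python (bpy) sketch configures Cycles, Blender's physically based path tracer, and renders a single view. The choice of Cycles, the sample count, resolution, and output path are assumptions for illustration; scene composition (importing the HM3D scene and the ABO object, placing cameras) is presumed to have been done already.

```python
# Minimal Blender-Python (bpy) sketch for rendering one placed-object view.
# Must run inside Blender; all numeric settings are placeholders.
import bpy

scene = bpy.context.scene
scene.render.engine = "CYCLES"               # physically based path tracer (assumed engine)
scene.cycles.samples = 128                   # illustrative sample count
scene.render.resolution_x = 640              # illustrative resolution
scene.render.resolution_y = 480
scene.render.image_settings.file_format = "PNG"
scene.render.filepath = "/tmp/view_000.png"  # placeholder output path
bpy.ops.render.render(write_still=True)
```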
Evaluation and Benchmarking
The paper benchmarks HM3D-ABO on two tasks: absolute camera pose estimation and few-view 3D object reconstruction. The evaluated pose estimation approaches include relative-pose methods that rely on feature matching and absolute-pose methods that regress camera poses directly in the object's canonical coordinate frame. The reported experiments show FvOR-Pose outperforming the alternatives, with lower pixel, rotation, and translation errors.
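For reference, pose accuracy in such benchmarks is typically summarized with rotation and translation errors of the following form; this is a standard computation, not necessarily the paper's exact evaluation protocol.

```python
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos_angle)))

def translation_error(t_pred: np.ndarray, t_gt: np.ndarray) -> float:
    """Euclidean distance between predicted and ground-truth camera positions."""
    return float(np.linalg.norm(t_pred - t_gt))
```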
For 3D object reconstruction, the paper evaluates methods under both known and predicted camera poses, including OccNet, IDR, and FvOR. The benchmarks indicate that methods which exploit camera pose information, such as FvOR, can achieve higher accuracy than those that do not, especially under the realistic levels of noise present in estimated poses.
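Reconstruction quality in this setting is commonly summarized with the Chamfer distance between point clouds sampled from the predicted and ground-truth meshes; the sketch below is a generic implementation, not necessarily the paper's evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_pred: np.ndarray, points_gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds (N x 3 and M x 3),
    e.g. sampled from the predicted and ground-truth meshes."""
    d_pred_to_gt, _ = cKDTree(points_gt).query(points_pred)   # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(points_pred).query(points_gt)   # nearest prediction per GT point
    return float(d_pred_to_gt.mean() + d_gt_to_pred.mean())
```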
Significance and Future Directions
The dataset brings a high degree of realism to 3D reconstruction tasks owing to its combination of realistic object placement and detailed environmental setups. The implications for AR/VR and computational photography are broad, with potential applications ranging from richer virtual interaction to more robust autonomous navigation.
While the dataset advances the state of synthetically generated photorealistic training environments, the authors acknowledge limitations such as a dominant presence of particular object classes and simplified scene arrangement strategies. Future expansions could address these by introducing a wider variety of objects and more complex scene configurations to simulate crowded environments.
In conclusion, the HM3D-ABO dataset is a significant step forward, offering a richly annotated resource that lies closer to real-world complexity than earlier synthetic datasets. It is well positioned to support future research in 3D computer vision, improving the realism and reliability of 3D reconstruction.