Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
The paper "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild" presents a significant contribution to the field of computer vision, specifically in the domain of 3D object detection. The main contributions include the introduction of a diverse and extensive benchmark, Omni3D, and the proposal of a novel model, Cube R-CNN, designed to address the challenges of 3D object detection across varied scenes and object categories.
Omni3D Benchmark
Omni3D is an ambitious attempt to create a large-scale 3D object detection benchmark that surpasses existing datasets in size and diversity. It aggregates multiple datasets, including SUN RGB-D, ARKitScenes, Hypersim, Objectron, KITTI, and nuScenes, resulting in a consolidated dataset comprising 234,000 images annotated with over 3 million 3D instances across 98 categories. This vast dataset addresses the limitations of existing benchmarks, which tend to be constrained by domain-specific biases and limited categories.
The authors provide an insightful analysis of Omni3D, highlighting its diverse spatial and semantic characteristics. The dataset's construction involves careful normalization of coordinate systems and re-purposing of annotations to ensure consistency. Statistical analysis reveals key insights into object spatial distribution, providing a foundation for improved generalization across multiple domains.
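The coordinate-normalization step described above can be pictured as a change of camera convention. The sketch below is illustrative, not the paper's actual pipeline: the specific axis conventions and the rotation matrix are assumptions chosen to show how a box center given in one dataset's native camera frame (here, +y up, -z forward) would be re-expressed in a unified frame (+y down, +z forward).

```python
import numpy as np

# Hypothetical rotation from an assumed native convention
# (+x right, +y up, -z forward) to an assumed unified convention
# (+x right, +y down, +z forward). Illustrative values only.
NATIVE_TO_UNIFIED = np.array([
    [1.0,  0.0,  0.0],   # x stays pointing right
    [0.0, -1.0,  0.0],   # flip y: up becomes down
    [0.0,  0.0, -1.0],   # flip z: backward becomes forward
])

def to_unified(center_native):
    """Rotate a 3D box center into the unified camera frame."""
    return NATIVE_TO_UNIFIED @ np.asarray(center_native, dtype=float)

# A point 2 m in front of the camera in the native frame (-z forward)
# maps to +2 m along z in the unified frame; y is mirrored.
unified = to_unified([0.5, 1.0, -2.0])
```

Applying one such fixed rotation per source dataset is what makes annotations from indoor and outdoor datasets directly comparable in a single coordinate system.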
Cube R-CNN Model
The paper introduces Cube R-CNN, a robust model designed to leverage the diversity of Omni3D. Cube R-CNN extends the popular Faster R-CNN framework with a novel 3D detection head. This model effectively predicts 3D cuboids by refining object detection using a carefully designed loss function and innovative concepts like virtual depth. Virtual depth mitigates the challenges introduced by varying camera intrinsics, thereby improving the model's generalization capabilities.
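The intuition behind virtual depth can be sketched as follows. This is a plausible formulation consistent with the invariance the paper describes, not its verbatim definition: depth is rescaled as if the image had been captured by a canonical "virtual" camera with a fixed focal length and image height, so that pixel-size cues mean the same thing across datasets with different intrinsics. The canonical constants below are assumed values.

```python
# Assumed canonical virtual-camera parameters (illustrative values).
F_VIRTUAL = 512.0   # canonical focal length in pixels
H_VIRTUAL = 512.0   # canonical image height in pixels

def to_virtual_depth(z, focal, img_height):
    """Map metric depth z to depth under the canonical virtual camera.

    Derivation sketch: an object of physical height h at depth z projects
    to f*h/z pixels; resizing the image to height H_VIRTUAL scales that by
    H_VIRTUAL/img_height. Choosing z_v so the virtual camera reproduces
    the same pixel height gives the scale factor below.
    """
    return z * (F_VIRTUAL / focal) * (img_height / H_VIRTUAL)

def to_metric_depth(z_v, focal, img_height):
    """Invert the mapping at inference time to recover metric depth."""
    return z_v * (focal / F_VIRTUAL) * (H_VIRTUAL / img_height)

# Round trip with KITTI-like intrinsics (focal ~720 px, height 376 px).
z = 10.0
zv = to_virtual_depth(z, focal=720.0, img_height=376.0)
assert abs(to_metric_depth(zv, 720.0, 376.0) - z) < 1e-9
```

Because the network regresses z_v rather than raw metric depth, images from cameras with very different focal lengths present a consistent learning target.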
A notable aspect of Cube R-CNN is its scalability and robustness across domains, achieved through innovations such as virtual depth and uncertainty-aware 3D prediction. Its disentangled loss decouples the different groups of 3D attributes so that each can be optimized individually, yielding stronger detection performance than existing models.
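The uncertainty handling mentioned above can be illustrated with a common confidence-weighted regression form; the exact loss Cube R-CNN uses may differ, so treat this as a sketch of the general idea rather than the paper's formula. The network predicts a log-variance alongside each regressed quantity: uncertain predictions are down-weighted, while overconfidence on bad predictions is penalized.

```python
import math

def uncertainty_weighted_l1(pred, target, log_var):
    """L1 regression loss attenuated by a predicted log-variance.

    The exp(-log_var) factor shrinks the penalty when the model admits
    uncertainty; the +log_var term stops it from claiming infinite
    uncertainty everywhere. (Illustrative form, not the paper's exact loss.)
    """
    return abs(pred - target) * math.exp(-log_var) + log_var

# A confident, accurate depth estimate is cheap...
low = uncertainty_weighted_l1(5.0, 5.1, log_var=-2.0)
# ...while the same claimed confidence on a poor estimate is costly.
high = uncertainty_weighted_l1(5.0, 8.0, log_var=-2.0)
assert high > low
```

At test time, the predicted uncertainty can also serve as a per-box confidence signal for ranking detections.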
Numerical Results and Claims
Cube R-CNN demonstrates impressive performance across Omni3D and standard benchmarks such as KITTI and SUN RGB-D. Its ability to excel with a single unified learning approach, rather than with separate methods specialized for individual domains, is noteworthy. On KITTI, Cube R-CNN achieves competitive results despite not being tailored to that benchmark, and it outperforms state-of-the-art methods when extended with virtual camera capabilities.
The introduction of a fast algorithm for computing 3D IoU, which is 450 times faster than existing methods, is a technical feat that facilitates efficient evaluation on large-scale datasets like Omni3D.
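To make the quantity being accelerated concrete, here is the axis-aligned special case of 3D IoU. Note this is a simplification for illustration: the paper's fast algorithm computes exact IoU for *oriented* boxes, which additionally requires clipping the two cuboids against each other.

```python
def aabb_iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes, each ((xmin,ymin,zmin),(xmax,ymax,zmax))."""
    (ax0, ay0, az0), (ax1, ay1, az1) = a
    (bx0, by0, bz0), (bx1, by1, bz1) = b
    # Overlap extent along each axis, clamped at zero when disjoint.
    dx = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    dy = max(0.0, min(ay1, by1) - max(ay0, by0))
    dz = max(0.0, min(az1, bz1) - max(az0, bz0))
    inter = dx * dy * dz
    vol_a = (ax1 - ax0) * (ay1 - ay0) * (az1 - az0)
    vol_b = (bx1 - bx0) * (by1 - by0) * (bz1 - bz0)
    return inter / (vol_a + vol_b - inter)

# Two unit cubes offset by half a unit share a third of their union:
iou = aabb_iou_3d(((0, 0, 0), (1, 1, 1)), ((0.5, 0, 0), (1.5, 1, 1)))
# iou == 0.5 / 1.5 ≈ 0.333
```

Evaluating average precision on hundreds of thousands of images requires millions of such pairwise IoU computations, which is why the speedup matters in practice.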
Implications and Future Directions
The implications of this research are multifaceted. Practically, Omni3D and Cube R-CNN provide a robust platform for advancing autonomous systems that rely on accurate 3D perception, such as robotics and AR/VR applications. Theoretically, the research establishes a solid foundation for exploring general-purpose 3D object detection models that can transcend domain-specific constraints.
Future developments could delve into enhancing the model's capability to self-calibrate camera intrinsics, expanding its robustness and applicability in real-world scenarios. Additionally, Omni3D's utility for few-shot learning scenarios suggests promising avenues for further reducing annotation costs in new environments.
In conclusion, the paper's contributions significantly advance the state of 3D object detection, setting a new standard for benchmark datasets and model performance in diverse and unconstrained settings.