UnCommon Objects in 3D (2501.07574v1)

Published 13 Jan 2025 in cs.CV, cs.AI, and cs.GR

Abstract: We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI. uCO3D is the largest publicly-available collection of high-resolution videos of objects with 3D annotations that ensures full-360° coverage. uCO3D is significantly more diverse than MVImgNet and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality, due to extensive quality checks of both the collected videos and the 3D annotations. Similar to analogous datasets, uCO3D contains annotations for 3D camera poses, depth maps and sparse point clouds. In addition, each object is equipped with a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models on MVImgNet, CO3Dv2, and uCO3D and obtain superior results using the latter, showing that uCO3D is better for learning applications.

Summary

  • The paper introduces uCO3D, the largest dataset featuring high-res 360° videos and extensive 3D annotations across over 1,000 object categories.
  • It employs VGGSfM for precise structure-from-motion and uses active learning for rigorous quality validation of scenes.
  • Empirical results show that models like LRM and CAT3D trained on uCO3D outperform those using older datasets, setting a new benchmark in 3D AI.

Overview of "UnCommon Objects in 3D"

The paper presents UnCommon Objects in 3D (uCO3D), a dataset designed to drive advances in 3D deep learning and generative AI. uCO3D stands out for its diversity and quality, offering 3D annotations across more than 1,000 object categories. The authors position it as a stronger resource than existing datasets such as MVImgNet and CO3Dv2, which it surpasses in category diversity and annotation quality.

Core Contributions

The authors introduce uCO3D as the largest publicly available collection of high-resolution 360-degree videos with 3D annotations. The dataset addresses the inherent challenges in acquiring 3D data by focusing on quality and diversity. To ensure meticulous data capture, the creators implemented stringent quality checks on both the videos and their 3D annotations. This brings a substantial improvement in terms of object categories and video resolution, with more than 60% of videos being 1080p or higher.

Qualitative and quantitative improvements are key strengths of uCO3D. Videos are captured following a strategically planned sine-wave path, enhancing viewpoint coverage relative to predecessors. The dataset is organized into super-categories, assisting in managing its extensive 1,070 visual categories. In addition to standard 3D annotations, each object is paired with a descriptive caption and a robust 3D Gaussian Splat reconstruction.
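To make the sine-wave capture path concrete, here is a minimal sketch of one way such a trajectory could be parameterized: the camera circles the object once while its elevation oscillates sinusoidally. The function name `sine_wave_camera_path` and every parameter value (radius, base elevation, amplitude, number of waves) are illustrative assumptions and are not taken from the paper, which may define the path differently.

```python
import numpy as np

def sine_wave_camera_path(n_frames=200, radius=1.5, base_elev_deg=30.0,
                          amp_deg=20.0, n_waves=3):
    """Return an (n_frames, 3) array of camera positions around the origin.

    All defaults are hypothetical values chosen for illustration only.
    """
    # Azimuth sweeps one full 360-degree turn around the object.
    azimuth = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    # Elevation oscillates around a base angle, so the camera sees the
    # object from both lower and higher viewpoints during the turn.
    elevation = np.deg2rad(base_elev_deg + amp_deg * np.sin(n_waves * azimuth))
    # Spherical-to-Cartesian conversion with the object at the origin.
    x = radius * np.cos(elevation) * np.cos(azimuth)
    y = radius * np.cos(elevation) * np.sin(azimuth)
    z = radius * np.sin(elevation)
    return np.stack([x, y, z], axis=1)

positions = sine_wave_camera_path()
```

Compared with a flat circular orbit at a fixed height, a path like this samples a band of elevations, which is the intuition behind the improved viewpoint coverage described above.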

Methodological Enhancements and Validation

uCO3D incorporates VGGSfM for structure-from-motion, offering more precise results compared to previous methods like COLMAP. This decision highlights the focus on ensuring accurate reconstructions and reliable 3D camera positioning. Furthermore, the active-learning-based validation approach ensures that only high-quality scenes are retained.
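The summary does not spell out how the active-learning validation works, so the following is only a generic sketch of uncertainty sampling, a common active-learning strategy, and not the authors' procedure. It assumes a hypothetical quality classifier has already produced an acceptance probability per scene; the function names, the example probabilities, and the 0.9 threshold are all assumptions made for illustration.

```python
import numpy as np

def select_scenes_for_review(accept_prob, k):
    """Pick the k scenes the classifier is least sure about.

    Uncertainty sampling: scenes whose predicted acceptance probability
    is closest to 0.5 are sent to a human annotator first.
    """
    accept_prob = np.asarray(accept_prob, dtype=float)
    distance_from_boundary = np.abs(accept_prob - 0.5)
    return np.argsort(distance_from_boundary, kind="stable")[:k]

def keep_high_quality(accept_prob, threshold=0.9):
    """Return indices of scenes confidently predicted to be good."""
    accept_prob = np.asarray(accept_prob, dtype=float)
    return np.flatnonzero(accept_prob >= threshold)

# Hypothetical classifier outputs for six scenes.
probs = [0.98, 0.52, 0.10, 0.45, 0.91, 0.70]
to_review = select_scenes_for_review(probs, k=2)
retained = keep_high_quality(probs)
```

In a full loop, the human labels for the reviewed scenes would be added to the training set and the classifier retrained before the next round of selection.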

Empirical validation supports uCO3D's advantages. Models trained on this dataset, including LRM and CAT3D, outperform the same models trained on MVImgNet and CO3Dv2, which suggests that uCO3D is the better training source for 3D learning applications.

Implications and Future Directions

The introduction of uCO3D represents a significant leap forward in the field of real object-centric datasets, particularly in achieving a more balanced coverage of visual categories. The dataset’s remarkable diversity and quality cater to various applications, from feedforward 3D reconstruction to generative tasks such as photorealistic text-to-3D. Looking ahead, the comprehensive nature of uCO3D suggests its potential influence in building more nuanced and powerful 3D models, possibly expanding towards interactive AI systems capable of understanding complex environments.

uCO3D sets a new standard for 3D datasets and provides fertile ground for further work in deep learning and computer vision. Future work might extend its use to real-time processing and to the theory underlying these applications.

In conclusion, uCO3D fills a significant gap in the 3D dataset landscape by offering an unparalleled combination of scale and precision. It is set to become a critical resource for researchers and practitioners seeking to further explore the capabilities of 3D AI systems.