Objaverse: A Universe of Annotated 3D Objects (2212.08051v1)

Published 15 Dec 2022 in cs.CV, cs.AI, cs.GR, and cs.RO

Abstract: Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and LAION have propelled recent dramatic progress in AI. Large neural models trained on such datasets produce impressive results and top many of today's benchmarks. A notable omission within this family of large-scale datasets is 3D data. Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. Objaverse improves upon present day 3D repositories in terms of scale, number of categories, and in the visual diversity of instances within a category. We demonstrate the large potential of Objaverse via four diverse applications: training generative 3D models, improving tail category segmentation on the LVIS benchmark, training open-vocabulary object-navigation models for Embodied AI, and creating a new benchmark for robustness analysis of vision models. Objaverse can open new directions for research and enable new applications across the field of AI.

Authors (10)

Matt Deitke (11 papers)
Dustin Schwenk (15 papers)
Jordi Salvador (15 papers)
Luca Weihs (46 papers)
Oscar Michel (8 papers)
Eli VanderBilt (10 papers)
Ludwig Schmidt (80 papers)
Kiana Ehsani (31 papers)
Aniruddha Kembhavi (79 papers)
Ali Farhadi (138 papers)

Citations (658)


Summary

The paper introduces Objaverse 1.0, a large-scale dataset of more than 800,000 annotated 3D models, sourced primarily from Sketchfab. It addresses the scarcity of high-fidelity 3D data among large-scale AI datasets by offering greater scale and diversity than existing 3D repositories. The paper also presents several applications of the dataset to show its potential impact across AI domains.

Key Contributions and Results

Objaverse significantly surpasses existing 3D datasets, such as ShapeNet and ABO, in both scale and diversity of object categories, and its objects are annotated with descriptive captions and tags. Beyond typical items such as animals and vehicles, the dataset includes larger spaces suitable for embodied AI research. It also contains a wide range of animated and rigged characters, which could support work on temporal 3D learning and animation generation.

The authors demonstrate Objaverse's versatility through four applications:

3D Generative Modeling: GET3D models trained on Objaverse categories such as shoes and bags produce more diverse and higher-quality 3D objects than those trained on ShapeNet data. In diversity experiments, human annotators judged the generations of Objaverse-trained models to be more diverse than those of models trained on alternative datasets 91% of the time.
Instance Segmentation: By employing a novel augmentation technique named 3DCP (3D Copy-Paste), Objaverse facilitates improvements in instance segmentation tasks on the LVIS dataset. This approach leverages 3D objects rendered into 2D images for additional training data, yielding performance gains across various metrics, particularly in long-tail instance segmentation.
Embodied AI with Open-Vocabulary Object Navigation: Objaverse allows for procedurally generated environments in ProcTHOR to be populated with its diverse asset library, increasing the number of object targets significantly. Agents are trained to recognize 1,100 semantic categories, enhancing the complexity and realism of embodied AI navigation tasks.
Robustness in Computer Vision: The dataset serves as the basis for a benchmark that tests the robustness of state-of-the-art vision models to perspective shifts. When Objaverse objects are rendered at random orientations, existing vision models show a substantial decrease in performance, highlighting an area for improvement in future robustness training.
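
To make the 3D copy-paste idea more concrete, the following is a minimal sketch of the 2D compositing step only. It assumes a 3D object has already been rendered to an RGBA image; the function name `paste_rendered_object`, its parameters, and the 0.5 alpha threshold are hypothetical choices for illustration, not the authors' implementation, and the summary does not describe how the paper renders, scales, or places objects.

```python
import numpy as np


def paste_rendered_object(background, render_rgba, top, left):
    """Alpha-composite an RGBA render onto an RGB image.

    Hypothetical helper for illustration; not the paper's 3DCP code.
    Returns the augmented uint8 image and a boolean instance mask
    that can serve as a segmentation label for the pasted object.
    """
    bg_h, bg_w = background.shape[:2]
    h, w = render_rgba.shape[:2]
    if top < 0 or left < 0 or top + h > bg_h or left + w > bg_w:
        raise ValueError("render must fit entirely inside the background")

    out = background.astype(np.float32)
    rgb = render_rgba[..., :3].astype(np.float32)
    # Alpha channel scaled to [0, 1]; kept as (h, w, 1) so it broadcasts over RGB.
    alpha = render_rgba[..., 3:4].astype(np.float32) / 255.0

    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * region

    # Pixels that are mostly covered by the object form its instance mask
    # (the 0.5 threshold is an assumption).
    mask = np.zeros((bg_h, bg_w), dtype=bool)
    mask[top:top + h, left:left + w] = alpha[..., 0] > 0.5
    return out.round().astype(np.uint8), mask
```

In a training pipeline, each augmented image and its mask would be added to the segmentation training set together with the category label of the pasted object.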

Implications and Future Directions

Objaverse scales up and diversifies the 3D objects available for AI research, which could benefit generative modeling, simulation-based training, and robustness analysis. The dataset encourages the exploration of new areas such as large-vocabulary 3D generation and perception models that remain robust across a wide range of viewing conditions.

While initial experiments show promising outcomes, much of Objaverse's potential remains unexplored. Future work might improve real-world applicability by integrating Objaverse-trained models into more comprehensive AI systems, or combine Objaverse data with other multimodal datasets to improve model training and evaluation.

In conclusion, Objaverse's comprehensive annotated 3D object collection provides a rich resource for researchers aiming to advance both theoretical understanding and practical applications of AI within the field of 3D data processing and beyond.
