- The paper demonstrates that training 3D reconstruction models with synthesized data can yield performance nearly matching that of models trained on real-world datasets.
- It introduces the Zeroverse dataset, using procedural generation of geometric shapes to provide diverse, augmentation-rich data for sparse-view reconstruction.
- Key results reveal competitive PSNR and SSIM scores, indicating that geometry-focused training is a viable alternative to semantically rich data in 3D modeling.
Synopsis of "LRM-Zero: Training Large Reconstruction Models with Synthesized Data"
The paper "LRM-Zero: Training Large Reconstruction Models with Synthesized Data" presents a novel approach for training large reconstruction models (LRMs) using entirely synthesized data, which challenges the conventional reliance on realistic datasets. The proposed framework, LRM-Zero, is underpinned by a procedurally generated dataset called Zeroverse, which consists of complex geometric shapes devoid of realistic semantics. This paper explores how such data can be effectively used to train models for sparse-view 3D reconstruction tasks, achieving performance that rivals models trained on human-crafted datasets like Objaverse.
Methodology
The centerpiece of this work is the Zeroverse dataset. Unlike traditional 3D datasets that aim to capture realistic objects, Zeroverse is constructed procedurally from simple primitive shapes: primitives are composed through random geometric transformations and random texturing, then augmented with height fields, boolean differences, and wireframes. Each augmentation contributes distinct geometric properties, such as concavity or thin structures, broadening the dataset's geometric diversity.
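To make the composition process concrete, below is a minimal, self-contained sketch of Zeroverse-style procedural shape sampling. The primitive set, parameter ranges, and augmentation probabilities are illustrative assumptions, not the paper's released generation pipeline.

```python
# A minimal sketch of Zeroverse-style procedural shape composition, assuming
# numpy only; primitive names, parameter ranges, and augmentation probabilities
# are illustrative, not the paper's exact configuration.
import numpy as np

PRIMITIVES = ["cube", "sphere", "cylinder", "cone", "torus"]  # assumed primitive set

def random_primitive(rng):
    """Sample one primitive with a random similarity transform and texture id."""
    return {
        "type": rng.choice(PRIMITIVES),
        "scale": rng.uniform(0.2, 1.0, size=3),           # anisotropic scaling
        "rotation": rng.uniform(0.0, 2 * np.pi, size=3),  # Euler angles
        "translation": rng.uniform(-0.5, 0.5, size=3),
        "texture_id": int(rng.integers(0, 1000)),         # random texture map
    }

def random_shape(rng, min_parts=2, max_parts=8):
    """Compose several primitives and tag optional geometric augmentations."""
    parts = [random_primitive(rng) for _ in range(rng.integers(min_parts, max_parts + 1))]
    return {
        "parts": parts,
        # Augmentations named in the paper, applied stochastically (probabilities assumed):
        "height_field": rng.random() < 0.3,   # displace surfaces with a height map
        "boolean_diff": rng.random() < 0.3,   # carve concavities by subtracting a part
        "wireframe": rng.random() < 0.2,      # replace solid surfaces with thin frames
    }

rng = np.random.default_rng(0)
dataset = [random_shape(rng) for _ in range(4)]
print(dataset[0]["parts"][0]["type"], len(dataset[0]["parts"]))
```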
The synthesized Zeroverse shapes are rendered into multi-view images, which serve as the training data for LRM-Zero. The model itself is a feed-forward architecture inspired by recent Transformer-based reconstruction models, predicting a 3D reconstruction directly from a sparse set of input views.
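As a rough illustration of such a feed-forward design, the PyTorch sketch below patchifies multi-view images together with per-pixel ray encodings, runs them through a Transformer encoder, and decodes per-patch reconstruction parameters. The tensor shapes, patch size, and the choice of 12 per-pixel Gaussian parameters are assumptions for illustration, not the paper's exact architecture.

```python
# A minimal PyTorch sketch of a feed-forward, Transformer-based sparse-view
# reconstructor in the spirit of LRM-Zero; shapes and parameter counts are assumed.
import torch
import torch.nn as nn

class TinyLRM(nn.Module):
    def __init__(self, patch=8, dim=256, depth=4, heads=8, gauss_params=12):
        super().__init__()
        self.patch = patch
        # Each pixel carries RGB (3) + a 6-dim ray encoding = 9 channels per pixel.
        self.embed = nn.Linear(9 * patch * patch, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        # Decode each token back to per-pixel parameters of a 3D representation
        # (e.g. position offset, scale, rotation, opacity, color).
        self.head = nn.Linear(dim, gauss_params * patch * patch)

    def forward(self, images, rays):
        # images: (B, V, 3, H, W), rays: (B, V, 6, H, W) for V input views.
        b, v, _, h, w = images.shape
        p = self.patch
        x = torch.cat([images, rays], dim=2)                        # (B, V, 9, H, W)
        x = x.view(b, v, 9, h // p, p, w // p, p)
        x = x.permute(0, 1, 3, 5, 2, 4, 6).reshape(b, v * (h // p) * (w // p), -1)
        tokens = self.backbone(self.embed(x))                       # attention across all views' patches
        return self.head(tokens)                                    # per-patch reconstruction params

model = TinyLRM()
imgs = torch.randn(1, 4, 3, 64, 64)   # 4 sparse input views
rays = torch.randn(1, 4, 6, 64, 64)
print(model(imgs, rays).shape)        # (1, 256 tokens, 12 * 8 * 8)
```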
Results and Analysis
The authors compare LRM-Zero against GS-LRM, a model trained on Objaverse, using standard benchmarks such as Google Scanned Objects (GSO) and Amazon Berkeley Objects (ABO). LRM-Zero achieves comparable performance, trailing GS-LRM by only 1.12 dB PSNR and 0.09 SSIM on GSO for 8-view reconstructions. Qualitatively, LRM-Zero delivers similar visual fidelity across the test datasets, indicating effective generalization despite the absence of high-level semantic information during training.
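For reference, a per-view PSNR/SSIM evaluation of this kind could be computed as in the sketch below (assuming scikit-image ≥ 0.19 for the channel_axis argument); the random arrays stand in for LRM-Zero's renders and the GSO/ABO ground-truth views.

```python
# A small sketch of per-view PSNR / SSIM computation with scikit-image;
# the arrays are placeholders, not the paper's actual renders or test images.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_views(renders, targets):
    """Average PSNR/SSIM over novel views; arrays are (N, H, W, 3) in [0, 1]."""
    psnrs, ssims = [], []
    for pred, gt in zip(renders, targets):
        psnrs.append(peak_signal_noise_ratio(gt, pred, data_range=1.0))
        ssims.append(structural_similarity(gt, pred, channel_axis=-1, data_range=1.0))
    return float(np.mean(psnrs)), float(np.mean(ssims))

rng = np.random.default_rng(0)
gt = rng.random((8, 128, 128, 3))                                  # 8 held-out target views
pred = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0, 1)    # noisy stand-in renders
print(evaluate_views(pred, gt))
```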
A key finding is that sparse-view reconstruction depends heavily on localized geometric and appearance cues: the model remains competitive even though it never sees globally coherent, semantically meaningful objects during training.
Implications
This research has theoretical implications by highlighting the potential for training 3D reconstruction models with non-semantic data. Practically, it suggests pathways for mitigating data scarcity and privacy issues inherent in collecting and using real-world 3D data. The absence of realistic semantics in the training data highlights a focus on geometry and local visual features, suggesting avenues for models specializing in purely geometric reconstruction tasks.
The analysis shows that model performance is sensitive to the design of synthesized data. Augmentations like boolean differences and wireframes are crucial for capturing intricate object details. The paper also identifies training stability challenges, suggesting a nuanced interaction between model architecture, training parameters, and data complexity.
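One way such ablations are often organized is through explicit configuration toggles, as in the hypothetical sketch below; the field names, default values, and the gradient-clipping setting are illustrative assumptions, not settings reported in the paper.

```python
# A hedged sketch of expressing the data and training ablations as config toggles;
# all names and values here are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class ZeroverseConfig:
    num_primitives_max: int = 8
    use_height_field: bool = True
    use_boolean_diff: bool = True    # carves concave regions
    use_wireframe: bool = True       # adds thin structures

@dataclass
class TrainConfig:
    lr: float = 4e-4
    warmup_steps: int = 3000
    grad_clip_norm: float = 1.0      # one common lever for the stability issues noted above

# Ablation: drop boolean differences and wireframes to measure their contribution.
ablated = ZeroverseConfig(use_boolean_diff=False, use_wireframe=False)
print(ablated)
```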
Future Perspectives
This work opens multiple future research avenues. Exploring more advanced procedural synthesis techniques could further diversify Zeroverse, potentially narrowing the gap with semantically rich datasets. Furthermore, optimizing model architectures specifically for synthesized data could enhance scalability and performance. To address the limitations of semantically poor datasets, future research could also investigate hybrid datasets combining synthesized and real data, as sketched below.
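As a hypothetical illustration of the hybrid-data idea, the sketch below mixes synthesized and real samples at a fixed ratio; the 70/30 split and the dataset pools are assumptions for illustration only.

```python
# A hypothetical sketch of mixing synthesized (Zeroverse-style) and real
# (Objaverse-style) training samples at a fixed ratio; not a recipe from the paper.
import random

def mixed_batches(synthetic, real, synth_ratio=0.7, batch_size=8, steps=100, seed=0):
    """Yield batches drawing each sample from the synthetic pool with
    probability synth_ratio and from the real pool otherwise."""
    rng = random.Random(seed)
    for _ in range(steps):
        yield [
            rng.choice(synthetic) if rng.random() < synth_ratio else rng.choice(real)
            for _ in range(batch_size)
        ]

# Toy pools standing in for rendered Zeroverse shapes and real captures.
synthetic_pool = [f"zeroverse_{i}" for i in range(100)]
real_pool = [f"real_{i}" for i in range(30)]
print(next(iter(mixed_batches(synthetic_pool, real_pool))))
```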
Moreover, the application of synthesized data to other domains of 3D vision, such as object recognition and scene understanding, represents a promising direction. The disentanglement of global semantics from geometric modeling could facilitate the specialization of models for distinct 3D tasks, allowing a divide-and-conquer approach to 3D vision challenges.
In summary, "LRM-Zero: Training Large Reconstruction Models with Synthesized Data" makes a compelling case for using synthesized data in 3D reconstruction, challenging traditional reliance on realism and opening new frontiers for exploration in model training methodologies. Its contributions lie not only in demonstrating a proof-of-concept but also in paving the way for novel data-efficient and privacy-preserving approaches in computational 3D modeling.