MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation (2404.03656v1)

Published 4 Apr 2024 in cs.CV

Abstract: We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images. While recent methods pursuing 3D inference advocate learning novel-view generative models, these generations are not 3D-consistent and require a distillation process to generate a 3D output. We instead cast the task of 3D inference as directly generating mutually-consistent multiple views and build on the insight that additionally inferring depth can provide a mechanism for enforcing this consistency. Specifically, we train a denoising diffusion model to generate multi-view RGB-D images given a single RGB input image and leverage the (intermediate noisy) depth estimates to obtain reprojection-based conditioning to maintain multi-view consistency. We train our model using the large-scale synthetic Objaverse dataset as well as the real-world CO3D dataset comprising generic camera viewpoints. We demonstrate that our approach can yield more accurate synthesis compared to recent state-of-the-art methods, including distillation-based 3D inference and prior multi-view generation methods. We also evaluate the geometry induced by our multi-view depth prediction and find that it yields a more accurate representation than other direct 3D inference approaches.
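
The core mechanism described above, using intermediate (noisy) depth estimates to reproject content across views so that the diffusion model is conditioned on mutually consistent evidence, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration and not the authors' implementation: the function name, tensor shapes, and camera conventions are assumptions, and the warp is a standard depth-based backward projection implemented with PyTorch's grid_sample.

```python
# Minimal sketch (assumptions, not the authors' code): warp another view's RGB
# into a reference view using the reference view's intermediate depth estimate.
# This kind of depth-based reprojection is what can supply the multi-view
# conditioning signal described in the abstract.
import torch
import torch.nn.functional as F

def warp_to_view(rgb_other, depth_ref, K, ref_to_other):
    """Resample `rgb_other` (B, 3, H, W) into the reference view.

    depth_ref:    (B, 1, H, W) intermediate (possibly noisy) depth of the reference view
    K:            (B, 3, 3) shared camera intrinsics
    ref_to_other: (B, 4, 4) relative pose taking reference-camera points to the other view
    """
    B, _, H, W = depth_ref.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth_ref.device, dtype=depth_ref.dtype),
        torch.arange(W, device=depth_ref.device, dtype=depth_ref.dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(1, -1, 3)

    # Unproject reference pixels using the intermediate depth estimate.
    cam_pts = (pix @ torch.inverse(K).transpose(1, 2)) * depth_ref.reshape(B, -1, 1)
    cam_pts_h = torch.cat([cam_pts, torch.ones_like(cam_pts[..., :1])], dim=-1)

    # Transform into the other view's camera frame and project to pixels.
    pts_other = (cam_pts_h @ ref_to_other.transpose(1, 2))[..., :3]
    uv = pts_other @ K.transpose(1, 2)
    uv = uv[..., :2] / uv[..., 2:].clamp(min=1e-6)

    # Normalize to [-1, 1] and sample the other view's image.
    grid = torch.stack(
        [2 * uv[..., 0] / (W - 1) - 1, 2 * uv[..., 1] / (H - 1) - 1], dim=-1
    ).reshape(B, H, W, 2)
    return F.grid_sample(rgb_other, grid, align_corners=True)

# In a full multi-view denoiser (hypothetical usage), each view's conditioning
# could gather warped content from all other views at every denoising step:
# cond_i = torch.cat([warp_to_view(rgb_j, depth_i, K, T_i_to_j) for j in others], dim=1)
```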

Authors (4)
  1. Hanzhe Hu (7 papers)
  2. Zhizhuo Zhou (3 papers)
  3. Varun Jampani (125 papers)
  4. Shubham Tulsiani (71 papers)
Citations (10)