ReconFusion: 3D Reconstruction with Diffusion Priors (2312.02981v1)

Published 5 Dec 2023 in cs.CV

Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets, which regularizes a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images. Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions. We perform an extensive evaluation across various real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements over previous few-view NeRF reconstruction approaches.

Authors (11)
  1. Rundi Wu
  2. Ben Mildenhall
  3. Philipp Henzler
  4. Keunhong Park
  5. Ruiqi Gao
  6. Daniel Watson
  7. Pratul P. Srinivasan
  8. Dor Verbin
  9. Jonathan T. Barron
  10. Ben Poole
  11. Aleksander Holynski
Citations (107)

Summary

In the field of computer vision, creating 3D models from a collection of 2D images is a complex task that often requires a large number of images to achieve photo-realistic results. This is particularly true for Neural Radiance Fields (NeRF), a technique that excels at rendering highly realistic novel views of complex scenes. Unfortunately, capturing such a large number of images to cover every angle of a scene can be impractical and time-consuming.

A novel approach, termed ReconFusion, addresses this challenge by enabling the reconstruction of real-world scenes from just a handful of photos. The key innovation lies in leveraging a diffusion model, a type of generative model known for producing high-quality images, to guide the reconstruction process. The diffusion model, trained on synthetic and multi-view datasets, functions as an image prior: it estimates what unseen parts of the scene might look like given a few observed views, and this information regularizes the 3D reconstruction pipeline at novel camera poses beyond those captured by the input images.
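One way to make this concrete (a hedged sketch of the objective, not the paper's exact formulation; all symbols below are illustrative): alongside the usual reconstruction loss on the observed images, the diffusion prior contributes a term that pulls NeRF renderings at sampled novel poses toward the view-conditioned diffusion model's prediction:

    \mathcal{L} \;=\; \sum_{i=1}^{k} \big\lVert \hat{I}(\pi_i) - I_i \big\rVert_2^2
    \;+\; \lambda\, \mathbb{E}_{\pi \sim p_{\text{novel}}}\!\Big[\, d\big(\hat{I}(\pi),\; \hat{x}_\theta\big(\hat{I}(\pi)\,;\, I_{1:k}, \pi\big)\big) \Big]

Here \hat{I}(\pi) is the NeRF rendering at pose \pi, I_1, ..., I_k are the captured images with poses \pi_1, ..., \pi_k, \hat{x}_\theta is the diffusion model's denoised estimate conditioned on the inputs, d is an image distance (pixel and/or perceptual), and \lambda balances the two terms.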

ReconFusion synthesizes realistic geometry and texture in regions of the scene that are underconstrained (i.e., observed from too few angles), while preserving the fidelity of the parts captured from multiple perspectives. The technique has been evaluated on diverse real-world datasets, including forward-facing and 360-degree scenes, where it significantly outperforms existing few-view NeRF reconstruction methods.

Notably, ReconFusion not only helps when the number of available views is very low, but also improves quality and reduces common artifacts known as "floaters" when a substantial number of observations is available. It serves as a drop-in regularizer for NeRF across a variety of capture settings, making 3D reconstruction more accessible and less dependent on dense image captures. A sketch of how such a regularizer might slot into a NeRF training loop is given below.
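The following Python sketch illustrates the "drop-in regularizer" idea under stated assumptions; it is not the authors' code. The objects nerf and diffusion, the methods nerf.render and diffusion.denoise, and the helpers sample_novel_pose and lambda_weight are hypothetical stand-ins for a NeRF renderer and a view-conditioned diffusion model.

    import random
    import torch

    def lambda_weight(step, max_steps=5000, start=1.0, end=0.1):
        """Anneal the prior weight over training (one illustrative schedule)."""
        t = min(step / max_steps, 1.0)
        return start + t * (end - start)

    def sample_novel_pose(observed_poses):
        """Crude placeholder: perturb a random observed pose to get a nearby
        novel pose. A real system would sample valid SE(3) poses."""
        base = random.choice(observed_poses)
        return base + 0.05 * torch.randn_like(base)

    def train_step(nerf, diffusion, observed_images, observed_poses, step, opt):
        # 1) Standard NeRF photometric loss on the captured views.
        recon_loss = 0.0
        for img, pose in zip(observed_images, observed_poses):
            render = nerf.render(pose)  # hypothetical differentiable renderer
            recon_loss = recon_loss + ((render - img) ** 2).mean()

        # 2) Diffusion prior at a sampled novel pose: render the current NeRF
        #    there, ask the view-conditioned diffusion model for a plausible
        #    clean image, and pull the render toward that target.
        novel_pose = sample_novel_pose(observed_poses)
        novel_render = nerf.render(novel_pose)
        with torch.no_grad():  # the prior supplies a fixed target each step
            target = diffusion.denoise(novel_render, observed_images, novel_pose)
        prior_loss = ((novel_render - target) ** 2).mean()

        # 3) Combine the two terms and take a gradient step on the NeRF.
        loss = recon_loss + lambda_weight(step) * prior_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

Because the prior enters only as an extra loss term, the same loop applies whether the capture is sparse or dense, which is what makes the regularizer "drop-in."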
