- The paper introduces RealFusion, a method that integrates neural radiance fields with diffusion models to reconstruct detailed 3D objects from a single image.
- It introduces a single-image textual inversion technique that derives a tailored prompt from the input image, keeping the diffusion prior consistent with the depicted object and improving reconstruction accuracy.
- A coarse-to-fine optimization strategy, combined with a surface-normal smoothness regularizer, improves both training efficiency and the visual fidelity of the reconstructions.
An Overview of "RealFusion: 360° Reconstruction of Any Object from a Single Image"
The paper "RealFusion: 360° Reconstruction of Any Object from a Single Image," authored by Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, and Andrea Vedaldi, explores the challenging task of reconstructing a comprehensive 3D model of an object from a single 2D image using generative models, particularly diffusion-based image generators. This research presents a novel technique, termed RealFusion, which leverages neural radiance fields and diffusion models to generate plausible 3D reconstructions, extending beyond the limitations of prior methods primarily focused on synthetic objects or specific categories.
Key Contributions
- Neural Radiance Fields and Diffusion Models Integration: The paper departs from conventional monocular reconstruction techniques by incorporating a pretrained 2D diffusion model as a prior. Rather than sampling new images outright, the method renders the radiance field from random viewpoints and scores those renders with the diffusion model through a score-distillation objective (in the spirit of DreamFusion), compensating for the inherent ambiguity of inferring 3D geometry from a single perspective (see the SDS sketch after this list).
- Single-Image Textual Inversion: A distinctive aspect of RealFusion is its single-image textual inversion step, which enriches the diffusion model's guidance with a tailored prompt derived from the input image. This keeps the prior consistent with the specific object depicted, significantly improving the quality of the 3D reconstruction (see the inversion sketch after this list).
- Coarse-to-Fine Optimization: RealFusion employs a multi-resolution feature grid, similar to InstantNGP, trained with a coarse-to-fine strategy: the overall structure is solidified before finer details are refined, improving both training efficiency and reconstruction accuracy (see the level-masking sketch after this list).
- Surface Normal Smoothing Regularization: To suppress geometric artifacts and produce visually smooth surfaces, a regularization term enforces smoothness of the surface normals in 2D image space, improving the visual fidelity of the reconstructed models (see the smoothness-loss sketch after this list).
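To make the diffusion-prior idea concrete, here is a minimal PyTorch sketch of a score-distillation (SDS) style loss. RealFusion applies this in Stable Diffusion's latent space; the sketch below works directly on rendered images for simplicity, and `denoiser` is a hypothetical stand-in for the pretrained noise-prediction U-Net.

```python
import torch

def sds_loss(rendered, denoiser, text_embedding, alphas_cumprod):
    """Score distillation, sketched: nudge a rendered view toward images
    the pretrained diffusion model finds likely under the prompt.

    `denoiser(x_t, t, cond)` is a placeholder for the frozen U-Net's
    noise prediction; this is a simplified image-space sketch, not the
    authors' exact latent-space implementation.
    """
    b = rendered.shape[0]
    # Sample a random diffusion timestep and matching Gaussian noise.
    t = torch.randint(0, len(alphas_cumprod), (b,), device=rendered.device)
    noise = torch.randn_like(rendered)
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward-diffuse the rendering to timestep t.
    x_t = a.sqrt() * rendered + (1.0 - a).sqrt() * noise
    with torch.no_grad():  # the diffusion model stays frozen
        eps_pred = denoiser(x_t, t, text_embedding)
    # SDS gradient: w(t) * (predicted noise - true noise). The dot-product
    # trick below makes loss.backward() deliver this gradient (averaged
    # over the batch) to the renderer's parameters only.
    grad = (1.0 - a) * (eps_pred - noise)
    return (grad.detach() * rendered).sum() / b
```

In the full method this gradient flows back through the volumetric renderer into the radiance field's parameters, so that novel views of the object stay plausible under the diffusion prior.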
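The single-image textual inversion step can be sketched as follows: one new token embedding is optimized with the standard diffusion denoising loss so that the frozen model explains augmented versions of the input image. Here `augmented_image_fn` and `prompt_embed_fn` are hypothetical helpers, and the 768-dimensional embedding size (matching Stable Diffusion's text encoder) is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def invert_single_image(augmented_image_fn, denoiser, prompt_embed_fn,
                        alphas_cumprod, steps=1000, lr=5e-3):
    """Learn one pseudo-word embedding <e> so that prompts like
    "an image of a <e>" make the frozen diffusion model reconstruct
    (augmentations of) the input image. A sketch, not RealFusion's API.
    """
    token = torch.randn(768, requires_grad=True)  # embedding size: assumed
    opt = torch.optim.Adam([token], lr=lr)
    for _ in range(steps):
        x0 = augmented_image_fn()  # randomly augmented view of the one input
        t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
        noise = torch.randn_like(x0)
        a = alphas_cumprod[t].view(-1, 1, 1, 1)
        x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
        # Only the token embedding receives gradients; the model is frozen.
        eps_pred = denoiser(x_t, t, prompt_embed_fn(token))
        loss = F.mse_loss(eps_pred, noise)  # standard denoising objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return token.detach()
```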
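One plausible way to implement the coarse-to-fine schedule over an InstantNGP-style grid is to mask out the finer feature levels early in training and unlock them progressively. The level counts and linear schedule below are illustrative, not the paper's exact values.

```python
import torch

def coarse_to_fine_mask(step, warmup_steps, num_levels=16, feats_per_level=2):
    """Return a 0/1 mask over the concatenated multi-resolution grid
    features: only the coarsest level is active at step 0, and finer
    levels switch on linearly until `warmup_steps`. Illustrative numbers.
    """
    frac = min(1.0, step / warmup_steps)
    active_levels = 1 + int(frac * (num_levels - 1))
    mask = torch.zeros(num_levels * feats_per_level)
    mask[: active_levels * feats_per_level] = 1.0
    return mask

# Usage inside a training loop, where grid(xyz) returns the per-point
# concatenated features of all levels:
#   feats = grid(xyz) * coarse_to_fine_mask(step, warmup_steps)
```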
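Finally, the surface-normal smoothness term can be sketched as a finite-difference penalty on the normal map rendered into image space; the key point is that smoothness is enforced cheaply in 2D rather than on the 3D field directly. This is a sketch of the idea, not the paper's exact regularizer.

```python
import torch

def normal_smoothness_loss(normals):
    """Penalize differences between neighboring pixels of a rendered
    normal map. `normals` has shape (B, 3, H, W) with unit-length
    normals rendered from the radiance field.
    """
    dx = normals[..., :, 1:] - normals[..., :, :-1]  # horizontal neighbors
    dy = normals[..., 1:, :] - normals[..., :-1, :]  # vertical neighbors
    return (dx ** 2).mean() + (dy ** 2).mean()
```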
Empirical Evaluation and Results
The RealFusion approach sets a new state of the art for single-image 3D reconstruction across a variety of objects, evaluated on both in-the-wild images and established benchmark datasets. Quantitative assessments show superior geometric accuracy and appearance fidelity compared with previous methods such as Shelf-Supervised Mesh Prediction, even though, unlike that baseline, RealFusion uses no category-specific training.
Importantly, the authors note that, because single-image reconstruction is inherently ill-posed, the method can produce multiple distinct yet plausible reconstructions simply by restarting the optimization with a different random seed.
Implications and Future Directions
RealFusion makes strides in the integration of neural rendering techniques and pre-trained diffusion models, opening avenues for more generalizable and versatile 3D reconstruction frameworks. While the method currently requires a fairly complex setup involving both neural radiance fields and diffusion models, future work could explore further optimization of these models or even real-time applications in dynamic scene reconstruction.
The notion of using single-image inversion for conditioning diffusion models could inspire new methodologies in other domains, such as scene understanding or inverse graphics. Additionally, further research could address current limitations, such as handling dynamic scenes or improving reconstruction consistency across diverse object types and environments.
Overall, this paper advances the field of 3D reconstruction by providing an innovative solution that effectively utilizes existing powerful 2D generative models, offering a pathway toward more practical and scalable 3D understanding technologies.