ReconFusion: 3D Reconstruction with Diffusion Priors (2312.02981v1)

Published 5 Dec 2023 in cs.CV

Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets, which regularizes a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images. Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions. We perform an extensive evaluation across various real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements over previous few-view NeRF reconstruction approaches.

Authors (11)
  1. Rundi Wu
  2. Ben Mildenhall
  3. Philipp Henzler
  4. Keunhong Park
  5. Ruiqi Gao
  6. Daniel Watson
  7. Pratul P. Srinivasan
  8. Dor Verbin
  9. Jonathan T. Barron
  10. Ben Poole
  11. Aleksander Holynski
Citations (107)

Summary

In the field of computer vision, creating 3D models from a collection of 2D images is a complex task that often requires a large number of images to achieve photo-realistic results. This is particularly true for Neural Radiance Fields (NeRF), a technique that excels at rendering highly realistic novel views of complex scenes. Unfortunately, capturing such a large number of images to cover every angle of a scene can be impractical and time-consuming.

A novel approach, termed ReconFusion, addresses this challenge by enabling the reconstruction of real-world scenes from just a handful of photos. The key innovation lies in leveraging a diffusion model, a type of generative model known for producing high-quality images, to guide the reconstruction process. The diffusion model, trained on synthetic and multi-view datasets, functions as an image prior: it estimates what unseen parts of the scene might look like given a few observed views, and this information regularizes the 3D reconstruction pipeline at novel camera poses beyond those captured by the input images.
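One way to make this concrete (a hedged sketch of the objective, not the paper's exact formulation; all symbols below are illustrative): alongside the usual reconstruction loss on the observed images, the diffusion prior contributes a term that pulls NeRF renderings at sampled novel poses toward the view-conditioned diffusion model's prediction:

    \mathcal{L} \;=\; \sum_{i=1}^{k} \big\lVert \hat{I}(\pi_i) - I_i \big\rVert_2^2
    \;+\; \lambda\, \mathbb{E}_{\pi \sim p_{\text{novel}}}\!\Big[\, d\big(\hat{I}(\pi),\; \hat{x}_\theta\big(\hat{I}(\pi)\,;\, I_{1:k}, \pi\big)\big) \Big]

Here \hat{I}(\pi) is the NeRF rendering at pose \pi, I_1, ..., I_k are the captured images with poses \pi_1, ..., \pi_k, \hat{x}_\theta is the diffusion model's denoised estimate conditioned on the inputs, d is an image distance (pixel and/or perceptual), and \lambda balances the two terms.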

ReconFusion synthesizes realistic geometry and texture in regions of the scene that are underconstrained (i.e., observed from too few angles), while preserving the fidelity of the parts captured from multiple perspectives. The technique has been evaluated on diverse real-world datasets, including forward-facing and 360-degree scenes, where it significantly outperforms existing few-view NeRF reconstruction methods.

Notably, ReconFusion not only helps when the number of available views is very low, but also improves quality and reduces common artifacts known as "floaters" when a substantial number of observations is available. It serves as a drop-in regularizer for NeRF across a variety of capture settings, making 3D reconstruction more accessible and less dependent on dense image captures. A sketch of how such a regularizer might slot into a NeRF training loop is given below.
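The following Python sketch illustrates the "drop-in regularizer" idea under stated assumptions; it is not the authors' code. The objects nerf and diffusion, the methods nerf.render and diffusion.denoise, and the helpers sample_novel_pose and lambda_weight are hypothetical stand-ins for a NeRF renderer and a view-conditioned diffusion model.

    import random
    import torch

    def lambda_weight(step, max_steps=5000, start=1.0, end=0.1):
        """Anneal the prior weight over training (one illustrative schedule)."""
        t = min(step / max_steps, 1.0)
        return start + t * (end - start)

    def sample_novel_pose(observed_poses):
        """Crude placeholder: perturb a random observed pose to get a nearby
        novel pose. A real system would sample valid SE(3) poses."""
        base = random.choice(observed_poses)
        return base + 0.05 * torch.randn_like(base)

    def train_step(nerf, diffusion, observed_images, observed_poses, step, opt):
        # 1) Standard NeRF photometric loss on the captured views.
        recon_loss = 0.0
        for img, pose in zip(observed_images, observed_poses):
            render = nerf.render(pose)  # hypothetical differentiable renderer
            recon_loss = recon_loss + ((render - img) ** 2).mean()

        # 2) Diffusion prior at a sampled novel pose: render the current NeRF
        #    there, ask the view-conditioned diffusion model for a plausible
        #    clean image, and pull the render toward that target.
        novel_pose = sample_novel_pose(observed_poses)
        novel_render = nerf.render(novel_pose)
        with torch.no_grad():  # the prior supplies a fixed target each step
            target = diffusion.denoise(novel_render, observed_images, novel_pose)
        prior_loss = ((novel_render - target) ** 2).mean()

        # 3) Combine the two terms and take a gradient step on the NeRF.
        loss = recon_loss + lambda_weight(step) * prior_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

Because the prior enters only as an extra loss term, the same loop applies whether the capture is sparse or dense, which is what makes the regularizer "drop-in."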
