Denoising Diffusion via Image-Based Rendering (2402.03445v2)

Published 5 Feb 2024 in cs.CV, cs.GR, and cs.LG

Abstract: Generating 3D scenes is a challenging open problem, which requires synthesizing plausible content that is fully consistent in 3D space. While recent methods such as neural radiance fields excel at view synthesis and 3D reconstruction, they cannot synthesize plausible details in unobserved regions since they lack a generative capability. Conversely, existing generative methods are typically not capable of reconstructing detailed, large-scale scenes in the wild, as they use limited-capacity 3D scene representations, require aligned camera poses, or rely on additional regularizers. In this work, we introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes. To achieve this, we make three contributions. First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes, dynamically allocating more capacity as needed to capture details visible in each image. Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images without the need for any additional supervision signal such as masks or depths. This supports 3D reconstruction and generation in a unified architecture. Third, we develop a principled approach to avoid trivial 3D solutions when integrating image-based rendering with the diffusion model, by dropping out representations of some images. We evaluate the model on several challenging datasets of real and synthetic images, and demonstrate superior results on generation, novel view synthesis and 3D reconstruction.
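The abstract's third contribution — dropping out the representations of some images so the model cannot trivially copy a target view's own pixels — can be illustrated with a short sketch. This is a hedged illustration only, not the paper's implementation: the helper names (`build_scene`, `render`, `denoise_loss`) and the `drop_prob` parameter are hypothetical stand-ins for the paper's IB-planes construction, volume renderer, and diffusion loss.

```python
import random

def training_step(images, build_scene, render, denoise_loss, drop_prob=0.25):
    """Sketch of per-image representation dropout (assumed interfaces).

    The scene representation supervising a target view is built only from
    the *other* views (plus optional random dropping), so the model must
    infer 3D structure rather than pass the target image through.
    """
    target_idx = random.randrange(len(images))
    # Always drop the target image's representation; optionally drop more.
    support = [im for i, im in enumerate(images)
               if i != target_idx and random.random() >= drop_prob]
    scene = build_scene(support)             # e.g. IB-planes from support views
    prediction = render(scene, target_idx)   # render the held-out viewpoint
    return denoise_loss(prediction, images[target_idx])
```

With `drop_prob=0`, this reduces to plain leave-one-out supervision; a positive `drop_prob` additionally thins the support set, which the abstract suggests is what prevents degenerate 3D solutions.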

Authors (4)
  1. Fabian Manhardt (41 papers)
  2. Federico Tombari (214 papers)
  3. Paul Henderson (37 papers)
  4. Titas Anciukevičius (3 papers)
Citations (6)
