DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization (2403.06912v3)

Published 11 Mar 2024 in cs.CV

Abstract: Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time, high-quality few-shot novel view synthesis at low cost. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, despite the geometry degradation it encounters when input views decrease. In Gaussian radiance fields, we find that this degradation in scene geometry is primarily linked to the positioning of Gaussian primitives and can be mitigated by a depth constraint. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on the LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a $25\times$ reduction in training time, and over $3000\times$ faster rendering speed.

Summary

  • The paper introduces a dual depth regularization approach that accurately positions Gaussian primitives while preserving visual detail.
  • It integrates global-local depth normalization in loss functions to capture both fine-grained local depth variations and overall spatial coherence.
  • Experimental results demonstrate a 25× reduction in training time and over 300 FPS rendering, setting a new benchmark for novel view synthesis.

An Expert Review of "DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization"

The paper "DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization" introduces an innovative approach for enhancing the efficiency and quality of novel view synthesis using sparse input views. This is achieved through a framework built upon 3D Gaussian radiance fields employing depth normalization strategies. The framework, named DNGaussian, contributes significantly to the balance of real-time rendering speed and quality in image synthesis tasks.

Key Contributions

In the context of 3D vision and neural rendering, the paper addresses the computational intensity and time constraints posed by existing radiance fields, particularly neural radiance fields (NeRFs). Let us distill the critical aspects of DNGaussian as presented:

  1. Hard and Soft Depth Regularization: The paper introduces a dual approach to depth regularization, distinguishing between a "hard depth" term that constrains the spatial positioning of Gaussian primitives and a "soft depth" term that makes subtle adjustments to their opacities. This division enables precise primitive positioning while preserving the integrity of visual detail (a sketch of both loss terms follows this list).
  2. Global-Local Depth Normalization: By incorporating both global and patch-wise local depth normalization into its loss functions, DNGaussian sharpens the model's sensitivity to fine-grained local depth variations without losing sight of holistic spatial coherence (a normalization sketch also follows this list).
  3. Efficiency in Real-Time Rendering: Notably, the framework achieves substantial reductions in memory usage and training time, quantified as a 25× reduction in training duration, alongside a rendering speed exceeding 300 frames per second (FPS), underscoring its applicability to dynamic image synthesis environments.
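
To make the two regularizers concrete, below is a minimal PyTorch sketch of how the hard and soft depth losses could be structured. The `render_depth` hook, the `gaussians` container, and the `force_opaque` flag are hypothetical stand-ins for a 3D Gaussian Splatting rasterizer; this illustrates the idea rather than reproducing the authors' implementation.

```python
import torch

def depth_regularization(render_depth, gaussians, mono_depth):
    """Illustrative sketch of Hard and Soft Depth Regularization.

    render_depth(gaussians, force_opaque) is a hypothetical rasterizer
    hook returning an (H, W) depth map; mono_depth is a coarse
    monocular depth estimate of the same shape.
    """
    # Hard depth: render as if every Gaussian were fully opaque, so the
    # nearest primitives alone determine depth. Supervising this term
    # chiefly drives the *positions* of Gaussian centers.
    hard_depth = render_depth(gaussians, force_opaque=True)
    loss_hard = torch.abs(hard_depth - mono_depth).mean()

    # Soft depth: the usual alpha-blended depth. Supervising this term
    # chiefly adjusts *opacities*, preserving learned color detail.
    soft_depth = render_depth(gaussians, force_opaque=False)
    loss_soft = torch.abs(soft_depth - mono_depth).mean()

    # The paper alternates which parameter groups receive gradients for
    # each term; that scheduling is omitted here for brevity.
    return loss_hard, loss_soft
```

In practice both terms would operate on normalized depth maps (see the next sketch), so the monocular prior only needs to be correct up to scale and shift.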
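
Likewise, a minimal sketch of global-local depth normalization: the global term standardizes the whole depth map, while the local term standardizes each small patch so that small depth changes are not drowned out by the scene's overall depth range. The mean/std standardization and the patch size of 8 are illustrative assumptions; the paper's exact normalization may differ.

```python
import torch
import torch.nn.functional as F

def global_normalize(depth, eps=1e-6):
    # Standardize the full map: makes the loss invariant to the
    # unknown global scale of monocular depth predictions.
    return (depth - depth.mean()) / (depth.std() + eps)

def local_normalize(depth, patch=8, eps=1e-6):
    # Standardize each non-overlapping patch by its own statistics so
    # small local depth changes carry as much weight as large ones.
    # Assumes H and W are divisible by `patch`.
    d = depth.unsqueeze(0).unsqueeze(0)                  # (1, 1, H, W)
    cols = F.unfold(d, kernel_size=patch, stride=patch)  # (1, p*p, N)
    cols = (cols - cols.mean(1, keepdim=True)) / (cols.std(1, keepdim=True) + eps)
    d = F.fold(cols, output_size=depth.shape[-2:],
               kernel_size=patch, stride=patch)
    return d.squeeze(0).squeeze(0)

def depth_loss(rendered, mono, patch=8):
    # Combine global and local terms; equal weighting is illustrative.
    g = torch.abs(global_normalize(rendered) - global_normalize(mono)).mean()
    l = torch.abs(local_normalize(rendered, patch) - local_normalize(mono, patch)).mean()
    return g + l
```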

Experimental Validation

The authors substantiate DNGaussian's performance through comprehensive experiments on the widely used LLFF, DTU, and Blender datasets. The results show that DNGaussian matches or surpasses state-of-the-art methods with significantly lower computational overhead, as measured by standard metrics such as PSNR, SSIM, and LPIPS. The method's ability to preserve fine detail even with few input views is highlighted as a distinguishing trait.
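
For reference, the pixel-level metrics can be computed with off-the-shelf tools; the snippet below is a generic evaluation example with placeholder images, not the paper's evaluation code (the `channel_axis` argument requires scikit-image ≥ 0.19).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder HxWx3 float images in [0, 1]; in a real evaluation these
# would be the rendered view and the held-out ground-truth photograph.
rendered = np.random.rand(378, 504, 3).astype(np.float32)
ground_truth = np.random.rand(378, 504, 3).astype(np.float32)

psnr = peak_signal_noise_ratio(ground_truth, rendered, data_range=1.0)
ssim = structural_similarity(ground_truth, rendered, data_range=1.0,
                             channel_axis=-1)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")
# LPIPS requires a learned network, e.g. the `lpips` PyTorch package.
```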

Implications and Future Directions

The introduction of DNGaussian marks an important stride toward scalable and practical neural rendering. The ability to optimize 3D Gaussian radiance fields efficiently has implications for expansive areas including augmented reality, virtual reality, and real-time graphics, where resource availability varies.

Looking ahead, avenues for further research could involve investigating the integration of additional geometric priors to address sparse-view density variations, enhancing robustness towards reflective or specular surfaces, or refining the framework's scalability to more diverse and larger scenes. The potential cross-compatibility with emerging rendering paradigms could facilitate broader adoption across AI-driven visualization fields.

In summary, the paper presents a well-rounded, technically robust method for sparse-view novel view synthesis, marking a meaningful step toward overcoming the high training costs and inefficiencies inherent in prior neural rendering approaches.