Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis (2411.00144v3)

Published 31 Oct 2024 in cs.CV and cs.GR

Abstract: 3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in novel view synthesis (NVS). However, 3DGS tends to overfit when trained with sparse views, limiting its generalization to novel viewpoints. In this paper, we address this overfitting issue by introducing Self-Ensembling Gaussian Splatting (SE-GS). We achieve self-ensembling by incorporating an uncertainty-aware perturbation strategy during training. A $\mathbf{\Delta}$-model and a $\mathbf{\Sigma}$-model are jointly trained on the available images. The $\mathbf{\Delta}$-model is dynamically perturbed based on rendering uncertainty across training steps, generating diverse perturbed models with negligible computational overhead. Discrepancies between the $\mathbf{\Sigma}$-model and these perturbed models are minimized throughout training, forming a robust ensemble of 3DGS models. This ensemble, represented by the $\mathbf{\Sigma}$-model, is then used to generate novel-view images during inference. Experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets demonstrate that our approach enhances NVS quality under few-shot training conditions, outperforming existing state-of-the-art methods. The code is released at: https://sailor-z.github.io/projects/SEGS.html.

References (53)
  1. Pitfalls of in-domain uncertainty estimation and ensembling in deep learning. arXiv preprint arXiv:2002.06470, 2020.
  2. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  3. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
  4. Spherical averages and applications to spherical splines and interpolation. ACM Transactions on Graphics, 20(2):95–126, 2001.
  5. PixelSplat: 3D Gaussian splats from image pairs for scalable generalizable 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19457–19467, 2024.
  6. A survey on 3D Gaussian splatting. arXiv preprint arXiv:2401.03890, 2024.
  7. MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images. arXiv preprint arXiv:2403.14627, 2024.
  8. Novel view synthesis with multiple 360 images for large-scale 6-DoF virtual reality system. In IEEE Conference on Virtual Reality and 3D User Interfaces, pages 880–881. IEEE, 2019.
  9. Volume rendering. ACM SIGGRAPH Computer Graphics, 22(4):65–74, 1988.
  10. InstantSplat: Unbounded sparse-view pose-free Gaussian splatting in 40 seconds. arXiv preprint arXiv:2403.20309, 2024.
  11. Self-ensembling for visual domain adaptation. arXiv preprint arXiv:1706.05208, 2017.
  12. Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence, 115:105151, 2022.
  13. Loss surfaces, mode connectivity, and fast ensembling of DNNs. Advances in Neural Information Processing Systems, 31, 2018.
  14. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2495–2504, 2020.
  15. SuGaR: Surface-aligned Gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024.
  16. Binocular-guided 3D Gaussian splatting with view consistency for sparse view synthesis. arXiv preprint arXiv:2410.18822, 2024.
  17. EfficientNeRF: Efficient neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12902–12911, 2022.
  18. Putting NeRF on a diet: Semantically consistent few-shot view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5885–5894, 2021.
  19. Large scale multi-view stereopsis evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 406–413, 2014.
  20. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):139:1–139:14, 2023.
  21. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
  22. Why M heads are better than one: Training a diverse ensemble of deep networks. arXiv preprint arXiv:1511.06314, 2015.
  23. DNGaussian: Optimizing sparse-view 3D Gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20775–20785, 2024.
  24. MVSGaussian: Fast generalizable Gaussian splatting reconstruction from multi-view stereo. In European Conference on Computer Vision, pages 37–53. Springer, 2024.
  25. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics, 38(4):1–14, 2019.
  26. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  27. SELF: Learning to filter noisy labels with self-ensembling. arXiv preprint arXiv:1910.01842, 2019.
  28. RegNeRF: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5480–5490, 2022.
  29. CoherentGS: Sparse novel view synthesis with coherent 3D Gaussians. arXiv preprint arXiv:2403.19495, 2024.
  30. D-NeRF: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
  31. Uncertainty quantification and deep ensembles. Advances in Neural Information Processing Systems, 34:20063–20075, 2021.
  32. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12179–12188, 2021.
  33. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  34. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, 2016.
  35. A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 519–528. IEEE, 2006.
  36. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1–48, 2019.
  37. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
  38. SparseNeRF: Distilling depth ranking for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9065–9076, 2023.
  39. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
  40. FreeSplat: Generalizable 3D Gaussian splatting towards free-view synthesis of indoor scenes. arXiv preprint arXiv:2405.17958, 2024.
  41. NeRF--: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021.
  42. Surface reconstruction from Gaussian splatting via novel stereo views. arXiv preprint arXiv:2404.01810, 2024.
  43. MVPGS: Excavating multi-view priors for Gaussian splatting from sparse input views. arXiv preprint arXiv:2409.14316, 2024.
  44. FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023.
  45. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In Medical Image Computing and Computer Assisted Intervention, pages 605–613. Springer, 2019.
  46. MVImgNet: A large-scale dataset of multi-view images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9150–9161, 2023.
  47. Mip-Splatting: Alias-free 3D Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
  48. CoR-GS: Sparse-view 3D Gaussian splatting via co-regularization. In European Conference on Computer Vision, pages 335–352. Springer, 2024.
  49. View synthesis by appearance flow. In European Conference on Computer Vision, pages 286–301. Springer, 2016.
  50. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817, 2018.
  51. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
  52. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1–2):239–263, 2002.
  53. FSGS: Real-time few-shot view synthesis using Gaussian splatting. In European Conference on Computer Vision, pages 145–163. Springer, 2024.

Summary

  • The paper presents a novel self-ensembling mechanism that regularizes 3D Gaussian Splatting models, reducing overfitting in few-shot novel view synthesis.
  • It employs a dual-model system with an uncertainty-aware perturbation strategy that generates diverse temporal samples for more robust training.
  • Experimental results demonstrate significant improvements in PSNR and perceptual metrics on datasets like LLFF, DTU, and Mip-NeRF360 compared to existing methods.

Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis: An In-depth Analysis

The paper "Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis" presents a novel approach aimed at enhancing the performance of 3D Gaussian Splatting (3DGS) in the context of sparse-view novel view synthesis (NVS). The core contribution lies in addressing the overfitting issues that arise when 3DGS models are trained with limited training views. The proposed technique introduces self-ensembling Gaussian Splatting (SE-GS), a method that employs regularization to derive robust and generalizable models through a self-ensembling mechanism, fundamentally improving the model's capacity to deliver high-quality NVS from few-shot images.

Key Contributions

The authors address a significant limitation of current state-of-the-art NVS methods when handling sparse training views. SE-GS integrates a dual-model system consisting of a $\Sigma$-model and a $\Delta$-model. The $\Delta$-model is subjected to an uncertainty-aware perturbation strategy that dynamically generates temporal samples in the Gaussian parameter space. This sidesteps the computational cost of training multiple models from scratch, as required by previous approaches such as CoR-GS. A schematic of one training step is sketched below.
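To make the training dynamics concrete, here is a minimal sketch of one SE-GS training step in PyTorch. The API is hypothetical: `render(model, view)` is assumed to return an (H, W, 3) image with gradients, `train_view.image` is assumed to hold the ground-truth photo, and the L1 losses and weighting are illustrative choices rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def se_gs_step(sigma_model, delta_model, render, train_view, pseudo_view,
               estimate_uncertainty, perturb, lambda_reg=1.0):
    """One schematic SE-GS training step (hypothetical API).

    `estimate_uncertainty` and `perturb` follow the later sketches.
    """
    # 1. Fit both models to the real training image (the standard 3DGS
    #    photometric loss, simplified here to L1).
    loss_fit = (F.l1_loss(render(sigma_model, train_view), train_view.image)
                + F.l1_loss(render(delta_model, train_view), train_view.image))

    # 2. Perturb the Delta-model where its renderings are uncertain,
    #    producing a fresh "temporal sample" of the Gaussian parameter
    #    space at negligible cost.
    with torch.no_grad():
        perturb(delta_model, estimate_uncertainty(delta_model, pseudo_view))

    # 3. Pull the Sigma-model toward the perturbed sample on a pseudo view;
    #    detaching the Delta rendering makes this a one-way consistency term.
    loss_reg = F.l1_loss(render(sigma_model, pseudo_view),
                         render(delta_model, pseudo_view).detach())

    return loss_fit + lambda_reg * loss_reg
```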

Methodological Insights

The $\Delta$-model represents a temporal sample drawn from the Gaussian parameter space at each training step. A distinctive uncertainty-aware perturbation mechanism leverages pseudo views to estimate the reliability of renderings without substantial extra cost: dynamically updated image buffers allow per-pixel uncertainties to be computed, and these uncertainties guide the perturbation. As a result, the $\Delta$-model yields diverse temporal samples by perturbing regions of high uncertainty, providing a broader exploration of the parameter space.
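One plausible reading of this buffer-based estimate, written as a sketch: renderings of each pseudo view are accumulated over training steps, and the per-pixel standard deviation across the buffer serves as the uncertainty map. How pixel uncertainty is mapped back onto individual Gaussians is an assumption of this sketch (a per-Gaussian score is simply passed in).

```python
import torch

def pixelwise_uncertainty(render_buffer: torch.Tensor) -> torch.Tensor:
    """Uncertainty of one pseudo view from a rolling rendering buffer.

    render_buffer: (K, H, W, 3) tensor holding the K most recent renderings
    of the same pseudo view across training steps. Regions whose appearance
    keeps changing between steps receive high uncertainty.
    """
    return render_buffer.std(dim=0).mean(dim=-1)  # (H, W)

def perturb_gaussians(means: torch.Tensor,
                      gaussian_uncertainty: torch.Tensor,
                      noise_scale: float = 1e-3) -> torch.Tensor:
    """Perturb Gaussian centers proportionally to their uncertainty.

    means: (N, 3) Gaussian centers. gaussian_uncertainty: (N,) scores,
    e.g. obtained by projecting the pixel uncertainty map back onto the
    Gaussians (that mapping is omitted here). Uncertain Gaussians move
    more, so successive perturbations explore the parameter space where
    the current fit is least reliable.
    """
    noise = torch.randn_like(means) * noise_scale
    return means + gaussian_uncertainty.unsqueeze(-1) * noise
```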

Conversely, the $\Sigma$-model serves as the final ensemble model. It is trained with a regularization term applied across the temporal samples drawn from the $\Delta$-model: discrepancies between pseudo-view renderings of the two models are minimized throughout training, so the $\Sigma$-model aggregates information from the full family of samples and becomes more robust to overfitting.
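The regularization can be pictured as a photometric consistency term over pseudo views, for which no ground-truth photos exist, so each model's rendering is the other's only reference. The L1 form and the stop-gradient on the $\Delta$-model side are assumptions of this sketch, not confirmed details of the paper.

```python
import torch
import torch.nn.functional as F

def ensemble_regularization(render, sigma_model, delta_model, pseudo_views):
    """Average the Sigma/Delta consistency term over a batch of pseudo views.

    Because the Delta-model is re-perturbed across training steps,
    successive calls compare the Sigma-model against different temporal
    samples; averaging these comparisons over time is what makes the
    Sigma-model behave like an ensemble of 3DGS models.
    """
    losses = [F.l1_loss(render(sigma_model, pv),
                        render(delta_model, pv).detach())  # one-way update
              for pv in pseudo_views]
    return torch.stack(losses).mean()
```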

Experimental Validation

The efficacy of SE-GS is validated on the LLFF, DTU, Mip-NeRF360, and MVImgNet multi-view datasets, where it consistently outperforms leading approaches, including both NeRF-based and 3DGS-based methods. Particularly noteworthy is its superior performance with very few training views, demonstrating its stability in scenarios where collecting an extensive dataset is impractical.

SE-GS demonstrates significant improvements in PSNR and other perceptual metrics over alternatives such as FSGS and DNGaussian, which rely on auxiliary data, and CoR-GS, which employs cross-model regularization. The results underline SE-GS's ability to produce visually superior novel views, with sharper and more accurate reproduction of textures and fine details.
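For reference, PSNR, the primary fidelity metric in these comparisons, is computed from the mean squared error between a rendered image and the ground-truth image; a short PyTorch version:

```python
import torch

def psnr(rendered: torch.Tensor, target: torch.Tensor,
         max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((rendered - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```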

Theoretical and Practical Implications

Practically, SE-GS broadens the potential scope of applications in virtual and augmented reality, where high-quality 3D reconstruction from minimal data is a common constraint. Theoretically, it opens new directions for efficient and effective ensembling techniques within 3DGS frameworks, demonstrating that robustness and generalization can be achieved without extensive retraining overhead.

Future Directions

Future research could explore extending the uncertainty-aware mechanism to other neural representation learning scenarios or further optimizing buffer-based strategies for even faster inference speeds. Additionally, investigating the scalability of SE-GS with comprehensive scene hierarchies or increased resolution remains an intriguing line of inquiry.

In conclusion, this work provides a substantial contribution to the landscape of sparse-view novel view synthesis, delivering an adept methodological breakthrough that allows models to combat overfitting effectively. As such, SE-GS represents a significant step forward in the development of robust, high-fidelity NVS under data-constrained conditions.
