- The paper introduces Splatfacto-W, which extends 3D Gaussian Splatting with per-Gaussian neural color features and per-image appearance embeddings, achieving an average 5.3 dB PSNR improvement over baseline 3DGS and roughly 150× faster training than NeRF-based methods.
- It employs an efficient transient-object masking strategy that minimizes interference from transient occluders and noisy regions in in-the-wild image datasets.
- The framework adds a robust background model based on spherical harmonics and per-image embeddings, improving multiview consistency, while the full pipeline renders in real time at over 40 fps.
Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections
"Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections" presents a significant advancement in the domain of novel view synthesis from in-the-wild image collections. The authors introduce a comprehensive framework that leverages and extends 3D Gaussian Splatting (3DGS) to address the inherent challenges of photometric variations and transient occluders typically found in such datasets. The key contributions of Splatfacto-W include integrating per-Gaussian neural color features, per-image appearance embeddings, and an effective background model for improved scene reconstruction.
Technical Contributions
- Latent Appearance Modeling:
- Framework: Each Gaussian carries a dedicated appearance feature, and an MLP (multi-layer perceptron) predicts spherical-harmonics color coefficients from that feature together with a per-image appearance embedding; a minimal sketch follows this list. This adaptation handles varying photometric appearance without compromising rendering speed.
- Improvements: The method improves Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB over baseline 3DGS and trains roughly 150 times faster than NeRF-based methods, keeping it compatible with real-time rendering requirements.
- Transient Object Handling:
- Robust Mask: An efficient masking strategy excludes transient objects and noisy regions during optimization, so inconsistent scene elements do not dominate the reconstruction. By combining residual analysis with a spatial-smoothness prior, only high-confidence pixels contribute to the photometric loss; a simplified sketch follows this list.
- Background Modeling:
- Prior Utilization and Spherical Harmonics: The background is modeled with spherical harmonics conditioned on per-image embeddings, which maintains higher multiview consistency in in-the-wild scenes. This corrects common misrepresentations of the sky and distant background and mitigates the depth inconsistencies seen in naive 3DGS reconstructions; a sketch follows this list.
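The latent appearance module can be pictured with a short PyTorch sketch. This is a minimal illustration under assumed dimensions (feature, embedding, and hidden sizes are made up for the example), not the paper's exact Nerfstudio implementation: each Gaussian carries a learned feature vector, each training image an embedding, and a shared MLP maps the pair to spherical-harmonics color coefficients.

```python
import torch
import torch.nn as nn


class AppearanceModel(nn.Module):
    """Sketch of latent appearance modeling: per-Gaussian features plus a per-image
    embedding are mapped by an MLP to spherical-harmonics color coefficients.
    Layer sizes and dimensions are illustrative, not the paper's configuration."""

    def __init__(self, num_gaussians, num_images,
                 feature_dim=72, embed_dim=48, sh_degree=3, hidden=256):
        super().__init__()
        self.n_sh = (sh_degree + 1) ** 2                    # SH coefficients per color channel
        # Learned per-Gaussian appearance features and per-image appearance embeddings.
        self.gaussian_features = nn.Parameter(torch.zeros(num_gaussians, feature_dim))
        self.image_embeddings = nn.Embedding(num_images, embed_dim)
        # Small MLP shared by all Gaussians.
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 * self.n_sh),
        )

    def forward(self, image_idx: int) -> torch.Tensor:
        # Broadcast the single image embedding to every Gaussian.
        idx = torch.as_tensor(image_idx, device=self.image_embeddings.weight.device)
        embed = self.image_embeddings(idx).expand(self.gaussian_features.shape[0], -1)
        x = torch.cat([self.gaussian_features, embed], dim=-1)
        # [N, 3, (sh_degree + 1)^2]; the SH are later evaluated along each view direction.
        return self.mlp(x).view(-1, 3, self.n_sh)
```

Because the embeddings are learned per training image, rendering a novel view requires choosing (or optimizing) an embedding, which is what lets the same geometry be re-rendered under different photometric conditions.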
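The transient-handling idea, down-weighting pixels whose rendering residual is large over a spatially smooth neighborhood, can be sketched as follows. The threshold, window size, and pooling choice are illustrative assumptions rather than the paper's exact masking rule.

```python
import torch
import torch.nn.functional as F


def robust_mask(rendered: torch.Tensor, gt: torch.Tensor,
                threshold: float = 0.1, kernel_size: int = 15) -> torch.Tensor:
    """Simplified residual-based transient mask (illustrative, not the paper's exact rule).

    Pixels with large rendering residuals are assumed to belong to transient objects;
    smoothing the residual before thresholding keeps the mask from flickering on
    isolated noisy pixels. rendered, gt: [3, H, W] images in [0, 1].
    Returns a [1, H, W] float mask that is 1 for pixels kept in the photometric loss.
    """
    residual = (rendered - gt).abs().mean(dim=0, keepdim=True)            # [1, H, W]
    # Spatial smoothness: average residuals over a local window.
    smoothed = F.avg_pool2d(residual.unsqueeze(0), kernel_size,
                            stride=1, padding=kernel_size // 2).squeeze(0)
    keep = (smoothed < threshold).float()
    return keep.detach()                                                  # mask is not optimized


# Hypothetical use inside a training loop:
# mask = robust_mask(render, image)
# loss = (mask * (render - image).abs()).sum() / mask.sum().clamp(min=1.0)
```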
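Finally, the background model can be sketched as a small network that maps the per-image embedding to spherical-harmonics coefficients, which are evaluated along ray directions and composited behind the splats. The `sh_basis_fn` callable and all layer sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn


class SHBackground(nn.Module):
    """Sketch of a spherical-harmonics background conditioned on a per-image embedding.
    The SH basis evaluation is passed in because implementations differ; all sizes
    are illustrative assumptions."""

    def __init__(self, num_images, embed_dim=32, sh_degree=2, hidden=128):
        super().__init__()
        self.n_sh = (sh_degree + 1) ** 2
        self.image_embeddings = nn.Embedding(num_images, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 * self.n_sh),
        )

    def forward(self, image_idx, directions, sh_basis_fn):
        """directions: [P, 3] unit ray directions for the pixels to shade.
        sh_basis_fn: callable mapping [P, 3] directions to [P, n_sh] basis values."""
        idx = torch.as_tensor(image_idx, device=directions.device)
        coeffs = self.mlp(self.image_embeddings(idx)).view(3, self.n_sh)
        basis = sh_basis_fn(directions)                       # [P, n_sh]
        return torch.sigmoid(basis @ coeffs.t())              # [P, 3] background color


# The background is composited behind the splats using the rendered accumulation, e.g.
# final_rgb = rendered_rgb + (1.0 - accumulation) * background_rgb
```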
Empirical Validation
The paper substantiates its claims through rigorous experiments conducted on several challenging datasets, including the Brandenburg Gate, Trevi Fountain, and Sacre Coeur. The results demonstrate that Splatfacto-W outperforms several state-of-the-art methods, including NeRF-W and 3DGS variants like SWAG and GS-W. Specifically, the PSNR, SSIM, and LPIPS metrics reflect the superior quality of scene reconstructions achieved by Splatfacto-W.
- Efficiency: The method renders at over 40 frames per second (fps) on an RTX 2080 Ti, making it well suited to applications that require real-time performance. This speed is achieved without extensive caching, underscoring the scalability of the approach.
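For reference, the PSNR figures quoted above follow the standard definition PSNR = 10 · log10(MAX² / MSE), computable in a few lines:

```python
import torch


def psnr(pred: torch.Tensor, gt: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```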
Implications and Future Directions
The research has both theoretical and practical implications. Theoretically, it narrows the gap between implicit and explicit scene representations by pairing an explicit Gaussian representation with learned appearance features and efficient transient handling. Practically, Splatfacto-W sets a new standard for real-time novel view synthesis in dynamic, challenging real-world scenarios such as virtual and augmented reality applications.
Future work could explore more sophisticated neural architectures for representing transient phenomena and for handling unusual lighting conditions. Further refining the background model with additional neural components could also address the remaining difficulties with high-frequency background detail.
Conclusion
This paper contributes a well-rounded, efficient solution to the persistent challenges of novel view synthesis from in-the-wild image collections. By innovatively extending 3D Gaussian Splatting through latent appearance modeling, robust transient object handling, and an effective background representation strategy, Splatfacto-W achieves high-quality, consistent, and real-time scene reconstruction. The practical and theoretical advancements introduced by this research pave the way for next-generation applications in VR, AR, and other interactive 3D environments.