Feature Splatting for Better Novel View Synthesis with Low Overlap (2405.15518v2)

Published 24 May 2024 in cs.CV

Abstract: 3D Gaussian Splatting has emerged as a very promising scene representation, achieving state-of-the-art quality in novel view synthesis significantly faster than competing alternatives. However, its use of spherical harmonics to represent scene colors limits the expressivity of 3D Gaussians and, as a consequence, the capability of the representation to generalize as we move away from the training views. In this paper, we propose to encode the color information of 3D Gaussians into per-Gaussian feature vectors, which we denote as Feature Splatting (FeatSplat). To synthesize a novel view, Gaussians are first "splatted" into the image plane, then the corresponding feature vectors are alpha-blended, and finally the blended vector is decoded by a small MLP to render the RGB pixel values. To further inform the model, we concatenate a camera embedding to the blended feature vector, to condition the decoding also on the viewpoint information. Our experiments show that this novel model for encoding the radiance considerably improves novel view synthesis for low overlap views that are distant from the training views. Finally, we also show the capacity and convenience of our feature vector representation, demonstrating its capability not only to generate RGB values for novel views, but also their per-pixel semantic labels. Code available at https://github.com/tberriel/FeatSplat. Keywords: Gaussian Splatting, Novel View Synthesis, Feature Splatting

References (44)
  1. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022.
  2. Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12882–12891, 2022.
  3. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
  4. Depth field networks for generalizable multi-view scene representation. In European Conference on Computer Vision, pages 245–262. Springer, 2022.
  5. Towards zero-shot scale-aware monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9233–9243, 2023a.
  6. Delira: Self-supervised depth, light, and radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17935–17945, 2023b.
  7. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG), 37(6):1–15, 2018.
  8. Gaussian error linear units (gelus), 2023.
  9. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
  10. Lerf: Language embedded radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19729–19739, 2023.
  11. Adam: a method for stochastic optimization. In International Conference on Learning Representations, 2015.
  12. Segment anything. arXiv:2304.02643, 2023.
  13. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
  14. Point-based neural rendering with per-view optimization. In Computer Graphics Forum, volume 40, pages 29–43. Wiley Online Library, 2021.
  15. Neural point catacaustics for novel-view synthesis of reflections. ACM Transactions on Graphics (TOG), 41(6):1–15, 2022.
  16. Panoptic neural fields: A semantic object-aware neural scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12881, 2022.
  17. Compact 3d gaussian representation for radiance field. arXiv preprint arXiv:2311.13681, 2023.
  18. Mine: Towards continuous depth mpi with nerf for novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12578–12588, 2021.
  19. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 33:15651–15663, 2020.
  20. Revisiting sorting for gpgpu stream architectures. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 545–546, 2010.
  21. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  22. Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022.
  23. Donerf: Towards real-time rendering of compact neural radiance fields using depth oracle networks. In Computer Graphics Forum, volume 40, pages 45–59. Wiley Online Library, 2021.
  24. Dinov2: Learning robust visual features without supervision, 2023.
  25. Langsplat: 3d language gaussian splatting. arXiv preprint arXiv:2312.16084, 2023.
  26. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  27. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12892–12901, 2022.
  28. Point-slam: Dense neural point cloud-based slam. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18433–18444, 2023.
  29. Distilled feature fields enable few-shot language-guided manipulation. In Conference on Robot Learning, pages 405–424. PMLR, 2023.
  30. imap: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6229–6238, 2021.
  31. DM-neRF: 3d scene geometry decomposition and manipulation from 2d images. In The Eleventh International Conference on Learning Representations, 2023.
  32. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems, 34:27171–27183, 2021.
  33. Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  34. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. doi:10.1109/TIP.2003.819861.
  35. Nerfingmvs: Guided optimization of neural radiance fields for indoor multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5610–5619, 2021.
  36. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022.
  37. Scannet++: A high-fidelity dataset of 3d indoor scenes. In Proceedings of the International Conference on Computer Vision (ICCV), 2023.
  38. Differentiable surface splatting for point-based geometry processing. ACM Transactions on Graphics (TOG), 38(6):1–14, 2019.
  39. Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752–5761, 2021.
  40. Editable free-viewpoint video using a layered neural representation. ACM Transactions on Graphics, 40(4):1–18, 2021.
  41. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  42. Nice-slam: Neural implicit scalable encoding for slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12786–12796, 2022.
  43. Fmgs: Foundation model embedded 3d gaussian splatting for holistic 3d scene understanding. arXiv preprint arXiv:2401.01970, 2024.
  44. Ewa volume splatting. In Proceedings Visualization, 2001. VIS’01., pages 29–538. IEEE, 2001.
Authors (2)
  1. T. Berriel Martins (2 papers)
  2. Javier Civera (62 papers)
Citations (2)

Summary

An Overview of Feature Splatting for Enhanced 3D Scene Representation

The paper introduces a novel approach to improving 3D scene representations, targeting limitations of methods such as Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). The proposed method, termed Feature Splatting (FeatSplat), addresses a key limitation of 3DGS, its reliance on spherical harmonics, by encoding color information in per-Gaussian feature vectors. This overview covers the methodology, experiments, and implications of the approach.

Introduction

Choosing an appropriate 3D scene representation is pivotal for applications in robotics, virtual reality (VR), and augmented reality (AR). Traditional representations such as NeRFs are computationally intensive and scale poorly with scene size. The more recent 3D Gaussian Splatting (3DGS) is a faster alternative, but it relies on spherical harmonics to encode color, which limits the model's expressivity and its ability to generalize away from the training views.

Methodology

Feature Splatting (FeatSplat) enhances 3DGS by replacing spherical harmonics with per-Gaussian feature vectors. The approach involves three key steps (a sketch of the decoding step follows the list):

  1. Feature Vector Encoding: Each 3D Gaussian is initialized with a feature vector sampled from a normal distribution.
  2. Alpha Blending: During image synthesis, 3D Gaussians are projected into the image plane, and their corresponding feature vectors are alpha-blended.
  3. Multi-Layer Perceptron (MLP) Decoding: The blended feature vector is concatenated with a camera embedding and decoded by a small MLP to render RGB pixel values.
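
To make the pipeline concrete, below is a minimal PyTorch-style sketch of step 3, assuming the splatting and alpha-blending of step 2 have already produced a per-pixel feature map. The names (`FeatSplatDecoder`, `feature_dim`, `embed_dim`) and layer sizes are illustrative assumptions, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatSplatDecoder(nn.Module):
    """Illustrative decoder: blended per-pixel features + camera embedding -> RGB."""

    def __init__(self, feature_dim: int = 32, embed_dim: int = 16, hidden_dim: int = 64):
        super().__init__()
        # A small MLP, as described in the paper; these layer sizes are assumptions.
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 3),
            nn.Sigmoid(),  # map outputs to [0, 1] RGB
        )

    def forward(self, blended_features: torch.Tensor, camera_embedding: torch.Tensor) -> torch.Tensor:
        # blended_features: (H, W, feature_dim), produced by splatting + alpha-blending
        # camera_embedding: (embed_dim,), a vector encoding the viewpoint
        h, w, _ = blended_features.shape
        cam = camera_embedding.expand(h, w, -1)         # broadcast the embedding to every pixel
        x = torch.cat([blended_features, cam], dim=-1)  # condition decoding on the viewpoint
        return self.mlp(x)                              # (H, W, 3) RGB image
```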

The paper also extends FeatSplat to semantic segmentation, demonstrating the flexibility of feature vector representations in encoding both RGB values and per-pixel semantic labels.
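
As a rough illustration of this extension, the same blended features can be decoded by a second small head into per-pixel class logits; the feature dimension, class count, and layer sizes below are placeholders rather than the authors' configuration.

```python
import torch
import torch.nn as nn

# Hypothetical semantic head operating on the same blended per-pixel features.
semantic_head = nn.Sequential(
    nn.Linear(32, 64),   # 32 = assumed blended feature dimension
    nn.GELU(),
    nn.Linear(64, 20),   # 20 = placeholder number of semantic classes
)

def render_semantics(blended_features: torch.Tensor) -> torch.Tensor:
    logits = semantic_head(blended_features)  # (H, W, num_classes) class logits
    return logits.argmax(dim=-1)              # (H, W) per-pixel semantic labels
```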

Preliminaries: 3D Gaussian Splatting

3DGS represents a scene with a set of 3D Gaussians whose color is encoded by spherical harmonics (SH). Rendering converts the SH coefficients to RGB values, projects the Gaussians into the image plane, sorts them, and alpha-blends their colors. This achieves high-quality rendering at a significantly lower computational cost than NeRFs, though it incurs higher memory usage and generalizes poorly to complex textures and distant viewpoints.
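
For reference, the per-pixel blending in both 3DGS and FeatSplat follows the standard front-to-back compositing rule, C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). The snippet below is a didactic restatement of that rule for a single pixel, not the tile-based CUDA rasterizer used in practice.

```python
import torch

def alpha_composite(values: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back compositing of sorted, projected Gaussians at one pixel.

    values: (N, C) per-Gaussian colors (3DGS) or feature vectors (FeatSplat),
            sorted front to back.
    alphas: (N,) per-Gaussian opacities after 2D projection.
    """
    # Transmittance before each Gaussian: prod_{j<i} (1 - alpha_j), with T_0 = 1.
    transmittance = torch.cumprod(torch.cat([alphas.new_ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = alphas * transmittance                # (N,) blending weights
    return (weights[:, None] * values).sum(dim=0)   # (C,) blended pixel value
```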

Experiments

The paper evaluates FeatSplat on several datasets, including Mip-360, Tanks and Temples (T&T), Deep Blending (DB), and ScanNet++. The evaluation metrics are SSIM, PSNR, LPIPS, rendering speed (FPS), and memory usage.
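
As a pointer to how the image-quality numbers are obtained, PSNR follows directly from the mean squared error between a rendered image and its ground truth, while SSIM and LPIPS require their respective reference implementations. The helper below is a generic sketch, not the paper's evaluation code.

```python
import torch

def psnr(rendered: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB for images with pixel values in [0, max_val]."""
    mse = torch.mean((rendered - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()
```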

Results on Mip-360, T&T, and DB

FeatSplat achieved the best PSNR on all three datasets and showed improved SSIM on two of them. The qualitative results highlighted FeatSplat’s ability to render accurate and detailed images, though it slightly lagged in rendering speed compared to 3DGS. Notably, FeatSplat halved the memory usage compared to 3DGS.

Generalization to Novel Views

FeatSplat demonstrated superior performance when synthesizing novel views far from the training views, substantially reducing the artifacts produced by 3DGS. This was most evident in sequences where the camera moved through isolated or largely unexplored regions, showcasing FeatSplat's ability to adapt the Gaussians' appearance to the viewing conditions.

Results on ScanNet++

FeatSplat significantly outperformed both 3DGS and Compact-3DGS across all metrics. This dataset is more challenging because its test trajectories are independent of the training trajectories, underscoring FeatSplat's robustness to low visual overlap and distant viewpoints.

Semantic FeatSplat

The extension to per-pixel semantic segmentation achieved a weighted mIoU of 0.629 while maintaining high rendering quality (PSNR 24.64, SSIM 0.875, LPIPS 0.244) at 56 FPS. The resulting segmentation maps were coherent, albeit slightly noisy at the edges.

Limitations

While FeatSplat significantly enhances representation capacity and generalization, it introduces a trade-off between capacity and speed. The compact MLP limits texture complexity, sometimes resulting in over-smoothing. The increased feature vector dimension also leads to slower rendering speeds compared to 3DGS, though the authors argue it remains within acceptable real-time limits.

Implications and Future Directions

FeatSplat extends the utility of 3DGS by providing a more expressive and flexible scene representation. Its capacity to encode multiple colors within a single Gaussian and condition decoding on viewpoint information offers robust improvements in practical applications like novel view synthesis and semantic segmentation. Future research could explore optimizing the balance between rendering speed and texture complexity, as well as further extending the versatility of feature vector representations in other 3D vision tasks.

By addressing key limitations in existing 3D scene representations, Feature Splatting represents a meaningful advance in the performance and applicability of novel view synthesis techniques. The publicly released code (linked above) invites further developments and community-driven insights into this promising approach.