Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields
In the field of neural rendering, Neural Radiance Fields (NeRFs) have demonstrated a remarkable ability to reconstruct photorealistic 3D scenes from collections of 2D images. A major barrier to their widespread adoption, however, is the computational inefficiency of ray-wise volumetric rendering. As an alternative, 3D Gaussian splatting (3DGS) has gained traction by offering fast rendering speeds and strong image quality through a 3D Gaussian-based representation. Its critical drawback is heavy memory and storage consumption, since image fidelity depends on maintaining a large number of Gaussians.
The paper proposes methods to mitigate these issues, focusing on two primary goals: reducing the number of Gaussians without degrading performance, and compressing the Gaussian attributes (view-dependent color, covariance). A learnable mask strategy is introduced that effectively reduces the number of Gaussians while preserving high performance. In addition, view-dependent color is represented with a compact grid-based neural field, departing from the traditional reliance on spherical harmonics. To compress the model further, the paper also learns codebooks via residual vector quantization (R-VQ) to represent the geometric and temporal attributes compactly.
Key Contributions
- Learnable Mask Strategy: The paper introduces an end-to-end optimization framework that applies a learnable mask to Gaussian attributes, reducing redundancy by eliminating Gaussians with minimal impact on quality (a sketch follows this list).
- Compact View-Dependent Color Representation: Replacing spherical harmonics with a grid-based neural field represents view-dependent color far more compactly (sketched below).
- Residual Vector Quantization (R-VQ): Codebooks learned via R-VQ encode geometric and temporal attributes compactly, capitalizing on the limited variability among Gaussians (sketched below).
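To make the masking idea concrete, below is a minimal PyTorch sketch of a per-Gaussian learnable mask that is binarized with a straight-through estimator and multiplied into opacity and scale, as the paper describes. Names such as `MaskedGaussians`, `mask_param`, and `mask_threshold` are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class MaskedGaussians(nn.Module):
    """Per-Gaussian learnable mask applied to opacity and scale (sketch)."""

    def __init__(self, num_gaussians: int, mask_threshold: float = 0.01):
        super().__init__()
        self.mask_param = nn.Parameter(torch.zeros(num_gaussians))  # raw mask logits
        self.mask_threshold = mask_threshold

    def forward(self, opacity: torch.Tensor, scale: torch.Tensor):
        # opacity: (N, 1), scale: (N, 3)
        m_soft = torch.sigmoid(self.mask_param)              # soft mask in (0, 1)
        m_hard = (m_soft > self.mask_threshold).float()      # binarized mask
        # Straight-through estimator: hard mask in the forward pass,
        # gradients flow through the soft mask in the backward pass.
        m = m_hard + m_soft - m_soft.detach()
        masked_opacity = opacity * m.unsqueeze(-1)
        masked_scale = scale * m.unsqueeze(-1)
        mask_loss = m_soft.mean()                             # sparsity regularizer
        return masked_opacity, masked_scale, mask_loss
```

During training the sparsity term is added to the rendering loss with a small weight, and Gaussians whose mask stays below the threshold can be pruned periodically and removed for good once training ends.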
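The view-dependent color module can be pictured as a feature grid queried at each Gaussian's center, with a tiny MLP mapping the interpolated feature and the viewing direction to RGB. The sketch below uses a single dense feature grid to stay self-contained, whereas the paper relies on multiresolution hash grids; `GridColorField` and its hyperparameters are assumed names, not the authors' API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridColorField(nn.Module):
    """Simplified view-dependent color field: dense feature grid + tiny MLP (sketch)."""

    def __init__(self, grid_res: int = 64, feat_dim: int = 8, hidden: int = 32):
        super().__init__()
        # Feature volume of shape (1, C, D, H, W), queried by 3D position.
        self.grid = nn.Parameter(0.01 * torch.randn(1, feat_dim, grid_res, grid_res, grid_res))
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid(),               # RGB in [0, 1]
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) Gaussian centers normalized to [-1, 1]; view_dir: (N, 3) unit vectors.
        coords = xyz.view(1, -1, 1, 1, 3)                     # grid_sample expects (1, D, H, W, 3)
        feats = F.grid_sample(self.grid, coords, align_corners=True)  # (1, C, N, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1).squeeze(0).t()           # (N, C)
        return self.mlp(torch.cat([feats, view_dir], dim=-1))         # (N, 3) colors
```

Because color is queried once per Gaussian rather than stored as per-Gaussian spherical-harmonic coefficients, the storage cost of color no longer grows with the number of Gaussians.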
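Residual vector quantization represents each attribute vector as a sum of codewords, one per stage, where each stage quantizes the residual left by the previous stages; only the per-Gaussian stage indices and the shared codebooks need to be stored. Below is a minimal sketch of that idea with illustrative names and hyperparameters, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """R-VQ sketch: each stage quantizes the residual of the previous stages,
    so the sum of selected codewords approximates the input attribute."""

    def __init__(self, num_stages: int = 6, codebook_size: int = 64, dim: int = 4):
        super().__init__()
        self.codebooks = nn.Parameter(0.01 * torch.randn(num_stages, codebook_size, dim))

    def forward(self, x: torch.Tensor):
        # x: (N, dim) attribute vectors (e.g. scale or rotation) to quantize.
        residual = x
        quantized = torch.zeros_like(x)
        indices = []
        for cb in self.codebooks:                        # one codebook per stage
            d = torch.cdist(residual, cb)                # (N, codebook_size) distances
            idx = d.argmin(dim=-1)                       # nearest codeword per vector
            selected = cb[idx]                           # (N, dim)
            quantized = quantized + selected
            residual = residual - selected
            indices.append(idx)
        # Straight-through so gradients still reach the unquantized attributes.
        quantized = x + (quantized - x).detach()
        return quantized, torch.stack(indices, dim=-1)   # indices: (N, num_stages)
```

In practice the codebooks themselves are trained with an additional quantization loss (for example the distance between each residual and its selected codeword), which the straight-through line above does not cover.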
Numerical Results and Performance
The method was evaluated on a range of datasets, including Mip-NeRF 360, Tanks and Temples, Deep Blending, and NeRF-Synthetic. It consistently achieved over 25x reduced storage for static scenes while also rendering faster than the original 3DGS. For dynamic scenes, it achieved more than 12x storage reduction while retaining high-quality reconstructions.
- Static Scenes: On Mip-NeRF 360 and Tanks and Temples, the proposed method closely matched the original 3DGS in rendering quality while drastically reducing storage, from hundreds of megabytes per scene down to mere tens of megabytes, and it also improved rendering speed.
- Dynamic Scenes: On DyNeRF and Technicolor, the approach achieved significant compression while remaining competitive with state-of-the-art methods such as STG in reconstruction quality and computational cost.
Implications and Future Directions
The demonstrated compression techniques for 3D Gaussian representations have broad implications for neural rendering and various interactive 3D applications where both storage and computational efficiency are crucial. This work lays a foundation for more practical and scalable neural rendering systems.
Conceptually, the novel learnable masking and the efficient use of R-VQ mark a substantial step towards more compact neural representations that do not compromise rendering quality. Practically, this work paves the way for real-time rendering on resource-constrained devices, broadening applications in fields such as augmented reality (AR), virtual reality (VR), and mobile computing.
Future research could build on this work by further optimizing the grid-based neural field, or by exploring masking strategies that reflect different properties or the contextual importance of individual Gaussians. Another promising avenue is integrating these techniques with hardware acceleration tailored to neural rendering, enabling even broader accessibility and efficiency.
Overall, this paper contributes significantly to ongoing advances in neural rendering, presenting methods that bridge the gap between quality and computational efficiency in both static and dynamic 3D scenes.