
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering? (2502.19318v1)

Published 26 Feb 2025 in cs.GR and cs.CV

Abstract: Since its introduction, 3D Gaussian Splatting (3DGS) has become an important reference method for learning 3D representations of a captured scene, allowing real-time novel-view synthesis with high visual quality and fast training times. Neural Radiance Fields (NeRFs), which preceded 3DGS, are based on a principled ray-marching approach for volumetric rendering. In contrast, while sharing a similar image formation model with NeRF, 3DGS uses a hybrid rendering solution that builds on the strengths of volume rendering and primitive rasterization. A crucial benefit of 3DGS is its performance, achieved through a set of approximations, in many cases with respect to volumetric rendering theory. A naturally arising question is whether replacing these approximations with more principled volumetric rendering solutions can improve the quality of 3DGS. In this paper, we present an in-depth analysis of the various approximations and assumptions used by the original 3DGS solution. We demonstrate that, while more accurate volumetric rendering can help for low numbers of primitives, the power of efficient optimization and the large number of Gaussians allows 3DGS to outperform volumetric rendering despite its approximations.

Summary

  • The paper introduces a framework analyzing 3D Gaussian Splatting (3DGS) approximations, finding opacity-based splatting superior to extinction methods with many primitives.
  • Experiments show that 3DGS rendering approximations, including simplified sorting and self-attenuation, have negligible visual impact with a high number of Gaussians.
  • The findings imply that 3DGS approximations work well because a high number of primitives offers sufficient expressiveness, reducing the need for strictly accurate volumetric rendering.

The paper "Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?" analyzes the approximations made by 3D Gaussian Splatting (3DGS) for real-time novel view synthesis, contrasting them with the principled volumetric rendering of Neural Radiance Fields (NeRFs). It introduces a mathematical framework to clarify the differences between 3DGS and accurate volumetric rendering, focusing on opacity versus extinction-based rendering. The paper presents extinction-based splatting and ray-marching algorithms for Gaussian primitives and evaluates the impact of 3DGS approximations on visual quality and performance.

The authors clarify the distinction between the learned opacity value in 3DGS and the extinction function used in volumetric rendering, where extinction is referred to as "density" in NeRF literature. To facilitate analysis, an extinction-based splatting solution is introduced. Experiments indicate that the extinction-based solution performs better with a small number of primitives, but this reverses as the number of primitives increases, with opacity splatting performing best. This suggests that as the number of Gaussians increases, rendering them with 3DGS becomes as expressive as volumetric rendering.

The paper notes that 3DGS resolves visibility through a single global sorting step based on Gaussian centers, an approximation that causes popping artifacts. Spatial overlap of Gaussians is ignored, which deviates from the volumetric rendering integral. A ray-marching algorithm on 3D Gaussians is implemented to study the impact of these approximations, revealing that they have a negligible impact on still images, especially with a large number of Gaussians.

Other approximations made by 3DGS, such as incorrect treatment of self-attenuation and approximate screen-space shape projection, are also shown to have little impact on the effectiveness of 3DGS. The key contributions of the paper include:

  • A mathematical framework clarifying the differences between 3DGS and accurate volumetric rendering.
  • Introducing extinction-based splatting and ray-marching algorithms for Gaussian primitives, along with a closed-form solution for splatting self-attenuated Gaussians.
  • Demonstrating that opacity-based splatting results in lower error compared to extinction-based methods when using a sufficiently high number of primitives.
  • Showing that for a low number of Gaussians, correct overlap resolution and extinction-based rendering improve image quality, while correct sorting does not significantly affect results.

Mathematical Framework

The paper revisits the volumetric rendering integral:

$$I(\textbf{p}) = \int_0^\infty c(\textbf{r},t)\, f(\textbf{r}(t))\, e^{-\int_0^t f(\textbf{r}(\tau))\, d\tau}\, dt$$

Where:

  • $I(\textbf{p})$ is the image function, parameterized by pixel $\textbf{p}$.
  • $c(\textbf{r}, t)$ is the radiance at $\textbf{r}(t)$ in the direction of ray $\textbf{r}$.
  • $f(\textbf{r}(t))$ is the extinction coefficient at $\textbf{r}(t)$.
  • $\textbf{r}$ is the viewing ray, parameterized by distance $t$.
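The integral above is typically estimated by ray marching with a piecewise-constant quadrature, as popularized by NeRF. The following is a minimal NumPy sketch (not the paper's implementation), assuming hypothetical scalar radiance `c(t)` and extinction `f(t)` functions along a single ray:

```python
import numpy as np

def render_ray(c, f, t_max=10.0, n_steps=1000):
    """Estimate the volume rendering integral along one ray.

    Piecewise-constant quadrature: with step dt, each sample contributes
    alpha_i = 1 - exp(-f(t_i) * dt), weighted by the transmittance
    T_i = prod_{j < i} (1 - alpha_j) accumulated in front of it.
    """
    ts = np.linspace(0.0, t_max, n_steps, endpoint=False)
    dt = t_max / n_steps
    alphas = 1.0 - np.exp(-np.array([f(t) for t in ts]) * dt)
    # Transmittance before each sample (T_0 = 1).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    colors = np.array([c(t) for t in ts])
    return np.sum(trans * alphas * colors)
```

For constant extinction this quadrature is exact: with $f \equiv 1$ and $c \equiv 1$ the result is $1 - e^{-t_{\max}}$ regardless of step count.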

This integral models direct volume rendering with attenuation and source terms. The paper then specializes this for a Gaussian representation of the extinction function. It uses both normalized Gaussian functions:

$$\mathcal{G}^n_D(\textbf{x}, w, \mu, \Sigma) = w\, \mathcal{N}_D(\textbf{x}; \mu, \Sigma)$$

Where:

  • $\mathcal{G}^n_D$ is the D-dimensional normalized Gaussian function.
  • $\textbf{x}$ is a point in $\mathbb{R}^D$.
  • $w$ is a weight parameter.
  • $\mu$ is the D-dimensional position (mean).
  • $\Sigma$ is the shape (covariance matrix).
  • $\mathcal{N}_D$ is the normal distribution's PDF.

And unnormalized Gaussian functions:

$$\mathcal{G}^u_D(\textbf{x}, a, \mu, \Sigma) = a\, \mathcal{I}_D(\Sigma)\, \mathcal{N}_D(\textbf{x}; \mu, \Sigma)$$

Where:

  • $\mathcal{G}^u_D$ is the D-dimensional unnormalized Gaussian function.
  • $a$ is the amplitude.
  • $\mathcal{I}_D(\Sigma)$ is the normalization factor for the exponential part of a D-dimensional normalized Gaussian function.

The extinction function is modeled by a mixture of Gaussians:

$$f(\textbf{x}) = \sum_{i=0}^N \mathcal{G}^n_3(\textbf{x}, w_i, \mu_i, \Sigma_i)$$
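The normalization factor of a D-dimensional Gaussian is $\mathcal{I}_D(\Sigma) = (2\pi)^{D/2}\sqrt{|\Sigma|}$. A minimal sketch of evaluating the normalized Gaussian and the mixture-based extinction function (illustrative helper names, not from the paper):

```python
import numpy as np

def norm_factor(Sigma):
    """I_D(Sigma) = (2*pi)^(D/2) * sqrt(det(Sigma)): the integral of the
    unnormalized exponential exp(-0.5 * x^T Sigma^{-1} x) over R^D."""
    D = Sigma.shape[0]
    return (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))

def gaussian_n(x, w, mu, Sigma):
    """Normalized Gaussian G^n_D: integrates to w over R^D."""
    d = x - mu
    expo = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
    return w * expo / norm_factor(Sigma)

def extinction(x, params):
    """Extinction f(x) as a mixture of normalized 3D Gaussians,
    with params a list of (w_i, mu_i, Sigma_i) tuples."""
    return sum(gaussian_n(x, w, mu, S) for (w, mu, S) in params)
```

At its mean, a unit-weight isotropic 3D Gaussian evaluates to $(2\pi)^{-3/2} \approx 0.0635$.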

EWA and 3D Gaussian Splatting

The paper details that to avoid the high cost of volume integration, both Elliptical Weighted Average (EWA) and 3DGS simplify the rendering of 3D Gaussians by reducing them to 2D Gaussians that can be easily "splatted."

EWA exploits simplifications to find the 2D extinction contribution function $f_i$ of Gaussian $i$ from its 3D definition:

$$f_i(\textbf{p}) = \mathcal{G}^n_2(\textbf{p}, w_i, \mu'_i, \Sigma'_i) = \int_{-\infty}^\infty \mathcal{G}^n_3(\textbf{r}(t), w_i, \mu_i, \Sigma_i)\, dt$$

Where:

  • $\mu'$ and $\Sigma'$ are the projected 2D mean and covariance matrix.
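The line integral above has a closed form: substituting $\textbf{r}(t) = \textbf{o} + t\textbf{d}$ turns the 3D Gaussian into a 1D Gaussian in $t$, which integrates analytically. A sketch of that reduction (my own derivation of the standard identity, not code from the paper):

```python
import numpy as np

def line_integral_gaussian(o, d, w, mu, Sigma):
    """Closed-form integral of a normalized 3D Gaussian along the ray
    o + t*d (d a unit vector).

    The exponent -0.5 * (a t^2 + 2 b t + c) integrates over t to
    sqrt(2*pi / a) * exp(-0.5 * (c - b^2 / a)).
    """
    Sinv = np.linalg.inv(Sigma)
    delta = o - mu
    a = d @ Sinv @ d       # quadratic coefficient in t
    b = d @ Sinv @ delta   # linear coefficient
    c = delta @ Sinv @ delta
    norm3 = (2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma))
    return w * np.sqrt(2 * np.pi / a) * np.exp(-0.5 * (c - b * b / a)) / norm3
```

For a unit isotropic Gaussian and a ray through its mean, this yields $w / (2\pi)$, i.e. the peak of the corresponding normalized 2D Gaussian.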

In contrast, 3DGS uses unnormalized Gaussians and preserves the 2D amplitude $a'$ across all projections:

$$o_i(\textbf{p}) = \mathcal{G}^u_2(\textbf{p}, a'_i, \mu'_i, \Sigma'_i)$$

The computation of $\Sigma'$ involves transforming the Gaussian from world-space coordinates to screen space, approximated using a locally-affine counterpart:

$$\Sigma' = J W \Sigma W^T J^T$$

Where:

  • $J$ is the Jacobian matrix.
  • $W$ is the transformation to camera space.
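A minimal sketch of this projection, assuming a simplified perspective Jacobian without focal-length scaling (the exact third row of $J$ varies between implementations):

```python
import numpy as np

def perspective_jacobian(t):
    """Jacobian of the perspective mapping (x, y, z) -> (x/z, y/z, depth),
    linearized at camera-space point t. Focal length omitted for brevity."""
    x, y, z = t
    row3 = np.array([x, y, z]) / np.linalg.norm(t)  # radial depth row
    return np.array([[1.0 / z, 0.0, -x / z**2],
                     [0.0, 1.0 / z, -y / z**2],
                     row3])

def projected_cov(Sigma_world, W, J):
    """Screen-space 2D covariance Sigma' = J W Sigma W^T J^T,
    keeping the top-left 2x2 block."""
    Sigma_cam = W @ Sigma_world @ W.T
    Sigma_screen = J @ Sigma_cam @ J.T
    return Sigma_screen[:2, :2]
```

For an isotropic Gaussian at depth 2 on the optical axis (with $W = I$), the projected covariance is simply $\Sigma / z^2 = 0.25\, I$.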

The attenuation term is approximated by the first-order Taylor expansion of $e^x$, resulting in the image function:

$$I(\textbf{p}) = \sum_{i=0}^N c_i(\textbf{r})\, g_i(\textbf{p}) \prod_{j=0}^{i-1} \left(1 - g_j(\textbf{p})\right) + c_b \prod_{i=0}^N \left(1 - g_i(\textbf{p})\right)$$

Where:

  • $c_i$ is an evaluation of the spherical harmonics in the viewing direction.
  • $c_b$ is the background color.
  • $g_i$ is the $i$-th Gaussian's partial contribution, either extinction or opacity.
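This is the standard front-to-back alpha compositing recurrence. A minimal sketch for one pixel, assuming the contributions are already depth-sorted:

```python
def composite(colors, alphas, c_b=0.0):
    """Front-to-back alpha compositing of sorted per-pixel contributions:
    sum_i c_i g_i prod_{j<i}(1 - g_j) + c_b prod_i(1 - g_i)."""
    T = 1.0   # accumulated transmittance
    out = 0.0
    for c_i, g_i in zip(colors, alphas):
        out += T * g_i * c_i
        T *= 1.0 - g_i
    return out + T * c_b   # remaining transmittance hits the background
```

For example, two white contributions of alpha 0.5 over a white background composite to exactly 1.0, since the residual transmittance 0.25 is filled by $c_b$.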

Analysis of 3DGS Representation and Approximations

The paper analyzes the key difference between EWA and 3DGS, i.e., the use of 2D opacity instead of extinction-based values. It introduces a unified framework for computing Gaussian-based extinction functions across EWA and 3DGS, using an abstract data term $\theta$ to derive the appearance of each Gaussian.

For EWA splatting, the stored per-Gaussian data term $\theta$ corresponds to $w$, the total integral of each normalized Gaussian function. The unnormalized Gaussian amplitudes are:

$$a = \frac{\theta}{\mathcal{I}_3(\Sigma)}$$

$$a' = \frac{\theta}{\mathcal{I}_2(\Sigma')}$$

3D Gaussian Splatting stores an "opacity" term on the 3D primitives, which serves as a constant, view-independent amplitude $a'$ for the projected 2D Gaussians:

$$a' = \theta$$

The view-dependent solution for $a$ in 3D can be recovered from $\theta$:

$$w = \mathcal{I}_2(\Sigma')\,\theta$$

$$a = \frac{w}{\mathcal{I}_3(\Sigma)} = \frac{\mathcal{I}_2(\Sigma')}{\mathcal{I}_3(\Sigma)}\,\theta$$
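The two parameterizations can be compared directly with the normalization factors $\mathcal{I}_D(\Sigma) = (2\pi)^{D/2}\sqrt{|\Sigma|}$. A sketch of both conversions (illustrative helper names, not from the paper):

```python
import numpy as np

def I_D(Sigma):
    """Normalization factor (2*pi)^(D/2) * sqrt(det(Sigma))."""
    D = Sigma.shape[0]
    return (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))

def ewa_amplitudes(theta, Sigma3, Sigma2):
    """EWA: theta is the total integral w, so the 3D amplitude is fixed
    and the 2D amplitude a' varies with the projected covariance."""
    return theta / I_D(Sigma3), theta / I_D(Sigma2)

def gs3d_amplitudes(theta, Sigma3, Sigma2):
    """3DGS: theta is the constant 2D amplitude a'; the implied weight
    w = I_2(Sigma') * theta and 3D amplitude become view-dependent."""
    w = I_D(Sigma2) * theta
    return w / I_D(Sigma3), theta
```

With identity covariances and $\theta = 1$, EWA gives $a' = 1/(2\pi)$ while 3DGS keeps $a' = 1$ and implies $a = 1/\sqrt{2\pi}$.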

Optimization with EWA-based Extinction

The paper aims to adapt EWA-based splatting for gradient-descent-based optimization. To ensure robustness under optimization and the ability to model thin, solid objects, the paper arrives at a scheme called opacity-thin-side (OTS). OTS dynamically scales the learned weight $\theta$ such that $a' = \theta$ when the Gaussian is viewed facing its thinnest side:

$$a = \theta\, \frac{\mathcal{I}^*_2(\Sigma)}{\mathcal{I}_3(\Sigma)}$$

$$a' = \theta\, \frac{\mathcal{I}^*_2(\Sigma)}{\mathcal{I}_2(\Sigma')}$$

Where:

  • $\mathcal{I}^*_2(\Sigma)$ is the largest possible value of $\mathcal{I}_2(\Sigma')$ over all viewing directions.
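For an orthographic projection along direction $\textbf{d}$, the projected determinant is $|\Sigma'| = |\Sigma|\, (\textbf{d}^T \Sigma^{-1} \textbf{d})$, which is maximized when $\textbf{d}$ is the eigenvector of the smallest eigenvalue, i.e. looking down the thinnest axis. A sketch under that assumption (my own reading of the OTS scaling, not the paper's code):

```python
import numpy as np

def I2_star(Sigma3):
    """Largest 2D normalization factor over viewing directions: viewing
    along the thinnest axis keeps the two largest principal extents,
    so I_2* = 2*pi * sqrt(lambda_1 * lambda_2)."""
    lam = np.sort(np.linalg.eigvalsh(Sigma3))  # ascending eigenvalues
    return 2 * np.pi * np.sqrt(lam[1] * lam[2])

def ots_amplitudes(theta, Sigma3, Sigma2):
    """Opacity-thin-side: scale theta so that a' = theta exactly when
    the Gaussian is seen facing its thinnest side."""
    s = I2_star(Sigma3)
    a = theta * s / ((2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma3)))
    a_prime = theta * s / (2 * np.pi * np.sqrt(np.linalg.det(Sigma2)))
    return a, a_prime
```

Sanity check: for $\Sigma = \mathrm{diag}(4, 1, 0.25)$ viewed down the thin axis, $\Sigma' = \mathrm{diag}(4, 1)$ and indeed $a' = \theta$.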

Attenuation and Self-Attenuation

Both EWA splatting and 3DGS ignore how a Gaussian's extinction affects its own appearance, referred to as self-attenuation. To address attenuation in a principled manner, the paper revisits the volumetric integration equation for a Gaussian mixture with just one Gaussian:

$$I(\textbf{p}) = c_0(\textbf{r}) \int_{-\infty}^\infty \mathcal{G}^n_3(\textbf{r}(t), w_0, \mu_0, \Sigma_0)\, e^{-\int_{-\infty}^t \mathcal{G}^n_3(\textbf{r}(\tau), w_0, \mu_0, \Sigma_0)\, d\tau}\, dt$$

The closed-form solution is:

$$I(\textbf{p}) = c_0(\textbf{r}) \left(1 - e^{-f_0(\textbf{p})}\right)$$

This solution extends to multiple Gaussians:

$$I(\textbf{p}) = \sum_{i=0}^N c_i(\textbf{r}) \left(1 - e^{-f_i(\textbf{p})}\right) \prod_{j=0}^{i-1} e^{-f_j(\textbf{p})} + c_b \prod_{i=0}^N e^{-f_i(\textbf{p})}$$
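This is the same front-to-back recurrence as before, but each contribution uses the exact per-Gaussian transmittance $\alpha_i = 1 - e^{-f_i}$ instead of $f_i$ itself. A minimal sketch for one pixel with depth-sorted contributions:

```python
import numpy as np

def composite_self_attenuated(colors, fs, c_b=0.0):
    """Compositing with exact exponential transmittance: alpha_i =
    1 - exp(-f_i), so each Gaussian attenuates its own emission rather
    than using the first-order approximation alpha_i ~ f_i."""
    T, out = 1.0, 0.0
    for c_i, f_i in zip(colors, fs):
        alpha = 1.0 - np.exp(-f_i)
        out += T * alpha * c_i
        T *= 1.0 - alpha  # equals exp(-f_i)
    return out + T * c_b
```

For small $f_i$ the two schemes agree (since $1 - e^{-f} \approx f$); for example, a single Gaussian with $f_0 = \ln 2$ contributes exactly half its color.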

Visibility

The paper addresses the issues of Gaussian overlap and depth sorting. To assess the importance of exact visibility for scene reconstruction and novel-view synthesis, a principled ray marching-based renderer for 3D Gaussians is designed.

Evaluation

Numerical experiments are conducted to determine the impact of approximations on reconstruction quality, using the NeRF synthetic dataset and additional volumetric datasets. The evaluation uses image quality metrics such as SSIM, PSNR, and LPIPS.
