
Does 3D Gaussian Splatting Need Accurate Volumetric Rendering? (2502.19318v1)

Published 26 Feb 2025 in cs.GR and cs.CV

Abstract: Since its introduction, 3D Gaussian Splatting (3DGS) has become an important reference method for learning 3D representations of a captured scene, allowing real-time novel-view synthesis with high visual quality and fast training times. Neural Radiance Fields (NeRFs), which preceded 3DGS, are based on a principled ray-marching approach for volumetric rendering. In contrast, while sharing a similar image formation model with NeRF, 3DGS uses a hybrid rendering solution that builds on the strengths of volume rendering and primitive rasterization. A crucial benefit of 3DGS is its performance, achieved through a set of approximations, in many cases with respect to volumetric rendering theory. A naturally arising question is whether replacing these approximations with more principled volumetric rendering solutions can improve the quality of 3DGS. In this paper, we present an in-depth analysis of the various approximations and assumptions used by the original 3DGS solution. We demonstrate that, while more accurate volumetric rendering can help for low numbers of primitives, the power of efficient optimization and the large number of Gaussians allows 3DGS to outperform volumetric rendering despite its approximations.

Summary

  • The paper introduces a framework analyzing 3D Gaussian Splatting (3DGS) approximations, finding opacity-based splatting superior to extinction methods with many primitives.
  • Experiments show that 3DGS rendering approximations, including simplified sorting and self-attenuation, have negligible visual impact with a high number of Gaussians.
  • The findings imply that 3DGS approximations work well because a high number of primitives offers sufficient expressiveness, reducing the need for strictly accurate volumetric rendering.

The paper "Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?" analyzes the approximations made by 3D Gaussian Splatting (3DGS) for real-time novel view synthesis, contrasting them with the principled volumetric rendering of Neural Radiance Fields (NeRFs). It introduces a mathematical framework to clarify the differences between 3DGS and accurate volumetric rendering, focusing on opacity versus extinction-based rendering. The paper presents extinction-based splatting and ray-marching algorithms for Gaussian primitives and evaluates the impact of 3DGS approximations on visual quality and performance.

The authors clarify the distinction between the learned opacity value in 3DGS and the extinction function used in volumetric rendering, where extinction is referred to as "density" in NeRF literature. To facilitate analysis, an extinction-based splatting solution is introduced. Experiments indicate that the extinction-based solution performs better with a small number of primitives, but this reverses as the number of primitives increases, with opacity splatting performing best. This suggests that as the number of Gaussians increases, rendering them with 3DGS becomes as expressive as volumetric rendering.

The paper notes that 3DGS resolves visibility through a single global sorting step based on Gaussian centers, an approximation that causes popping artifacts. Spatial overlap of Gaussians is ignored, which deviates from the volumetric rendering integral. A ray-marching algorithm on 3D Gaussians is implemented to study the impact of these approximations, revealing that they have a negligible impact on still images, especially with a large number of Gaussians.

Other approximations made by 3DGS, such as incorrect treatment of self-attenuation and approximate screen-space shape projection, are also shown to have little impact on the effectiveness of 3DGS. The key contributions of the paper include:

  • A mathematical framework clarifying the differences between 3DGS and accurate volumetric rendering.
  • Introducing extinction-based splatting and ray-marching algorithms for Gaussian primitives, along with a closed-form solution for splatting self-attenuated Gaussians.
  • Demonstrating that opacity-based splatting results in lower error compared to extinction-based methods when using a sufficiently high number of primitives.
  • Showing that for a low number of Gaussians, correct overlap resolution and extinction-based rendering improve image quality, while correct sorting does not significantly affect results.

Mathematical Framework

The paper revisits the volumetric rendering integral:

$$I(\textbf{p}) = \int_0^\infty c(\textbf{r},t)\, f(\textbf{r}(t))\, e^{-\int_0^t f(\textbf{r}(\tau))\, d\tau}\, dt$$

Where:

  • $I(\textbf{p})$ is the image function, parameterized by pixel $\textbf{p}$.
  • $c(\textbf{r}, t)$ is the radiance at $\textbf{r}(t)$ in the direction of ray $\textbf{r}$.
  • $f(\textbf{r}(t))$ is the extinction coefficient at $\textbf{r}(t)$.
  • $\textbf{r}$ is the viewing ray, parameterized by distance $t$.
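The integral above is typically estimated by ray marching with a piecewise-constant quadrature, as popularized by NeRF. The following is a minimal NumPy sketch (not the paper's implementation), assuming hypothetical scalar radiance `c(t)` and extinction `f(t)` functions along a single ray:

```python
import numpy as np

def render_ray(c, f, t_max=10.0, n_steps=1000):
    """Estimate the volume rendering integral along one ray.

    Piecewise-constant quadrature: with step dt, each sample contributes
    alpha_i = 1 - exp(-f(t_i) * dt), weighted by the transmittance
    T_i = prod_{j < i} (1 - alpha_j) accumulated in front of it.
    """
    ts = np.linspace(0.0, t_max, n_steps, endpoint=False)
    dt = t_max / n_steps
    alphas = 1.0 - np.exp(-np.array([f(t) for t in ts]) * dt)
    # Transmittance before each sample (T_0 = 1).
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    colors = np.array([c(t) for t in ts])
    return np.sum(trans * alphas * colors)
```

For constant extinction this quadrature is exact: with $f \equiv 1$ and $c \equiv 1$ the result is $1 - e^{-t_{\max}}$ regardless of step count.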

This integral models direct volume rendering with attenuation and source terms. The paper then specializes this for a Gaussian representation of the extinction function. It uses both normalized Gaussian functions:

$$\mathcal{G}^n_D(\textbf{x}, w, \mu, \Sigma) = w\, \mathcal{N}_D(\textbf{x}; \mu, \Sigma)$$

Where:

  • $\mathcal{G}^n_D$ is the D-dimensional normalized Gaussian function.
  • $\textbf{x}$ is a point in $\mathbb{R}^D$.
  • $w$ is a weight parameter.
  • $\mu$ is the D-dimensional position (mean).
  • $\Sigma$ is the shape (covariance matrix).
  • $\mathcal{N}_D$ is the normal distribution's PDF.

And unnormalized Gaussian functions:

$$\mathcal{G}^u_D(\textbf{x}, a, \mu, \Sigma) = a\, \mathcal{I}_D(\Sigma)\, \mathcal{N}_D(\textbf{x}; \mu, \Sigma)$$

Where:

  • $\mathcal{G}^u_D$ is the D-dimensional unnormalized Gaussian function.
  • $a$ is the amplitude.
  • $\mathcal{I}_D(\Sigma)$ is the normalization factor for the exponential part of a D-dimensional normalized Gaussian function.

The extinction function is modeled by a mixture of Gaussians:

$$f(\textbf{x}) = \sum_{i=0}^N \mathcal{G}^n_3(\textbf{x}, w_i, \mu_i, \Sigma_i)$$
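The normalization factor of a D-dimensional Gaussian is $\mathcal{I}_D(\Sigma) = (2\pi)^{D/2}\sqrt{|\Sigma|}$. A minimal sketch of evaluating the normalized Gaussian and the mixture-based extinction function (illustrative helper names, not from the paper):

```python
import numpy as np

def norm_factor(Sigma):
    """I_D(Sigma) = (2*pi)^(D/2) * sqrt(det(Sigma)): the integral of the
    unnormalized exponential exp(-0.5 * x^T Sigma^{-1} x) over R^D."""
    D = Sigma.shape[0]
    return (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))

def gaussian_n(x, w, mu, Sigma):
    """Normalized Gaussian G^n_D: integrates to w over R^D."""
    d = x - mu
    expo = np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
    return w * expo / norm_factor(Sigma)

def extinction(x, params):
    """Extinction f(x) as a mixture of normalized 3D Gaussians,
    with params a list of (w_i, mu_i, Sigma_i) tuples."""
    return sum(gaussian_n(x, w, mu, S) for (w, mu, S) in params)
```

At its mean, a unit-weight isotropic 3D Gaussian evaluates to $(2\pi)^{-3/2} \approx 0.0635$.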

EWA and 3D Gaussian Splatting

The paper details that to avoid the high cost of volume integration, both Elliptical Weighted Average (EWA) and 3DGS simplify the rendering of 3D Gaussians by reducing them to 2D Gaussians that can be easily "splatted."

EWA exploits simplifications to find the 2D extinction contribution function $f_i$ of Gaussian $i$ from its 3D definition:

$$f_i(\textbf{p}) = \mathcal{G}^n_2(\textbf{p}, w_i, \mu'_i, \Sigma'_i) = \int_{-\infty}^\infty \mathcal{G}^n_3(\textbf{r}(t), w_i, \mu_i, \Sigma_i)\, dt$$

Where:

  • $\mu'$ and $\Sigma'$ are the projected 2D mean and covariance matrix.
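The line integral above has a closed form: substituting $\textbf{r}(t) = \textbf{o} + t\textbf{d}$ turns the 3D Gaussian into a 1D Gaussian in $t$, which integrates analytically. A sketch of that reduction (my own derivation of the standard identity, not code from the paper):

```python
import numpy as np

def line_integral_gaussian(o, d, w, mu, Sigma):
    """Closed-form integral of a normalized 3D Gaussian along the ray
    o + t*d (d a unit vector).

    The exponent -0.5 * (a t^2 + 2 b t + c) integrates over t to
    sqrt(2*pi / a) * exp(-0.5 * (c - b^2 / a)).
    """
    Sinv = np.linalg.inv(Sigma)
    delta = o - mu
    a = d @ Sinv @ d       # quadratic coefficient in t
    b = d @ Sinv @ delta   # linear coefficient
    c = delta @ Sinv @ delta
    norm3 = (2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma))
    return w * np.sqrt(2 * np.pi / a) * np.exp(-0.5 * (c - b * b / a)) / norm3
```

For a unit isotropic Gaussian and a ray through its mean, this yields $w / (2\pi)$, i.e. the peak of the corresponding normalized 2D Gaussian.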

In contrast, 3DGS uses unnormalized Gaussians and preserves the 2D amplitude $a'$ across all projections:

$$o_i(\textbf{p}) = \mathcal{G}^u_2(\textbf{p}, a'_i, \mu'_i, \Sigma'_i)$$

The computation of $\Sigma'$ involves transforming the Gaussian from world-space coordinates to screen space, approximated using a locally-affine counterpart:

$$\Sigma' = J W \Sigma W^T J^T$$

Where:

  • $J$ is the Jacobian matrix.
  • $W$ is the transformation to camera space.
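A minimal sketch of this projection, assuming a simplified perspective Jacobian without focal-length scaling (the exact third row of $J$ varies between implementations):

```python
import numpy as np

def perspective_jacobian(t):
    """Jacobian of the perspective mapping (x, y, z) -> (x/z, y/z, depth),
    linearized at camera-space point t. Focal length omitted for brevity."""
    x, y, z = t
    row3 = np.array([x, y, z]) / np.linalg.norm(t)  # radial depth row
    return np.array([[1.0 / z, 0.0, -x / z**2],
                     [0.0, 1.0 / z, -y / z**2],
                     row3])

def projected_cov(Sigma_world, W, J):
    """Screen-space 2D covariance Sigma' = J W Sigma W^T J^T,
    keeping the top-left 2x2 block."""
    Sigma_cam = W @ Sigma_world @ W.T
    Sigma_screen = J @ Sigma_cam @ J.T
    return Sigma_screen[:2, :2]
```

For an isotropic Gaussian at depth 2 on the optical axis (with $W = I$), the projected covariance is simply $\Sigma / z^2 = 0.25\, I$.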

The attenuation term is approximated by the first-order Taylor expansion of $e^x$, resulting in the image function:

$$I(\textbf{p}) = \sum_{i=0}^N c_i(\textbf{r})\, g_i(\textbf{p}) \prod_{j=0}^{i-1} \left(1 - g_j(\textbf{p})\right) + c_b \prod_{i=0}^N \left(1 - g_i(\textbf{p})\right)$$

Where:

  • $c_i$ is an evaluation of the spherical harmonics in the viewing direction.
  • $c_b$ is the background color.
  • $g_i$ is the $i$-th Gaussian's partial contribution, either extinction or opacity.
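This is the standard front-to-back alpha compositing recurrence. A minimal sketch for one pixel, assuming the contributions are already depth-sorted:

```python
def composite(colors, alphas, c_b=0.0):
    """Front-to-back alpha compositing of sorted per-pixel contributions:
    sum_i c_i g_i prod_{j<i}(1 - g_j) + c_b prod_i(1 - g_i)."""
    T = 1.0   # accumulated transmittance
    out = 0.0
    for c_i, g_i in zip(colors, alphas):
        out += T * g_i * c_i
        T *= 1.0 - g_i
    return out + T * c_b   # remaining transmittance hits the background
```

For example, two white contributions of alpha 0.5 over a white background composite to exactly 1.0, since the residual transmittance 0.25 is filled by $c_b$.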

Analysis of 3DGS Representation and Approximations

The paper analyzes the key difference between EWA and 3DGS, i.e., the use of 2D opacity instead of extinction-based values. It introduces a unified framework for computing Gaussian-based extinction functions across EWA and 3DGS, using an abstract data term $\theta$ to derive the appearance of each Gaussian.

For EWA splatting, the stored per-Gaussian data term $\theta$ corresponds to $w$, the total integral of each normalized Gaussian function. The unnormalized Gaussian amplitudes are:

$$a = \frac{\theta}{\mathcal{I}_3(\Sigma)}$$

$$a' = \frac{\theta}{\mathcal{I}_2(\Sigma')}$$

3D Gaussian Splatting stores an "opacity" term on the 3D primitives, which serves as a constant, view-independent amplitude $a'$ for the projected 2D Gaussians:

$$a' = \theta$$

The view-dependent solution for $a$ in 3D can be recovered from $\theta$:

$$w = \mathcal{I}_2(\Sigma')\,\theta$$

$$a = \frac{w}{\mathcal{I}_3(\Sigma)} = \frac{\mathcal{I}_2(\Sigma')}{\mathcal{I}_3(\Sigma)}\,\theta$$
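The two parameterizations can be compared directly with the normalization factors $\mathcal{I}_D(\Sigma) = (2\pi)^{D/2}\sqrt{|\Sigma|}$. A sketch of both conversions (illustrative helper names, not from the paper):

```python
import numpy as np

def I_D(Sigma):
    """Normalization factor (2*pi)^(D/2) * sqrt(det(Sigma))."""
    D = Sigma.shape[0]
    return (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))

def ewa_amplitudes(theta, Sigma3, Sigma2):
    """EWA: theta is the total integral w, so the 3D amplitude is fixed
    and the 2D amplitude a' varies with the projected covariance."""
    return theta / I_D(Sigma3), theta / I_D(Sigma2)

def gs3d_amplitudes(theta, Sigma3, Sigma2):
    """3DGS: theta is the constant 2D amplitude a'; the implied weight
    w = I_2(Sigma') * theta and 3D amplitude become view-dependent."""
    w = I_D(Sigma2) * theta
    return w / I_D(Sigma3), theta
```

With identity covariances and $\theta = 1$, EWA gives $a' = 1/(2\pi)$ while 3DGS keeps $a' = 1$ and implies $a = 1/\sqrt{2\pi}$.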

Optimization with EWA-based Extinction

The paper aims to adapt EWA-based splatting for gradient-descent-based optimization. To ensure robustness under optimization and the ability to model thin, solid objects, the paper arrives at a scheme called opacity-thin-side (OTS). OTS dynamically scales the learned weight $\theta$ such that $a' = \theta$ when the Gaussian is viewed facing its thinnest side:

$$a = \theta\, \frac{\mathcal{I}^*_2(\Sigma)}{\mathcal{I}_3(\Sigma)}$$

$$a' = \theta\, \frac{\mathcal{I}^*_2(\Sigma)}{\mathcal{I}_2(\Sigma')}$$

Where:

  • $\mathcal{I}^*_2(\Sigma)$ is the largest possible value of $\mathcal{I}_2(\Sigma')$ over all viewing directions.
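For an orthographic projection along direction $\textbf{d}$, the projected determinant is $|\Sigma'| = |\Sigma|\, (\textbf{d}^T \Sigma^{-1} \textbf{d})$, which is maximized when $\textbf{d}$ is the eigenvector of the smallest eigenvalue, i.e. looking down the thinnest axis. A sketch under that assumption (my own reading of the OTS scaling, not the paper's code):

```python
import numpy as np

def I2_star(Sigma3):
    """Largest 2D normalization factor over viewing directions: viewing
    along the thinnest axis keeps the two largest principal extents,
    so I_2* = 2*pi * sqrt(lambda_1 * lambda_2)."""
    lam = np.sort(np.linalg.eigvalsh(Sigma3))  # ascending eigenvalues
    return 2 * np.pi * np.sqrt(lam[1] * lam[2])

def ots_amplitudes(theta, Sigma3, Sigma2):
    """Opacity-thin-side: scale theta so that a' = theta exactly when
    the Gaussian is seen facing its thinnest side."""
    s = I2_star(Sigma3)
    a = theta * s / ((2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma3)))
    a_prime = theta * s / (2 * np.pi * np.sqrt(np.linalg.det(Sigma2)))
    return a, a_prime
```

Sanity check: for $\Sigma = \mathrm{diag}(4, 1, 0.25)$ viewed down the thin axis, $\Sigma' = \mathrm{diag}(4, 1)$ and indeed $a' = \theta$.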

Attenuation and Self-Attenuation

Both EWA splatting and 3DGS ignore how a Gaussian's extinction affects its own appearance, referred to as self-attenuation. To address attenuation in a principled manner, the paper revisits the volumetric integration equation for a Gaussian mixture with just one Gaussian:

$$I(\textbf{p}) = c_0(\textbf{r}) \int_{-\infty}^\infty \mathcal{G}^n_3(\textbf{r}(t), w_0, \mu_0, \Sigma_0)\, e^{-\int_{-\infty}^t \mathcal{G}^n_3(\textbf{r}(\tau), w_0, \mu_0, \Sigma_0)\, d\tau}\, dt$$

The closed-form solution is:

$$I(\textbf{p}) = c_0(\textbf{r}) \left(1 - e^{-f_0(\textbf{p})}\right)$$

This solution extends to multiple Gaussians:

$$I(\textbf{p}) = \sum_{i=0}^N c_i(\textbf{r}) \left(1 - e^{-f_i(\textbf{p})}\right) \prod_{j=0}^{i-1} e^{-f_j(\textbf{p})} + c_b \prod_{i=0}^N e^{-f_i(\textbf{p})}$$
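This is the same front-to-back recurrence as before, but each contribution uses the exact per-Gaussian transmittance $\alpha_i = 1 - e^{-f_i}$ instead of $f_i$ itself. A minimal sketch for one pixel with depth-sorted contributions:

```python
import numpy as np

def composite_self_attenuated(colors, fs, c_b=0.0):
    """Compositing with exact exponential transmittance: alpha_i =
    1 - exp(-f_i), so each Gaussian attenuates its own emission rather
    than using the first-order approximation alpha_i ~ f_i."""
    T, out = 1.0, 0.0
    for c_i, f_i in zip(colors, fs):
        alpha = 1.0 - np.exp(-f_i)
        out += T * alpha * c_i
        T *= 1.0 - alpha  # equals exp(-f_i)
    return out + T * c_b
```

For small $f_i$ the two schemes agree (since $1 - e^{-f} \approx f$); for example, a single Gaussian with $f_0 = \ln 2$ contributes exactly half its color.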

Visibility

The paper addresses the issues of Gaussian overlap and depth sorting. To assess the importance of exact visibility for scene reconstruction and novel-view synthesis, a principled ray marching-based renderer for 3D Gaussians is designed.

Evaluation

Numerical experiments are conducted to determine the impact of approximations on reconstruction quality, using the NeRF synthetic dataset and additional volumetric datasets. The evaluation uses image quality metrics such as SSIM, PSNR, and LPIPS.
