SplaTAM: Volumetric 3D Gaussian SLAM
- SplaTAM is an explicit volumetric SLAM system that employs weighted 3D Gaussian ellipsoids to achieve dense RGB-D reconstruction and efficient online tracking.
- It leverages a differentiable rendering pipeline with joint photometric and geometric residuals, using analytic Jacobians to optimize camera pose and refine map parameters.
- The system offers real-time performance, reduced memory usage, and improved reconstruction accuracy relative to volumetric grids and NeRF-based SLAM methods.
SplaTAM is an explicit volumetric SLAM (Simultaneous Localization and Mapping) system that utilizes weighted 3D Gaussian ellipsoids as map primitives for dense RGB-D reconstruction and online tracking. By integrating 3D Gaussian splatting into SLAM, it enables fast, high-fidelity scene modeling from a single unposed RGB-D camera. SplaTAM combines a differentiable rendering pipeline, online nonlinear camera tracking, and a memory-efficient map update strategy. Major motivations for SplaTAM include surpassing the fidelity, efficiency, and generality of previous dense SLAM paradigms such as volumetric grids, point clouds, and implicit neural fields.
1. Map Representation with Weighted 3D Gaussians
The central data structure in SplaTAM is a map whose elements are 3D Gaussians, each defined by:
- Mean position $\mu_i \in \mathbb{R}^3$
- Covariance $\Sigma_i$ (symmetric, positive definite)
- Color or radiance $c_i$
- Opacity $\alpha_i \in [0, 1]$
The volumetric density contributed by Gaussian $i$ at a point $x \in \mathbb{R}^3$ is modeled as $f_i(x) = \alpha_i \exp\left(-\tfrac{1}{2}(x-\mu_i)^\top \Sigma_i^{-1} (x-\mu_i)\right)$. Summing over all Gaussians, the global density is $f(x) = \sum_i f_i(x)$.
This continuous, differentiable scene model enables analytic computation of both photometric renderings and spatial gradients needed for optimization.
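As a concrete illustration, the following is a minimal NumPy sketch of such a map and its density evaluation. The class layout and names are our own illustrative choices, not the paper's API:

```python
import numpy as np

class GaussianMap:
    """Illustrative container for the map: one row per 3D Gaussian."""

    def __init__(self, means, covs, colors, opacities):
        self.means = means          # (N, 3) mean positions mu_i
        self.covs = covs            # (N, 3, 3) covariances Sigma_i (SPD)
        self.colors = colors        # (N, 3) colors c_i
        self.opacities = opacities  # (N,) opacities alpha_i in [0, 1]

    def density(self, x):
        """Global density f(x) = sum_i alpha_i * exp(-0.5 * d_i^T Sigma_i^{-1} d_i)."""
        d = x[None, :] - self.means                               # (N, 3) offsets
        sol = np.linalg.solve(self.covs, d[:, :, None])[:, :, 0]  # Sigma_i^{-1} d_i
        mahal = np.sum(d * sol, axis=1)                           # squared Mahalanobis
        return float(np.sum(self.opacities * np.exp(-0.5 * mahal)))
```

Because `density` is built from differentiable primitives, the same computation expressed in an autodiff framework yields the spatial gradients mentioned above.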
2. Volumetric Splatting and Differentiable Rendering
To project the 3D Gaussian model onto the camera image:
- Each Gaussian is transformed to camera coordinates using the current pose $T_t \in \mathrm{SE}(3)$ with rotation $R_t$ and translation $t_t$: $\mu_i^{\mathrm{cam}} = R_t \mu_i + t_t$.
- The mean is projected by the pinhole projection $\pi$ to the image: $u_i = \pi(\mu_i^{\mathrm{cam}})$.
- The covariance translates to a 2D elliptical footprint via $\Sigma_i' = J R_t \Sigma_i R_t^\top J^\top$, where $J$ is the projection Jacobian evaluated at $\mu_i^{\mathrm{cam}}$ (a sketch of this projection follows below).
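The projection of a single Gaussian can be sketched as follows under a pinhole model; the function name, the world-to-camera convention, and the intrinsics `fx`, `fy` are illustrative assumptions rather than the paper's exact interface:

```python
import numpy as np

def project_gaussian(mu_w, Sigma_w, R, t, fx, fy):
    """Project a world-space Gaussian to its 2D image footprint (EWA-style).

    R, t: world-to-camera rotation and translation of the current pose T_t.
    Returns the projected mean (offset from the principal point) and the
    2x2 footprint covariance Sigma'.
    """
    mu_c = R @ mu_w + t                     # transform mean to camera coordinates
    x, y, z = mu_c
    u = np.array([fx * x / z, fy * y / z])  # perspective projection of the mean

    # Jacobian J of (u, v) = (fx*x/z, fy*y/z), evaluated at mu_c
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])

    Sigma_2d = J @ R @ Sigma_w @ R.T @ J.T  # 2D elliptical footprint
    return u, Sigma_2d
```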
Rendering proceeds by per-pixel accumulation with front-to-back alpha compositing: $C(p) = \sum_i c_i \, \tilde{\alpha}_i(p) \prod_{j<i} \left(1 - \tilde{\alpha}_j(p)\right)$, where $\tilde{\alpha}_i(p)$ is the opacity of Gaussian $i$ evaluated at pixel $p$ through its 2D footprint and Gaussians are sorted front to back. The accumulated weight defines a silhouette $S(p) = \sum_i \tilde{\alpha}_i(p) \prod_{j<i} \left(1 - \tilde{\alpha}_j(p)\right)$; a silhouette mask $M(p) = 1$ if $S(p) > \tau$ (and 0 otherwise) serves to identify newly observed or unmapped pixels.
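A minimal sketch of this compositing at a single pixel, assuming the overlapping Gaussians are already sorted front to back and their per-pixel opacities precomputed:

```python
import numpy as np

def composite_pixel(colors, alphas, tau=0.99):
    """Front-to-back alpha compositing at one pixel.

    colors: (N, 3) colors of the depth-sorted Gaussians covering the pixel.
    alphas: (N,) opacities evaluated at the pixel from each 2D footprint.
    Returns rendered color C(p), silhouette S(p), and mask M(p) = [S > tau].
    """
    C = np.zeros(3)
    transmittance = 1.0                # running product prod_{j<i} (1 - alpha_j)
    for c, a in zip(colors, alphas):
        C += c * a * transmittance
        transmittance *= (1.0 - a)
    S = 1.0 - transmittance            # accumulated silhouette weight
    return C, S, S > tau
```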
This rendering strategy supports analytic derivatives with respect to both camera pose and all Gaussian parameters, which is critical for joint optimization within SLAM.
3. Online Tracking with Joint Photometric and Geometric Residuals
Camera pose refinement at each step solves the following optimization:
- For each RGB-D input frame $(I_t, D_t)$, residuals are evaluated over the visible domain $\Omega_{\mathrm{vis}}$ (pixels where the silhouette mask is 1):
- Photometric: $r^c_p = I_t(p) - C(p)$
- Geometric: $r^d_p = D_t(p) - \hat{D}(p)$, where $\hat{D}(p)$ is the rendered depth
- Aggregate loss: $L(T_t) = \sum_{p \in \Omega_{\mathrm{vis}}} \left( \lVert r^c_p \rVert^2 + \lambda \lVert r^d_p \rVert^2 \right)$, with $\lambda$ weighting the depth term. Optimization proceeds via Gauss–Newton or Levenberg–Marquardt, leveraging analytic Jacobians: the closed-form Jacobians with respect to the SE(3) pose and the image projection yield highly effective camera localization directly from raw measurements (a schematic step is sketched below).
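A schematic Gauss–Newton step in NumPy is sketched below; `render_and_jacobian` is a hypothetical callback standing in for the differentiable renderer, and the small damping term is an illustrative Levenberg–Marquardt-style choice:

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a twist xi = (omega, v) to a 4x4 SE(3) matrix."""
    omega, v = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-8:                       # small-angle approximation
        R, V = np.eye(3) + K, np.eye(3) + 0.5 * K
    else:                                  # Rodrigues' formula
        R = (np.eye(3) + np.sin(theta) / theta * K
             + (1 - np.cos(theta)) / theta**2 * K @ K)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def gauss_newton_step(T, render_and_jacobian):
    """One pose update: solve (J^T J) dxi = -J^T r, then retract onto SE(3).

    render_and_jacobian(T) is assumed to return the stacked (depth-weighted)
    residuals r of shape (M,) and their Jacobian J of shape (M, 6) with
    respect to the 6-DoF pose increment xi.
    """
    r, J = render_and_jacobian(T)
    H = J.T @ J                                       # Gauss-Newton Hessian approximation
    g = J.T @ r
    dxi = np.linalg.solve(H + 1e-6 * np.eye(6), -g)   # damped normal equations
    return se3_exp(dxi) @ T                           # left-multiplicative update
```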
4. Incremental Map Building: Densification, Refinement, Culling
The 3D Gaussian map is dynamically updated to reflect new observations:
- Densification: For pixels where $D_t(p) < \hat{D}(p) - \tau_d$ (with $\tau_d$ a depth-error threshold, e.g., tied to the median depth error), a new Gaussian is initialized at the back-projection of $D_t(p)$ transformed into world coordinates by the current pose, with small isotropic covariance $\Sigma_0$, color from $I_t(p)$, and nominal opacity $\alpha_0$ (see the sketch after this list).
- Parameter Refinement: Periodically, all (or a sliding window of) Gaussian parameters are jointly optimized to minimize accumulated photometric error across recent frames, via analytic gradients.
- Culling and Splitting: Gaussians with opacity or accumulated rendering weight below a threshold are pruned, while large or highly anisotropic Gaussians may be split to improve reconstruction of thin structures.
Such adaptivity yields a compact representation, with only a few thousand Gaussians required versus millions of voxels or MLP weights in prior systems.
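A per-pixel sketch of the densification test and back-projection; the function, default thresholds, and the camera-to-world pose convention are illustrative assumptions:

```python
import numpy as np

def maybe_densify(u, v, depth_obs, depth_rendered, color_obs, T_cam_to_world,
                  fx, fy, cx, cy, tau_d=0.02, sigma0=0.01, alpha0=0.5):
    """Return parameters for a new Gaussian if the observed depth lies in
    front of the rendered surface by more than tau_d, else None."""
    if not (depth_obs > 0.0 and depth_obs < depth_rendered - tau_d):
        return None                                    # pixel already explained by the map
    z = depth_obs
    pt_cam = np.array([(u - cx) / fx * z,              # pinhole back-projection
                       (v - cy) / fy * z,
                       z])
    mu_new = T_cam_to_world[:3, :3] @ pt_cam + T_cam_to_world[:3, 3]
    Sigma0 = (sigma0 ** 2) * np.eye(3)                 # small isotropic covariance
    return mu_new, Sigma0, color_obs, alpha0
```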
5. Computational Performance and Experimental Results
SplaTAM demonstrates performance benchmarks that emphasize its efficiency and fidelity:
- Absolute Trajectory Error (ATE) on ScanNet++: $0.55$ cm, surpassing competing dense SLAM methods with substantially higher ATE.
- Novel-view synthesis PSNR: $28.1$ dB on training views and $23.99$ dB on novel views, improving depth and RGB accuracy over NeRF-based SLAM systems.
- Runtime: Real-time execution on a single GPU for joint tracking and rendering.
- Memory usage: A fraction of that of NeRF-based SLAM systems; SplaTAM stores only thousands of Gaussians.
Pseudocode for an update iteration is:
```
Input: RGB-D frame (I_t, D_t), previous map M, pose T_{t-1}

1. Initialize T_t ← T_{t-1}
2. for iter = 1..N_track_iters do
       render C(p), D̂(p), and mask M(p) via splatting with T_t
       compute residuals r^c_p, r^d_p over Ω_vis
       build JᵀJ, Jᵀr and solve for Δξ
       update T_t ← exp(Δξ) · T_t
   end for
3. Densification: for each pixel p with valid D_t(p) do
       if D_t(p) < D̂(p) − τ_d then
           add a Gaussian with mean μ_new, covariance Σ_0, color c_new, opacity α_0
   end for
4. if t mod K == 0 then
       optimize map parameters {μ_i, Σ_i, c_i, α_i} over the last W frames
5. Cull low-weight Gaussians; optionally split large ones

Output: pose T_t, updated map M
```
6. Comparative Advantages and Relationship to Prior Work
SplaTAM advances dense SLAM through:
- Continuous, closed-form differentiable mapping: Allows direct analytic computation of pose and map parameter updates, in contrast to sampled point-based methods or implicit MLP fields.
- Efficient alpha-composited splatting: Enables real-time rendering and optimization at full image resolution.
- Adaptive, event-driven map allocation: New Gaussians are instantiated only for previously unmapped or newly observed regions, extending mapping coverage dynamically without superfluous resource use.
Compared to 2D Gaussian surfel approaches (e.g., LAM (Fan et al., 28 Jul 2025)), which model explicit surface orientation and radial Jacobians, SplaTAM's explicit volumetric density supports robust tracking but may lack the rotational cues that surfel normals provide. SplaTAM exceeds point-based and NeRF-based SLAM systems in both compactness and photometric/geometric accuracy on standard benchmarks.
This suggests that explicit, volumetric, differentiable 3D Gaussian representation—coupled with online analytic pose and map updates—constitutes a state-of-the-art foundation for dense, real-time RGB-D SLAM from a single camera.