SplaTAM: Volumetric 3D Gaussian SLAM

Updated 17 November 2025
  • SplaTAM is an explicit volumetric SLAM system that employs weighted 3D Gaussian ellipsoids to achieve dense RGB-D reconstruction and efficient online tracking.
  • It leverages a differentiable rendering pipeline with joint photometric and geometric residuals, using analytic Jacobians to optimize camera pose and refine map parameters.
  • The system offers real-time performance, reduced memory usage, and improved reconstruction accuracy compared to volumetric-grid and NeRF-based SLAM methods.

SplaTAM is an explicit volumetric SLAM (Simultaneous Localization and Mapping) system that utilizes weighted 3D Gaussian ellipsoids as map primitives for dense RGB-D reconstruction and online tracking. By integrating 3D Gaussian splatting into SLAM, it enables fast, high-fidelity scene modeling from a single unposed RGB-D camera. SplaTAM combines a differentiable rendering pipeline, online nonlinear camera tracking, and a memory-efficient map update strategy. Major motivations for SplaTAM include surpassing the fidelity, efficiency, and generality of previous dense SLAM paradigms such as volumetric grids, point clouds, and implicit neural fields.

1. Map Representation with Weighted 3D Gaussians

The central data structure in SplaTAM is a map $G = \{ G_i \}_{i=1}^M$, where each map element $G_i$ is a 3D Gaussian defined by:

  • Mean position $\mu_i \in \mathbb{R}^3$
  • Covariance $\Sigma_i \in \mathbb{R}^{3 \times 3}$ (symmetric, positive definite)
  • Color or radiance $c_i \in \mathbb{R}^3$
  • Opacity $\alpha_i \in [0, 1]$

The scene's volumetric density at point $x$ is modeled as
$$\phi_i(x) = \alpha_i\,(2\pi)^{-3/2}\,|\Sigma_i|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i)\right).$$
Summing over all Gaussians, the global density is $\Phi(x) = \sum_{i=1}^M \phi_i(x)$.

This continuous, differentiable scene model enables analytic computation of both photometric renderings and spatial gradients needed for optimization.
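For concreteness, the following Python sketch shows one way such a map and its density $\Phi(x)$ could be represented; the class and field names are illustrative choices for exposition, not the authors' implementation.

import numpy as np

# Minimal sketch of the map described in Section 1 (illustrative layout).
# Each Gaussian stores a mean, covariance, color, and opacity.
class GaussianMap:
    def __init__(self):
        self.mu = np.zeros((0, 3))        # means, shape (M, 3)
        self.cov = np.zeros((0, 3, 3))    # covariances, shape (M, 3, 3)
        self.color = np.zeros((0, 3))     # RGB, shape (M, 3)
        self.alpha = np.zeros((0,))       # opacities in [0, 1], shape (M,)

    def density(self, x):
        """Evaluate the global density Phi(x) = sum_i phi_i(x) at a 3D point x."""
        phi = 0.0
        for mu_i, cov_i, a_i in zip(self.mu, self.cov, self.alpha):
            d = x - mu_i
            norm = (2 * np.pi) ** -1.5 * np.linalg.det(cov_i) ** -0.5
            phi += a_i * norm * np.exp(-0.5 * d @ np.linalg.solve(cov_i, d))
        return phi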

2. Volumetric Splatting and Differentiable Rendering

To project the 3D Gaussian model onto the camera image:

  • Each Gaussian is transformed to camera coordinates using the current pose $T = (R, t) \in SE(3)$: $\mu_i^c = R\,\mu_i + t$.
  • Its mean is projected to image coordinates $u_i = \pi(\mu_i^c)$.
  • Its covariance maps to a 2D elliptical footprint via $\Sigma_i^u = J_{\text{proj}}\, R\, \Sigma_i\, R^\top J_{\text{proj}}^\top$, where $J_{\text{proj}}$ is the Jacobian of the perspective projection evaluated at $\mu_i^c$.

Rendering proceeds by per-pixel accumulation:
$$w_i(p) = \alpha_i\,(2\pi)^{-1}\,|\Sigma_i^u|^{-1/2} \exp\!\left(-\tfrac{1}{2}(p - u_i)^\top (\Sigma_i^u)^{-1}(p - u_i)\right)$$
with front-to-back compositing:
$$C(p) = \sum_i w_i(p)\, c_i, \qquad O(p) = 1 - \prod_i \bigl(1 - w_i(p)\bigr).$$
A silhouette mask, defined as $M(p) = 1$ if $O(p) > \tau$ (with $\tau \approx 0.5$) and $0$ otherwise, serves to identify newly observed or unmapped pixels.

This rendering strategy supports analytic derivatives with respect to both camera pose and all Gaussian parameters, which is critical for joint optimization within SLAM.
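To make the projection and compositing concrete, the Python sketch below projects one Gaussian and accumulates per-pixel weights following the equations above. The pinhole intrinsics (fx, fy, cx, cy) and the assumption that the input list is already ordered front to back are choices of this sketch, not details fixed by the text.

import numpy as np

def project_gaussian(mu, cov, R, t, fx, fy, cx, cy):
    """Project a 3D Gaussian (mu, cov) into the image under pose (R, t)."""
    mu_c = R @ mu + t                                    # camera-frame mean
    x, y, z = mu_c
    u = np.array([fx * x / z + cx, fy * y / z + cy])     # pi(mu_c)
    # Jacobian of the perspective projection at mu_c
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov_2d = J @ R @ cov @ R.T @ J.T                     # 2D elliptical footprint
    return u, cov_2d

def render_pixel(p, gaussians, tau=0.5):
    """Composite color C(p), opacity O(p), and silhouette mask M(p) at pixel p.
    `gaussians` is a list of (u_i, cov2d_i, c_i, alpha_i) tuples, assumed
    already projected and sorted front to back."""
    C = np.zeros(3)
    prod_term = 1.0
    for u_i, cov_i, c_i, a_i in gaussians:
        d = p - u_i
        w = a_i / (2 * np.pi * np.sqrt(np.linalg.det(cov_i))) \
            * np.exp(-0.5 * d @ np.linalg.solve(cov_i, d))
        C += w * c_i                  # C(p) = sum_i w_i(p) c_i
        prod_term *= (1.0 - w)        # running product of (1 - w_i)
    O = 1.0 - prod_term               # O(p) = 1 - prod_i (1 - w_i)
    return C, O, O > tau              # color, opacity, silhouette mask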

3. Online Tracking with Joint Photometric and Geometric Residuals

Camera pose refinement at each step solves the following optimization:

  • For each RGB-D input frame $(I_t, D_t)$, residuals are evaluated over the visible domain $\Omega_{\text{vis}} = \{\, p \mid M(p) = 1 \ \land\ D_t(p)\ \text{valid} \,\}$:
    • Photometric: $r^c_p(T) = I_t(p) - C(p; T, G)$
    • Geometric: $r^d_p(T) = D_t(p) - \widehat{Y}(p; T, G)$, where $\widehat{Y}$ is the rendered depth
  • Aggregate loss: $E(T) = \sum_{p \in \Omega_{\text{vis}}} w_c\,(r^c_p)^2 + w_d\,(r^d_p)^2$. Optimization proceeds via Gauss–Newton or Levenberg–Marquardt, leveraging analytic Jacobians such as
$$\frac{\partial r_p^c}{\partial T} = -\sum_i \frac{\partial C}{\partial w_i}\,\frac{\partial w_i}{\partial u_i}\,\frac{\partial u_i}{\partial T}.$$
The closed-form Jacobians of the SE(3) pose and image projection yield accurate camera localization directly from raw measurements.
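The sketch below illustrates one damped Gauss–Newton pose step on SE(3) in Python. The residual_fn and jacobian_fn callables stand in for the differentiable renderer and are hypothetical placeholders, as is the damping value; only the pose-update structure follows the description above.

import numpy as np

def se3_exp(xi):
    """Exponential map from a twist xi = (rho, omega) in R^6 to a 4x4 pose."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0, -omega[2], omega[1]],
                  [omega[2], 0, -omega[0]],
                  [-omega[1], omega[0], 0]])
    if theta < 1e-8:
        R, V = np.eye(3) + K, np.eye(3) + 0.5 * K      # small-angle limit
    else:
        A, B = np.sin(theta) / theta, (1 - np.cos(theta)) / theta**2
        C = (1 - A) / theta**2
        R = np.eye(3) + A * K + B * (K @ K)            # Rodrigues rotation
        V = np.eye(3) + B * K + C * (K @ K)            # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def gauss_newton_step(T, residual_fn, jacobian_fn, damping=1e-4):
    """One tracking iteration: solve (J^T J + lambda I) dxi = -J^T r."""
    r = residual_fn(T)            # stacked photometric + geometric residuals
    J = jacobian_fn(T)            # analytic Jacobian w.r.t. the 6-DoF twist
    H = J.T @ J + damping * np.eye(6)
    dxi = np.linalg.solve(H, -J.T @ r)
    return se3_exp(dxi) @ T       # left-multiplicative pose update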

4. Incremental Map Building: Densification, Refinement, Culling

The 3D Gaussian map is dynamically updated to reflect new observations:

  • Densification: For pixels where $D_t(p) < \widehat{Y}(p) - \tau_d$ (with $\tau_d \approx 50\times$ the median depth error), a new Gaussian $G_{\text{new}}$ is initialized at the back-projected world point $T_t^{-1}\,\pi^{-1}(p, D_t(p))$, with a small isotropic covariance, color taken from $I_t(p)$, and a nominal opacity.
  • Parameter Refinement: Periodically, all Gaussian parameters $\{\mu_i, \Sigma_i, c_i, \alpha_i\}$ (or those within a sliding window of frames) are jointly optimized via analytic gradients to minimize the photometric error accumulated over recent frames.
  • Culling and Splitting: Gaussians whose opacity $\alpha_i$ or accumulated rendering weight falls below a threshold are pruned, while large or strongly anisotropic Gaussians may be split to better reconstruct thin structures.

Such adaptivity yields a compact representation, with only a few thousand Gaussians required versus millions of voxels or MLP weights in prior systems.
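A minimal Python sketch of the densification and culling passes is given below, reusing the GaussianMap layout sketched in Section 1. It applies the depth test above together with the silhouette criterion from Section 2; the thresholds, intrinsics, and helper names are illustrative assumptions rather than the authors' code.

import numpy as np

def densify(gmap, image, depth, rendered_depth, silhouette,
            T_t, fx, fy, cx, cy, tau_d, sigma0=1e-4, alpha0=0.5):
    """Add a Gaussian for each unmapped pixel, or pixel whose measured depth
    lies in front of the rendered depth by more than tau_d."""
    H, W = depth.shape
    for v in range(H):
        for u in range(W):
            d = depth[v, u]
            if d <= 0:
                continue
            if (not silhouette[v, u]) or d < rendered_depth[v, u] - tau_d:
                p_cam = np.array([(u - cx) * d / fx, (v - cy) * d / fy, d, 1.0])
                # T_t^{-1} pi^{-1}(p, D_t(p)): back-project, then map camera -> world
                p_world = (np.linalg.inv(T_t) @ p_cam)[:3]
                gmap.mu = np.vstack([gmap.mu, p_world])
                gmap.cov = np.concatenate([gmap.cov, sigma0 * np.eye(3)[None]])
                gmap.color = np.vstack([gmap.color, image[v, u]])
                gmap.alpha = np.append(gmap.alpha, alpha0)

def cull(gmap, min_alpha=0.05):
    """Prune Gaussians whose opacity has fallen below a threshold."""
    keep = gmap.alpha > min_alpha
    gmap.mu, gmap.cov = gmap.mu[keep], gmap.cov[keep]
    gmap.color, gmap.alpha = gmap.color[keep], gmap.alpha[keep]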

5. Computational Performance and Experimental Results

SplaTAM demonstrates performance benchmarks that emphasize its efficiency and fidelity:

  • Absolute Trajectory Error (ATE) on ScanNet++: $0.55$ cm (surpassing methods with ATE $> 1.1$ cm).
  • Reconstruction PSNR: $28.1$ dB (train views), $23.99$ dB (novel views); roughly a $2\times$ improvement in depth and RGB accuracy versus NeRF-based SLAM methods.
  • Runtime: real-time execution on a single GPU, at $\approx 30$ ms per frame for joint tracking and rendering.
  • Memory usage: approximately $60\,\%$ of a NeRF-based SLAM system; SplaTAM requires only thousands of Gaussians.

Pseudocode for an update iteration is:

Input:  RGB-D frame (I_t, D_t), previous map G, previous pose T_{t−1}
1.  Initialize T_t ← T_{t−1}
2.  for iter = 1..N_track_iters do
      Render C(p), Ŷ(p) and silhouette mask M(p) by splatting G with pose T_t
      Compute residuals r^c_p, r^d_p over Ω_vis
      Build the normal equations J^T J Δξ = −J^T r and solve for Δξ
      Update T_t ← exp(Δξ) · T_t
    end for
3.  Densification:
    for each pixel p with valid D_t(p):
      if M(p) = 0 or D_t(p) < Ŷ(p) − τ_d:
        add a new Gaussian to G at μ_new with covariance Σ_0, color c_new, opacity α_0
4.  if t mod K == 0:
      jointly optimize map parameters {μ_i, Σ_i, c_i, α_i} over the last W frames
5.  Cull low-weight Gaussians; optionally split large ones
Output: pose T_t, updated map G

6. Comparative Advantages and Relationship to Prior Work

SplaTAM advances dense SLAM through:

  • Continuous, closed-form differentiable mapping: Allows direct analytic computation of pose and map parameter updates, in contrast to sampled point-based methods or implicit MLP fields.
  • Efficient alpha-composited splatting: Enables real-time rendering and optimization at full image resolution.
  • Adaptive, event-driven map allocation: New Gaussians are instantiated only for previously unmapped or newly observed regions, extending mapping coverage dynamically without superfluous resource use.

Compared to 2D Gaussian surfel approaches (e.g., $S^3$LAM (Fan et al., 28 Jul 2025)), which focus on explicit surface orientation and radial Jacobians, SplaTAM's explicit volumetric density supports robust tracking but may lack certain geometrically accurate rotational cues provided by surfel normals. SplaTAM exceeds point-based and NeRF-based SLAM systems in both compactness and photometric/geometric accuracy under standard benchmarks.

This suggests that an explicit, volumetric, differentiable 3D Gaussian representation, coupled with online analytic pose and map updates, constitutes a state-of-the-art foundation for dense, real-time RGB-D SLAM from a single camera.
