VarSplat: Uncertainty-Aware 3D Gaussian SLAM
- VarSplat is an online dense RGB-D SLAM system that augments 3D Gaussian splatting with learned per-splat appearance variance for explicit uncertainty estimation.
- It renders a differentiable per-pixel uncertainty map via the law of total variance to improve tracking, submap registration, and loop detection in challenging regions.
- Experimental evaluations on Replica, TUM-RGBD, ScanNet, and ScanNet++ show state-of-the-art tracking accuracy and competitive reconstruction and rendering performance.
VarSplat is an uncertainty-aware 3D Gaussian Splatting system for online dense RGB-D SLAM. It augments each Gaussian with a learned per-splat appearance variance, renders a differentiable per-pixel uncertainty map by applying the law of total variance under alpha compositing, and uses that uncertainty to guide tracking, submap registration, and loop detection. The method is designed for failure regimes in which prior 3DGS-SLAM pipelines treat measurement reliability too implicitly, including low-texture regions, transparent surfaces, and scenes with complex reflectance, where local instability can accumulate into drift, ghosting, and unstable global alignment (Tran et al., 10 Mar 2026).
1. Problem setting and system scope
VarSplat targets online dense RGB-D SLAM built on 3D Gaussian Splatting. As in recent 3DGS-SLAM systems, it maintains a map of Gaussians and estimates camera poses by differentiably rendering color and depth from that map. Its central premise is that existing 3DGS-SLAM approaches optimize photometric and geometric residuals without explicitly modeling when rendered appearance is trustworthy, even though reliability varies sharply across the scene (Tran et al., 10 Mar 2026).
The motivating failure modes are concrete. In low-texture regions, photometric residuals become uninformative or noisy. At depth discontinuities and occlusion boundaries, small pose changes alter visibility and alpha weights, which destabilizes rendered color and depth. Transparent, specular, reflective, and glossy surfaces violate simple deterministic color assumptions. These local issues propagate into tracking drift, submap registration errors, ghosting, and loop-closure instability. Relative to earlier uncertainty-aware SLAM systems, VarSplat is distinguished by treating appearance uncertainty produced directly by the 3DGS rasterizer as a first-class quantity, rather than limiting uncertainty modeling to geometric variance or relying on pretrained uncertainty predictors (Tran et al., 10 Mar 2026).
The system is submap-based. Each submap is a collection of Gaussians
$$\mathcal{G} = \{(\mu_i, \Sigma_i, s_i, o_i, c_i, \sigma_i^2)\}_{i=1}^{N},$$
where $\mu_i$ is the mean position, $\Sigma_i$ is the covariance, $s_i$ is the scale, $o_i$ is the opacity, $c_i$ is the color derived from spherical harmonics, and $\sigma_i^2$ is the learned per-splat appearance variance. This variance is per-splat and per-channel, and it models uncertainty around mean color rather than spatial extent. The paper notes that the $\sigma_i^2$ notation is chosen to enforce positivity and follow conventional Gaussian-form uncertainty, but it does not clearly specify an explicit positivity-enforcing reparameterization or the initialization of $\sigma_i^2$ (Tran et al., 10 Mar 2026).
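The per-splat record described above can be sketched as a small data structure. This is only an illustration: the class name `Splat`, the field names, and the softplus mapping used to keep the variance non-negative are assumptions, since the paper does not state its positivity parameterization; the covariance and spherical-harmonic coefficients are omitted for brevity.

```python
import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass
class Splat:
    """Hypothetical container for one Gaussian with appearance variance."""
    mean: tuple[float, float, float]          # 3D position
    scale: tuple[float, float, float]         # per-axis extent
    opacity: float                            # in (0, 1)
    color: tuple[float, float, float]         # mean RGB color
    raw_variance: tuple[float, float, float]  # unconstrained, one per channel

    def variance(self) -> tuple[float, float, float]:
        # Assumption: softplus maps the raw parameter to a usable variance.
        # The paper does not specify which reparameterization it uses.
        return tuple(softplus(r) for r in self.raw_variance)
```

Storing an unconstrained parameter and mapping it through a smooth positive function is a common way to let a gradient-based optimizer update a variance freely; other choices, such as an exponential, would serve the same purpose.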
2. Uncertainty formulation and rendered variance
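VarSplat inherits standard 3DGS alpha compositing. With front-to-back depth ordering, the transmittance and weights are
$$T_i = \prod_{j<i} (1 - \alpha_j), \qquad w_i = \alpha_i T_i,$$
where $\alpha_i$ is the effective opacity of splat $i$ at the pixel. Rendered color and depth are
$$\hat{C} = \sum_i w_i c_i, \qquad \hat{D} = \sum_i w_i z_i,$$
where $z_i$ is the camera-space depth of the projected Gaussian mean (Tran et al., 10 Mar 2026).
Its distinctive contribution is the uncertainty map derived from the law of total variance,
$$\mathrm{Var}[C] = \mathbb{E}\big[\mathrm{Var}[C \mid I]\big] + \mathrm{Var}\big[\mathbb{E}[C \mid I]\big].$$
In VarSplat, $C$ is the pixel color and $I$ indexes contributing splats. Conditioned on splat $i$,
$$\mathbb{E}[C \mid I = i] = c_i, \qquad \mathrm{Var}[C \mid I = i] = \sigma_i^2.$$
This yields the rendered per-pixel variance map
$$\hat{V} = \sum_i w_i \sigma_i^2 + \sum_i w_i c_i^2 - \hat{C}^2.$$
The decomposition has two terms. The within-component term,
$$\sum_i w_i \sigma_i^2,$$
captures learned uncertainty of individual splats. The between-component term,
$$\sum_i w_i c_i^2 - \hat{C}^2,$$
captures disagreement among overlapping splats. This makes the uncertainty map sensitive not only to intrinsically unreliable splats but also to ambiguous blending at occlusion boundaries, disocclusions, and reflective or transparent regions (Tran et al., 10 Mar 2026).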
A practical property is that the variance map $\hat{V}$ is rendered in the same alpha-compositing pass as color and depth. The rasterizer accumulates $\sum_i w_i c_i$, $\sum_i w_i z_i$, $\sum_i w_i \sigma_i^2$, and $\sum_i w_i c_i^2$, then forms $\hat{V}$ by subtracting $\hat{C}^2$. This preserves single-pass rasterization efficiency and differentiability, and avoids Monte Carlo sampling or a separate uncertainty network (Tran et al., 10 Mar 2026).
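The single-pass accumulation can be illustrated for one pixel and one color channel. The following is a minimal sketch, not the paper's CUDA rasterizer: the function name `render_pixel` and its argument layout are hypothetical, and the splats are assumed to be already sorted front to back.

```python
def render_pixel(alphas, colors, depths, variances):
    """Composite one pixel (single channel) and return color, depth,
    total variance, and the within/between variance terms."""
    transmittance = 1.0
    sum_c = sum_d = sum_var = sum_c2 = 0.0
    for a, c, z, v in zip(alphas, colors, depths, variances):
        w = a * transmittance      # compositing weight of this splat
        sum_c += w * c             # rendered color
        sum_d += w * z             # rendered depth
        sum_var += w * v           # within-component term
        sum_c2 += w * c * c        # second moment of color
        transmittance *= 1.0 - a   # light remaining for splats behind
    within = sum_var
    between = sum_c2 - sum_c ** 2  # disagreement among overlapping splats
    return sum_c, sum_d, within + between, within, between


# Two splats with different colors; the second is opaque, so weights sum to 1.
color, depth, var, within, between = render_pixel(
    alphas=[0.5, 1.0], colors=[0.2, 0.8], depths=[1.0, 2.0],
    variances=[0.01, 0.03],
)
```

In this example the two splats disagree strongly in color, so the between-component term dominates the rendered variance even though each splat's own variance is small. Note that the between-component expression equals a weighted variance of the splat colors only when the weights sum to one.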
3. Use of uncertainty in tracking, registration, and loop detection
VarSplat converts variance into confidence weights by median-centered log scaling. A per-pixel confidence is derived from the rendered variance $\hat{V}$, and a per-splat confidence is derived from the learned $\sigma_i^2$. Larger-than-median variance yields a smaller weight; smaller-than-median variance yields a larger weight (Tran et al., 10 Mar 2026).
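One way to realize a median-centered, log-scaled confidence is sketched below. The specific functional form, the function name `confidence_weights`, and the `eps` parameter are assumptions chosen only to reproduce the qualitative behavior described above; they are not taken from the paper.

```python
import math
from statistics import median


def confidence_weights(variances, eps=1e-8):
    """Map variances to weights that equal 1 at the median variance,
    fall below 1 above it, and rise above 1 below it (hypothetical form)."""
    med = median(variances)
    weights = []
    for v in variances:
        # Log-variance centered on the log of the median variance.
        d = math.log(v + eps) - math.log(med + eps)
        # Assumed squashing: a logistic in d, scaled to the range (0, 2).
        weights.append(2.0 / (1.0 + math.exp(d)))
    return weights


weights = confidence_weights([0.01, 0.04, 0.16])
```

Centering on the median makes the weights insensitive to the overall variance scale of a frame or submap, so only the relative reliability of pixels or splats matters.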
In tracking, the current pose is estimated relative to the active submap using the rendered color $\hat{C}$, depth $\hat{D}$, and variance $\hat{V}$. The intended tracking objective weights photometric residuals by the per-pixel confidence while leaving depth residuals unweighted, with a scalar weight balancing color and depth. This design reflects the paper’s observation that RGB residuals are especially unstable under viewpoint change, low texture, and occlusion. Tracking further uses an inlier mask that removes pixels whose depth error exceeds a fixed multiple of the median depth error in the current frame and removes pixels with invalid depth. A soft alpha mask is also used. During tracking, variance is frozen and gradients are stopped through the variance-derived confidence, so pose optimization does not interfere with variance learning (Tran et al., 10 Mar 2026).
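The structure of such a tracking objective can be sketched on flat lists of per-pixel residuals. This is a hedged illustration: the function name `tracking_loss`, the convex combination controlled by `lam`, and the multiplier `k` are hypothetical, and their defaults are placeholders rather than the paper's settings.

```python
from statistics import median


def tracking_loss(rgb_res, depth_res, conf, alpha, valid, lam=0.5, k=3.0):
    """Confidence-weighted photometric term plus unweighted depth term.

    rgb_res, depth_res: per-pixel absolute residuals
    conf:  per-pixel confidence, treated as a constant (in an autograd
           framework it would be detached so no gradient reaches variance)
    alpha: soft alpha mask in [0, 1]
    valid: True where the sensor depth is valid
    """
    valid_depth = [d for d, ok in zip(depth_res, valid) if ok]
    if not valid_depth:
        return 0.0
    # Inlier mask: drop pixels whose depth error exceeds k times the median.
    thresh = k * median(valid_depth)
    total = 0.0
    for r_c, r_d, w, a, ok in zip(rgb_res, depth_res, conf, alpha, valid):
        if not ok or r_d > thresh:
            continue
        # Only the color residual is scaled by confidence.
        total += a * (lam * w * r_c + (1.0 - lam) * r_d)
    return total


loss = tracking_loss(
    rgb_res=[0.1, 0.2, 0.3, 0.5], depth_res=[0.01, 0.02, 0.03, 1.0],
    conf=[1.0, 1.0, 0.5, 1.0], alpha=[1.0] * 4, valid=[True] * 4,
)
```

In the example call the fourth pixel has a depth error far above the median and is excluded, while the third pixel's color residual is halved by its low confidence.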
Registration after loop detection uses the same uncertainty principle: photometric residuals are weighted by the per-pixel confidence, depth residuals are left unweighted, and variance is fixed during registration. The paper attributes improved medium-range alignment and reduced ghosting between overlapping submaps to this weighting strategy (Tran et al., 10 Mar 2026).
Loop detection operates at submap level and uses per-splat variance rather than per-pixel variance. Following LoopSplat, keyframe descriptors provide candidate matches, and similarity is modulated by a per-submap reliability score derived from the learned per-splat variance.
Submaps supported mainly by high-variance splats receive smaller reliability factors, which reduces their influence in loop matching. The paper states that this reduces false closures on repeated structure and improves long-range consistency. Supplementary details specify NetVLAD with VGG16-NetVLAD-Pitts30K weights from HLoc, followed by overlap-ratio filtering from front-end poses (Tran et al., 10 Mar 2026).
4. Optimization, map management, and implementation
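VarSplat jointly optimizes camera poses, Gaussian parameters, and $\sigma_i^2$ during mapping. The intended mapping objective is
$$\mathcal{L}_{\text{map}} = \lambda_{\text{color}} \mathcal{L}_{\text{color}} + \lambda_{\text{depth}} \mathcal{L}_{\text{depth}} + \lambda_{\text{reg}} \mathcal{L}_{\text{reg}} + \lambda_{\text{var}} \mathcal{L}_{\text{var}}.$$
The color term is the standard 3DGS combination of $L_1$ and SSIM,
$$\mathcal{L}_{\text{color}} = (1 - \lambda_{\text{SSIM}}) \, \lVert \hat{C} - C_{\text{gt}} \rVert_1 + \lambda_{\text{SSIM}} \, \big(1 - \mathrm{SSIM}(\hat{C}, C_{\text{gt}})\big),$$
the depth term is
$$\mathcal{L}_{\text{depth}} = \lVert \hat{D} - D_{\text{gt}} \rVert_1,$$
where $C_{\text{gt}}$ and $D_{\text{gt}}$ are the observed color and depth, and $\mathcal{L}_{\text{reg}}$ regularizes Gaussian scales, similar to GS-SLAM, although the precise variable definitions for that term are not fully clear from the paper text (Tran et al., 10 Mar 2026).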
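Variance learning uses a Gaussian negative-log-likelihood-style term,
$$\mathcal{L}_{\text{var}} = \frac{1}{2} \Big( \log \hat{V} + \frac{r^2}{\hat{V}} \Big),$$
where $r^2$ is the squared residual at a pixel. This is deliberately based on squared $L_2$ residuals rather than $L_1$, because the paper treats $\hat{V}$ as a Gaussian variance. The derivative with respect to rendered variance is
$$\frac{\partial \mathcal{L}_{\text{var}}}{\partial \hat{V}} = \frac{1}{2} \Big( \frac{1}{\hat{V}} - \frac{r^2}{\hat{V}^2} \Big),$$
and by chain rule
$$\frac{\partial \mathcal{L}_{\text{var}}}{\partial \sigma_i^2} = \frac{\partial \mathcal{L}_{\text{var}}}{\partial \hat{V}} \, \frac{\partial \hat{V}}{\partial \sigma_i^2} = w_i \, \frac{\partial \mathcal{L}_{\text{var}}}{\partial \hat{V}}.$$
Thus each splat’s variance is updated in proportion to its compositing weight. Mapping learns poses, Gaussian geometry and appearance, and $\sigma_i^2$ jointly; tracking and registration freeze variance; loop closure does not propagate gradients into variance because it occurs after submap construction (Tran et al., 10 Mar 2026).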
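The chain-rule statement can be checked numerically on a single pixel. The sketch below assumes the common Gaussian NLL form $\tfrac{1}{2}(\log V + r^2 / V)$, which may differ from the paper's exact term in constants or stabilizers; the function names are hypothetical, and the between-component part of the rendered variance is passed in as a fixed number.

```python
import math


def nll(r2, V):
    """Assumed Gaussian NLL-style loss for squared residual r2 and variance V."""
    return 0.5 * (math.log(V) + r2 / V)


def dnll_dV(r2, V):
    """Analytic derivative of nll with respect to the rendered variance."""
    return 0.5 * (1.0 / V - r2 / V ** 2)


def rendered_variance(weights, sigmas2, between):
    """Within-component sum plus a fixed between-component term."""
    return sum(w * s for w, s in zip(weights, sigmas2)) + between


def grad_splat_variances(weights, sigmas2, between, r2):
    """Chain rule: each splat receives the pixel gradient scaled by its weight."""
    g = dnll_dV(r2, rendered_variance(weights, sigmas2, between))
    return [w * g for w in weights]


weights, sigmas2, between, r2 = [0.5, 0.3], [0.02, 0.05], 0.01, 0.04
analytic = grad_splat_variances(weights, sigmas2, between, r2)

# Central finite difference on the first splat's variance as a sanity check.
h = 1e-6
up = nll(r2, rendered_variance(weights, [sigmas2[0] + h, sigmas2[1]], between))
dn = nll(r2, rendered_variance(weights, [sigmas2[0] - h, sigmas2[1]], between))
numeric = (up - dn) / (2.0 * h)
```

The derivative with respect to the rendered variance vanishes when the variance equals the squared residual, which is why this kind of loss pushes predicted variance toward the observed error magnitude.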
The submap-based pipeline initializes Gaussians by backprojecting RGB-D points from the first keyframe, adds Gaussians in unobserved regions or merges overlaps, and starts a new submap when camera motion exceeds a spatial threshold from the current submap centroid or accumulated tracking uncertainty passes a preset limit. Supplementary settings specify both thresholds, the first in meters, with alternative fixed-frame heuristics for ScanNet and ScanNet++. New Gaussians are initialized with a fixed opacity and with scales taken from the nearest neighbor. Pruning thresholds differ between Replica and the other datasets. On ScanNet++, if the tracking loss exceeds a set multiple of the running average, the pose is reinitialized with ICP odometry (Tran et al., 10 Mar 2026).
The reported implementation uses Python 3.10, PyTorch 2.4.1, CUDA 12.6, and an NVIDIA A100 80GB GPU. The original 3DGS rasterizer and a depth-rendering extension are modified to propagate variance. Default values are given for the four mapping loss weights. Tracking hyperparameters are set per dataset, with separate values for Replica, TUM-RGBD, ScanNet, and ScanNet++ (Tran et al., 10 Mar 2026).
5. Experimental evaluation
VarSplat is evaluated on Replica, TUM-RGBD, ScanNet, and ScanNet++. Tracking is measured with ATE RMSE on keyframes; reconstruction with depth $L_1$ and mesh F1; rendering with PSNR, SSIM, and LPIPS; and ScanNet++ also reports novel-view synthesis PSNR. Baselines include SplaTAM, MonoGS, Gaussian-SLAM, LoopSplat, CG-SLAM, and Uni-SLAM (Tran et al., 10 Mar 2026).
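For reference, the tracking metric reduces to a root-mean-square of per-keyframe translation errors. The sketch below is a simplified illustration with a hypothetical function name: it assumes the estimated and ground-truth trajectories are already expressed in the same frame, and it omits the rigid alignment step that standard ATE evaluation usually applies first.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translation error over matching keyframe positions.

    Both inputs are equal-length sequences of (x, y, z) positions that are
    assumed to be pre-aligned; no trajectory alignment is performed here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = ate_rmse(
    estimated=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ground_truth=[(0.0, 0.0, 0.0), (1.0, 0.02, 0.0)],
)
```

The result is in the same unit as the input positions, so positions in meters must be scaled by 100 to obtain centimeters.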
The strongest tracking results appear on real-world datasets. On Replica, VarSplat reports the best average tracking accuracy (ATE RMSE, in cm) among the compared methods, ahead of LoopSplat, CG-SLAM, and Gaussian-SLAM. On ScanNet++, its average ATE RMSE is compared with LoopSplat and Gaussian-SLAM; the paper explicitly states its margin over the second-best method and emphasizes robustness on large-motion real-world sequences. On TUM-RGBD, the average is compared with LoopSplat, CG-SLAM, and MonoGS, and on ScanNet with Uni-SLAM, GO-SLAM, LoopSplat, and CG-SLAM (Tran et al., 10 Mar 2026).
Reconstruction and rendering remain competitive. On Replica, depth $L_1$ and mesh F1 are reported against LoopSplat. The paper uses this comparison to argue that uncertainty-aware weighting improves pose estimation without degrading mesh quality. Average input-view rendering scores (PSNR, SSIM, and LPIPS) are reported on Replica, TUM-RGBD, and ScanNet. On ScanNet++ novel view synthesis, VarSplat's PSNR is compared with LoopSplat and Gaussian-SLAM (Tran et al., 10 Mar 2026).
Ablation results indicate that uncertainty contributes across the full SLAM stack. On ScanNet, the ablation compares ATE RMSE for five configurations: no uncertainty, uncertainty only in tracking, tracking plus loop detection, loop detection plus registration, and the full system using uncertainty in tracking, loop detection, and registration. A second ablation reports that the best variant freezes variance during tracking, includes the depth residual in variance training, and uses squared $L_2$ residuals in the NLL term; removing any of these choices degrades performance. Runtime measurements on Replica/Room0 on an A100 report per-frame and per-iteration times for mapping and tracking together with ATE, alongside LoopSplat’s per-frame mapping and tracking times and ATE. The paper presents the method as online rather than as strictly real-time in the conventional sense (Tran et al., 10 Mar 2026).
6. Relation to the broader splat literature and limitations
VarSplat occupies a specific niche within the Gaussian-splatting literature: uncertainty-aware online RGB-D SLAM. It is neither a physics-based appearance model nor a VR-oriented renderer, nor an interpretability framework. AstroSplat, for example, replaces the usual spherical-harmonic appearance computation with planetary reflectance models for rendering and reconstruction of small celestial bodies (Nolan et al., 12 Mar 2026). VRSplat targets virtual reality by combining Mini-Splatting, StopThePop, and Optimal Projection, together with a single-pass foveated rasterizer (Tu et al., 15 May 2025). XSPLAIN addresses ante-hoc interpretability for splat-based classification rather than SLAM, using prototype-based explanations over 3D Gaussian primitives (Galus et al., 10 Feb 2026). Splat-LOAM is LiDAR-native and geometry-first, using 2D Gaussian surface splats and spherical rasterization for LiDAR odometry and mapping (Giacomini et al., 21 Mar 2025). A plausible implication is that VarSplat should be read not as a generic reformulation of 3DGS, but as a renderer-level uncertainty extension specialized to RGB-D pose estimation and submap alignment.
The current limitations are explicit. The system still relies on depth-based Gaussian insertion, so performance is constrained when depth is sparse or missing. It models appearance uncertainty only, not a full joint appearance-and-geometry uncertainty. Learning and rendering variance adds computation and memory overhead. Experiments focus on mostly static scenes. Several implementation details remain underspecified in the paper text, notably the exact positivity parameterization for $\sigma_i^2$, the initialization of variance, and the formulation of global refinement after submap merging (Tran et al., 10 Mar 2026).
These caveats delimit the method’s scope. The principal technical novelty is the learned per-splat appearance variance together with the rendered uncertainty map
$$\hat{V} = \sum_i w_i \sigma_i^2 + \sum_i w_i c_i^2 - \Big( \sum_i w_i c_i \Big)^2,$$
which is then used coherently in tracking, registration, and loop detection. Within that scope, VarSplat provides a concrete formulation of how uncertainty can be made native to the 3DGS rasterizer rather than appended as an external predictor or reduced to depth variance alone (Tran et al., 10 Mar 2026).