GaussianFlow SLAM: Monocular 3D Mapping
- GaussianFlow SLAM is a monocular SLAM framework that employs anisotropic 3D Gaussian splatting combined with dense optical flow supervision to resolve depth ambiguities.
- It utilizes analytic flow alignment losses and a recurrent, flow-guided bundle adjustment to jointly optimize camera pose and scene mapping.
- The approach reports state-of-the-art trajectory estimation and rendering accuracy on standard benchmarks (EuRoC MAV, TUM RGB-D), remaining robust under challenging imaging conditions.
GaussianFlow SLAM is a monocular simultaneous localization and mapping (SLAM) framework. It uses 3D Gaussian splatting (3DGS) as its scene representation and applies dense optical flow supervision to the projected motion of the Gaussians, termed "GaussianFlow", in both geometric map optimization and camera pose estimation. Flow alignment between frames supplies consistent, geometry-aware cues that counter the depth ambiguities inherent to monocular input, yielding state-of-the-art tracking accuracy and photorealistic scene rendering. The framework introduces analytic flow alignment losses with closed-form gradients, flow-guided bundle adjustment, and data-driven map densification and pruning modules. These are combined in a tightly coupled, recurrent optimization architecture for robust monocular SLAM (Seo et al., 17 Apr 2026).
1. Scene Representation and GaussianFlow Definition
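The scene is modeled as a collection of anisotropic 3D Gaussians. Each Gaussian is specified by a position $\boldsymbol{\mu}_i \in \mathbb{R}^3$, a covariance $\boldsymbol{\Sigma}_i \in \mathbb{R}^{3\times 3}$, an RGB color $\mathbf{c}_i$, and an opacity $o_i$. Camera views project these Gaussians onto the image plane, where a tile-based rasterizer computes per-pixel color and geometry by alpha-blending them with per-Gaussian weights.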
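For each pixel $\mathbf{x}$ in frame $t_1$, the set $\mathcal{N}(\mathbf{x})$ of contributing Gaussians is determined. For each $i \in \mathcal{N}(\mathbf{x})$, the projected 2D mean $\boldsymbol{\mu}'_{i,t}$ and covariance $\boldsymbol{\Sigma}'_{i,t}$ are derived in both frames $t \in \{t_1, t_2\}$. The GaussianFlow vector is the predicted pixel-wise projection of the underlying 3D scene motion induced by map and pose changes. It is given by

$$\mathbf{f}^{G}_{t_1\to t_2}(\mathbf{x}) = \sum_{i\in\mathcal{N}(\mathbf{x})} w_i \Big( \mathbf{B}_{i,t_2}\,\mathbf{B}_{i,t_1}^{-1}\big(\mathbf{x}-\boldsymbol{\mu}'_{i,t_1}\big) + \boldsymbol{\mu}'_{i,t_2} - \mathbf{x} \Big),$$

where $\mathbf{B}_{i,t}$ is the Cholesky factor of the projected covariance ($\boldsymbol{\Sigma}'_{i,t} = \mathbf{B}_{i,t}\mathbf{B}_{i,t}^{\top}$), $\mathbf{x}-\boldsymbol{\mu}'_{i,t_1}$ is the 2D offset, and $w_i$ is the compositing weight.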
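The GaussianFlow expression can be evaluated directly for a single pixel, which makes its behavior easy to inspect. The Python sketch below is a minimal, non-authoritative illustration under several assumptions:

- 2D vectors and 2×2 matrices are plain tuples.
- The lower-triangular Cholesky factor is the choice of $\mathbf{B}_{i,t}$.
- Compositing weights are already known.

All function names (`cholesky2`, `inv_lower2`, `gaussian_flow`, and so on) are hypothetical and are not taken from the GaussianFlow SLAM code.

```python
import math

Vec2 = tuple[float, float]
Mat2 = tuple[tuple[float, float], tuple[float, float]]


def cholesky2(s: Mat2) -> Mat2:
    """Lower-triangular B with B @ B^T == s for a 2x2 SPD covariance."""
    b11 = math.sqrt(s[0][0])
    b21 = s[1][0] / b11
    b22 = math.sqrt(s[1][1] - b21 * b21)
    return ((b11, 0.0), (b21, b22))


def inv_lower2(b: Mat2) -> Mat2:
    """Inverse of a 2x2 lower-triangular matrix."""
    (b11, _), (b21, b22) = b
    return ((1.0 / b11, 0.0), (-b21 / (b11 * b22), 1.0 / b22))


def matmul2(a: Mat2, b: Mat2) -> Mat2:
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def matvec2(a: Mat2, v: Vec2) -> Vec2:
    return (a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1])


def gaussian_flow(x: Vec2, contribs) -> Vec2:
    """Weighted GaussianFlow at pixel x.

    contribs: iterable of (w, mu_t1, cov_t1, mu_t2, cov_t2) for the Gaussians
    covering x, holding projected 2D means/covariances in frames t1 and t2.
    """
    fx, fy = 0.0, 0.0
    for w, mu1, cov1, mu2, cov2 in contribs:
        # M = B_t2 @ inv(B_t1): whiten the offset at t1, re-color it at t2.
        m = matmul2(cholesky2(cov2), inv_lower2(cholesky2(cov1)))
        moved = matvec2(m, (x[0] - mu1[0], x[1] - mu1[1]))
        fx += w * (moved[0] + mu2[0] - x[0])
        fy += w * (moved[1] + mu2[1] - x[1])
    return (fx, fy)


# Usage: two Gaussians translate by (3, -1) with unchanged covariance.
eye = ((1.0, 0.0), (0.0, 1.0))
contribs = [
    (0.6, (10.0, 5.0), eye, (13.0, 4.0), eye),
    (0.4, (11.0, 6.0), eye, (14.0, 5.0), eye),
]
print(gaussian_flow((10.5, 5.5), contribs))  # approximately (3.0, -1.0)
```

The flow behaves as expected in two simple cases:

- **Pure translation.** For Gaussians that only translate, the flow reduces to the weight-averaged translation.
- **Growing covariance.** If a Gaussian's covariance grows while its mean stays put, pixels away from the mean receive an outward flow.

In GaussianFlow SLAM itself, this computation lives in the custom CUDA rasterizer rather than in Python.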
2. Flow-Guided Optimization Framework
GaussianFlow SLAM alternates two tightly coupled, recurrently updated optimization modules:
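- Tracking: For each new keyframe, the camera pose is estimated by minimizing the composite objective
$$\mathcal{L}_{\text{track}} = \mathcal{L}_{\text{pho}} + \lambda_{\text{flow}}\,\mathcal{L}_{\text{flow}},$$
where $\mathcal{L}_{\text{pho}}$ is the 3DGS photometric + DSSIM error, $\mathcal{L}_{\text{flow}}$ is the flow alignment loss, and $\lambda_{\text{flow}}$ balances the two terms.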
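- Mapping: With the recent poses held fixed, the mapping step minimizes
$$\mathcal{L}_{\text{map}} = \sum_{k\in\mathcal{W}} \Big(\mathcal{L}_{\text{pho}}^{(k)} + \lambda_{\text{flow}}\,\mathcal{L}_{\text{flow}}^{(k)}\Big) + \lambda_{\text{shape}}\,\mathcal{L}_{\text{shape}} + \lambda_{\text{compact}}\,\mathcal{L}_{\text{compact}}.$$
The window $\mathcal{W}$ indexes recent and strategically selected keyframes. $\mathcal{L}_{\text{shape}}$ regularizes Gaussian shape, and $\mathcal{L}_{\text{compact}}$ encourages map compactness.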
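The two objectives share their data terms. They differ in what is optimized (pose versus map) and over which frames. The Python sketch below spells out that structure, with plain numbers standing in for rendered losses.

Several elements are hypothetical placeholders:

- The window rule: the newest keyframes plus the older keyframes with the highest `overlap` score.
- The `overlap` scores themselves.
- Every loss weight.

The text says only that the window mixes recent and strategically selected keyframes.

```python
def select_window(keyframe_ids, overlap, n_recent=4, n_extra=2):
    """Hypothetical window W: the n_recent newest keyframes plus the n_extra
    older keyframes with the highest overlap score (a stand-in for
    'strategically selected')."""
    split = max(len(keyframe_ids) - n_recent, 0)
    recent = keyframe_ids[split:]
    older = keyframe_ids[:split]
    extra = sorted(older, key=lambda k: overlap[k], reverse=True)[:n_extra]
    return sorted(extra + recent)


def tracking_loss(l_pho, l_flow, lam_flow=0.5):
    """L_track = L_pho + lam_flow * L_flow (the weight is an assumed value)."""
    return l_pho + lam_flow * l_flow


def mapping_loss(per_kf, window, l_shape, l_compact,
                 lam_flow=0.5, lam_shape=10.0, lam_compact=0.01):
    """L_map: per-keyframe photometric + flow terms summed over W, plus shape
    and compactness regularizers (all weights are assumed values).

    per_kf maps keyframe id -> (L_pho^(k), L_flow^(k)).
    """
    data = sum(per_kf[k][0] + lam_flow * per_kf[k][1] for k in window)
    return data + lam_shape * l_shape + lam_compact * l_compact


kf_ids = list(range(10))
overlap = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.2, 4: 0.05, 5: 0.3}
window = select_window(kf_ids, overlap)
print(window)  # [0, 2, 6, 7, 8, 9]
```

Keeping a few older, well-overlapping keyframes in $\mathcal{W}$ is one common way to limit drift while keeping the mapping cost bounded. Whether GaussianFlow SLAM uses overlap specifically is not stated.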
A tightly integrated flow-guided bundle adjustment (DBA) jointly refines the pose graph, using flow-induced correspondences as constraints. All loss gradients, including those of the flow alignment loss, are computed analytically inside the fused rasterizer kernel, enabling efficient backpropagation on the GPU.
3. Flow Alignment Loss and Analytic Gradients
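The core supervisory signal is the flow alignment loss. It encourages the projected motion of each Gaussian to match the learned dense optical flow $\mathbf{F}_{t_1\to t_2}$, which is estimated by a pretrained, RAFT-like ConvGRU network:
$$\mathcal{L}_{\text{flow}} = \frac{1}{|\Omega|}\sum_{\mathbf{x}\in\Omega} \ell(\mathbf{x}),$$
with the per-pixel negative log-likelihood
$$\ell(\mathbf{x}) = -\log p\big(r(\mathbf{x})\big), \qquad r(\mathbf{x}) = \big\lVert \mathbf{f}^{G}_{t_1\to t_2}(\mathbf{x}) - \mathbf{F}_{t_1\to t_2}(\mathbf{x}) \big\rVert_2,$$
where $\Omega$ is the set of image pixels and $\mathbf{f}^{G}_{t_1\to t_2}$ is the GaussianFlow defined in Section 1. The distribution $p$ combines a log-logistic inlier model $p_{\text{in}}$ with a uniform outlier model $p_{\text{out}}$, which makes the loss robust to flow outliers and occlusions.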
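The effect of the inlier/outlier model is easiest to see on a few residual magnitudes. The sketch below assumes a mixture $p(r) = \gamma\,p_{\text{in}}(r) + (1-\gamma)\,p_{\text{out}}(r)$ of the following components:

- **Inlier term.** A standard log-logistic density with scale $\alpha$ and shape $\beta$.
- **Outlier term.** A uniform density on $[0, r_{\max}]$.

The mixture form and the values of $\gamma$, $\alpha$, $\beta$, and $r_{\max}$ are hypothetical choices, not the paper's settings. With $\beta = 1$ the inlier density is finite and decreasing from $r = 0$.

```python
import math


def log_logistic_pdf(r, alpha, beta):
    """Log-logistic density on r >= 0 with scale alpha and shape beta
    (beta >= 1 assumed, so the density is finite at r = 0)."""
    z = r / alpha
    return (beta / alpha) * z ** (beta - 1) / (1.0 + z ** beta) ** 2


def flow_nll(r, alpha=1.0, beta=1.0, gamma=0.9, r_max=100.0):
    """Per-pixel NLL of a flow residual magnitude r under an assumed mixture
    gamma * log-logistic inlier + (1 - gamma) * uniform outlier on [0, r_max].

    Residuals are clamped to r_max, so the loss saturates instead of growing
    without bound: large flow errors are absorbed by the outlier term.
    """
    r = min(max(r, 0.0), r_max)
    p = gamma * log_logistic_pdf(r, alpha, beta) + (1.0 - gamma) / r_max
    return -math.log(p)


for r in (0.0, 0.5, 5.0, 50.0, 1e6):
    print(r, round(flow_nll(r), 3))
```

The uniform term puts a ceiling on the per-pixel loss. An occluded pixel with a wildly wrong flow vector therefore cannot dominate the gradient the way it would under a squared error.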
All derivatives of $\mathcal{L}_{\text{flow}}$ with respect to Gaussian parameters and camera pose are provided in closed form, leveraging the eigendecomposition of the projected covariances and efficient matrix calculus.
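Hand-derived closed-form gradients are usually validated against finite differences. The sketch below is a deliberately simplified, hypothetical check with three restrictions:

- It covers only the projected means of a single Gaussian.
- It holds $\mathbf{M} = \mathbf{B}_{t_2}\mathbf{B}_{t_1}^{-1}$ fixed, so covariance derivatives are omitted.
- It substitutes a squared flow residual for the robust NLL.

From the GaussianFlow expression, $\partial\mathbf{f}/\partial\boldsymbol{\mu}'_{t_2} = w\,\mathbf{I}$ and $\partial\mathbf{f}/\partial\boldsymbol{\mu}'_{t_1} = -w\,\mathbf{M}$.

```python
def flow_one(x, mu1, mu2, M, w):
    """GaussianFlow contribution of one Gaussian, with M = B_t2 @ inv(B_t1) fixed."""
    ox, oy = x[0] - mu1[0], x[1] - mu1[1]
    return (w * (M[0][0] * ox + M[0][1] * oy + mu2[0] - x[0]),
            w * (M[1][0] * ox + M[1][1] * oy + mu2[1] - x[1]))


def sq_loss(x, mu1, mu2, M, w, F):
    """0.5 * ||f - F||^2, a stand-in for the robust NLL."""
    f = flow_one(x, mu1, mu2, M, w)
    return 0.5 * ((f[0] - F[0]) ** 2 + (f[1] - F[1]) ** 2)


def analytic_grads(x, mu1, mu2, M, w, F):
    """Closed-form gradients of sq_loss w.r.t. mu1 and mu2 via the chain rule."""
    f = flow_one(x, mu1, mu2, M, w)
    d = (f[0] - F[0], f[1] - F[1])
    g_mu2 = (w * d[0], w * d[1])  # df/dmu2 = w * I
    # df/dmu1 = -w * M, so the gradient is -w * M^T d.
    g_mu1 = (-w * (M[0][0] * d[0] + M[1][0] * d[1]),
             -w * (M[0][1] * d[0] + M[1][1] * d[1]))
    return g_mu1, g_mu2


def numeric_grad(fn, p, eps=1e-5):
    """Central finite differences of fn at the 2D point p."""
    g = []
    for k in range(2):
        hi, lo = list(p), list(p)
        hi[k] += eps
        lo[k] -= eps
        g.append((fn(tuple(hi)) - fn(tuple(lo))) / (2 * eps))
    return tuple(g)


x, mu1, mu2 = (1.0, 2.0), (0.5, 1.5), (1.2, 1.9)
M, w, F = ((1.1, 0.2), (0.0, 0.9)), 0.7, (0.3, -0.1)
g1, g2 = analytic_grads(x, mu1, mu2, M, w, F)
n1 = numeric_grad(lambda p: sq_loss(x, p, mu2, M, w, F), mu1)
n2 = numeric_grad(lambda p: sq_loss(x, mu1, p, M, w, F), mu2)
print(g1, n1)
print(g2, n2)
```

A full check would also perturb the covariances (and hence $\mathbf{M}$) and the camera pose. This is where the eigendecomposition-based derivatives mentioned above come in.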
4. Map Densification and Pruning Mechanisms
To mitigate local minima and degeneracies in poorly observed or ambiguous regions, GaussianFlow SLAM incorporates normalized error-based densification and pruning:
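- Densification uses a scale-invariant per-Gaussian error metric: for each Gaussian $i$,
$$\bar{e}_i = \frac{E_i}{S_i}, \qquad E_i = \sum_{\mathbf{x}} w_i(\mathbf{x})\, e(\mathbf{x}), \qquad S_i = \sum_{\mathbf{x}} w_i(\mathbf{x}),$$
where $E_i$ sums the error map $e$ (e.g., DSSIM) weighted by the Gaussian's compositing weights, and $S_i$ is the Gaussian's silhouette mass.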
Gaussians are marked for densification based on combinations of excess error, radius, and gradient magnitude exceeding empirical thresholds.
- Pruning excises Gaussians with low support, high normalized error, or minimal opacity that contribute little to map quality or stability.
This recurrent diagnostic cycle ensures the map remains minimally redundant, well-densified in high-error regions, and structurally stable across frames.
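The per-Gaussian decision can be sketched as follows. The normalized error is the compositing-weighted average of the error map over the pixels a Gaussian covers; dividing by the silhouette mass makes it independent of the Gaussian's size.

The following are hypothetical placeholders:

- The field names.
- Every threshold.
- The exact way error, radius, gradient, opacity, and support are combined.

The text names the criteria but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class GaussianStats:
    err_sum: float    # E_i: sum over pixels of w_i(x) * e(x), e.g. DSSIM
    mass: float       # S_i: sum over pixels of w_i(x) (silhouette mass)
    radius: float     # projected radius in pixels
    grad_norm: float  # accumulated positional-gradient magnitude
    opacity: float


def normalized_error(s: GaussianStats) -> float:
    """Scale-invariant error E_i / S_i (0 for Gaussians covering no pixels)."""
    return s.err_sum / s.mass if s.mass > 0.0 else 0.0


def classify(s: GaussianStats, tau_err=0.2, tau_prune_err=0.8, tau_grad=2e-4,
             tau_radius=20.0, tau_opacity=0.005, tau_mass=1.0) -> str:
    """Return 'prune', 'densify', or 'keep' (thresholds and rules hypothetical)."""
    e = normalized_error(s)
    # Prune: negligible opacity, too little support, or persistently high error.
    if s.opacity < tau_opacity or s.mass < tau_mass or e > tau_prune_err:
        return "prune"
    # Densify: elevated error together with a large gradient or a large footprint.
    if e > tau_err and (s.grad_norm > tau_grad or s.radius > tau_radius):
        return "densify"
    return "keep"


print(classify(GaussianStats(5.0, 10.0, 5.0, 1e-3, 0.5)))  # densify
```

Because $\bar{e}_i$ is an average rather than a sum, a large Gaussian with mild error is not flagged ahead of a small Gaussian with severe error. This is the practical meaning of "scale-invariant" here.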
5. Implementation and Performance Characteristics
GaussianFlow SLAM uses Python for orchestration. The core computation (rasterization, the flow alignment loss, and gradient computation) is implemented in custom CUDA. Key system parameters:
| Aspect | Details |
|---|---|
| Hardware | AMD Threadripper 5975WX, NVIDIA RTX A6000 |
| Max Gaussians | ~500,000 |
| Rendering speed | ~460 FPS (rasterizer) |
| End-to-end pipeline | ~0.17 FPS |
| Memory bottleneck | Gaussian and flow tensor storage |
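The two throughput figures in the table differ by more than three orders of magnitude. A back-of-the-envelope calculation, using only the table's approximate values, makes the gap concrete:

$$\frac{1}{0.17\ \text{s}^{-1}} \approx 5.9\ \text{s per frame}, \qquad \frac{1}{460\ \text{s}^{-1}} \approx 2.2\ \text{ms per render}, \qquad \frac{460}{0.17} \approx 2.7\times 10^{3}.$$

Each end-to-end frame therefore costs about as much wall-clock time as 2,700 rasterizer passes. This is consistent with, though not proof of, the cost being dominated by repeated optimization iterations, flow estimation, and bundle adjustment rather than by rendering. For a hypothetical 2,000-frame sequence, it implies roughly $2000 \times 5.9\ \text{s} \approx 3.3\ \text{h}$ of processing.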
Standard datasets used in benchmarking include EuRoC MAV and TUM RGB-D, with ground truth from Vicon/MoCap or robot logs.
6. Empirical Evaluation and Comparative Analysis
GaussianFlow SLAM demonstrates improved trajectory estimation and rendering accuracy compared to monocular 3DGS-SLAM baselines (MonoGS, MM3DGS-SLAM, Photo-SLAM, HI-SLAM2, WildGS-SLAM):
- On EuRoC, average RMSE ATE: 0.050 m (GaussianFlow SLAM) vs 0.059 m (next best), with GaussianFlow SLAM attaining best or next-best scores in 9/11 sequences and all TUM RGB-D sequences.
- Rendering: EuRoC mean PSNR 25.2 dB (GaussianFlow SLAM) vs 24.3 dB (baseline).
- Qualitative analysis: reconstructed depth maps are sharper, with novel views preserving fine edges and textures; projected Gaussian motions align with learned flow post-optimization.
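The headline numbers can also be restated as relative changes. The snippet below is plain arithmetic on the reported means. Converting the PSNR gain into an MSE ratio uses $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ at a fixed peak value. It treats the 0.9 dB gap in mean PSNR as if it applied to a single image, which is only an approximation for averages over sequences.

```python
def relative_reduction(ours, baseline):
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - ours) / baseline


def mse_ratio_from_psnr_gain(delta_db):
    """PSNR = 10*log10(MAX^2 / MSE), so a gain of delta_db dB at fixed MAX
    multiplies the MSE by 10 ** (-delta_db / 10)."""
    return 10 ** (-delta_db / 10)


ate_gain = relative_reduction(0.050, 0.059)        # EuRoC mean RMSE ATE (m)
mse_ratio = mse_ratio_from_psnr_gain(25.2 - 24.3)  # EuRoC mean PSNR (dB)
print(f"ATE reduction: {ate_gain:.1%}")  # ATE reduction: 15.3%
print(f"MSE ratio: {mse_ratio:.3f}")     # MSE ratio: 0.813
```

Read this way, the trajectory gain is about a 15% lower mean ATE than the next-best method. The rendering gain corresponds to roughly 19% lower pixel error under the single-image approximation.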
7. Contributions, Limitations, and Future Scope
Key contributions of GaussianFlow SLAM are:

- the first monocular 3DGS-SLAM system to use dense optical-flow supervision with closed-form analytic gradients and a direct GPU implementation;
- a flow-coupled optimization pipeline for both map and pose;
- normalized error-driven map densification and pruning modules.
Current limitations include sub-real-time performance (end-to-end ≈0.17 FPS) and reliance on first-order 3DGS optimization, which necessitates iterative DBA for pose refinement. Robustness is also reduced under extreme blur or low light, where flow quality degrades. Proposed future directions include GPU-based second-order solvers, selective Gaussian updates, and extensions to dynamic environments or to sensor fusion with stereo or IMU input (Seo et al., 17 Apr 2026).
GaussianFlow SLAM's architecture also serves as a foundational design for integrating geometry-aware learning signals into 3DGS-based SLAM: a tightly coupled, adaptive monocular system with state-of-the-art quantitative performance.