
GaussianFlow SLAM: Monocular 3D Mapping

Updated 21 April 2026
  • GaussianFlow SLAM is a monocular SLAM framework that employs anisotropic 3D Gaussian splatting combined with dense optical flow supervision to resolve depth ambiguities.
  • It utilizes analytic flow alignment losses and a recurrent, flow-guided bundle adjustment to jointly optimize camera pose and scene mapping.
  • The approach achieves state-of-the-art trajectory estimation and rendering accuracy, demonstrating robust performance on standard datasets despite challenging imaging conditions.

GaussianFlow SLAM is a monocular simultaneous localization and mapping (SLAM) framework that employs 3D Gaussian splatting as its scene representation, with dense optical flow supervision—termed "GaussianFlow"—incorporated into both geometric map optimization and camera pose estimation. This approach addresses the inherent ambiguities of monocular input by providing consistent geometry-aware cues derived from optical flow alignment between frames, enabling state-of-the-art tracking fidelity and photorealistic scene rendering. GaussianFlow SLAM introduces analytic flow alignment losses, closed-form gradients, flow-guided bundle adjustment, and data-driven map densification and pruning modules, establishing a tightly coupled, recurrent optimization architecture for robust monocular SLAM (Seo et al., 17 Apr 2026).

1. Scene Representation and GaussianFlow Definition

The scene is modeled as a collection of anisotropic 3D Gaussians, each specified by position $\mathbf x_i\in\mathbb R^3$, covariance $\Sigma_i\succ 0$, RGB color $\mathbf c_i\in\mathbb R^3$, and opacity $o_i\in[0,1]$. Camera views $\mathbf T_t\in SE(3)$ project these Gaussians onto the image plane, where a tile-based rasterizer computes per-pixel color and geometry using blending weights.
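As an illustration of the projection step, the sketch below maps one Gaussian's mean and covariance into the image using the first-order approximation commonly used in 3D Gaussian splatting, $\Sigma' = J R\,\Sigma\,R^\top J^\top$. It assumes a pinhole camera with a single focal length `f` and principal point `(cx, cy)`. The function and parameter names are illustrative and are not taken from the paper.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix given as nested lists."""
    return [list(col) for col in zip(*A)]


def project_gaussian(mean_w, cov_w, R, t, f, cx, cy):
    """Project a 3D Gaussian (world frame) to a 2D mean and a 2x2 covariance.

    R (3x3) and t (length 3) map world points into the camera frame:
    x_c = R x_w + t. The covariance uses cov_2d = J R cov_w R^T J^T.
    """
    x, y, z = (sum(R[r][k] * mean_w[k] for k in range(3)) + t[r] for r in range(3))
    if z <= 0.0:
        raise ValueError("Gaussian is behind the camera")
    mean_2d = (f * x / z + cx, f * y / z + cy)
    # Jacobian of the pinhole projection, evaluated at the camera-frame mean.
    J = [[f / z, 0.0, -f * x / z**2],
         [0.0, f / z, -f * y / z**2]]
    JR = matmul(J, R)
    cov_2d = matmul(matmul(JR, cov_w), transpose(JR))
    return mean_2d, cov_2d
```

For a Gaussian on the optical axis at depth `z` with isotropic covariance `s**2 * I`, this reduces to an isotropic 2D covariance of `(f * s / z)**2`, which is a quick sanity check on the Jacobian.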

For each image pixel $\mathbf p^j_t$ in frame $I_t$, the set of contributing Gaussians $\mathcal G^j_t$ is determined, and the projected 2D mean and covariance $(\mu_{i,t},\Sigma'_{i,t})$ are derived for each Gaussian $\mathbf G_i$ in that set. The GaussianFlow vector at the pixel is the predicted image-plane projection of the underlying 3D scene motion induced by map and pose changes. It is computed in closed form from three per-Gaussian quantities: the Cholesky factor of the projected covariance, a 2D offset, and the compositing weight.
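To make the roles of the Cholesky factor, the 2D offset, and the compositing weight concrete, here is a minimal sketch of one way such a flow vector can be composited. It assumes that each Gaussian carries the pixel by whitening the pixel's offset from the source-frame mean with the source Cholesky factor, re-applying the destination-frame factor, and shifting to the destination mean. That mapping, the tuple layout, and all names are assumptions made for illustration; the paper's exact expression may differ.

```python
import math


def cholesky_2x2(cov):
    """Lower-triangular L with L L^T = cov, for a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22


def gaussian_flow(pixel, gaussians):
    """Composite a per-pixel flow vector from the Gaussians covering `pixel`.

    Each entry of `gaussians` is a tuple
        (weight, mean_src, cov_src, mean_dst, cov_dst)
    holding the compositing weight and the projected 2D mean/covariance in the
    source and destination frames (assumed layout, for illustration only).
    """
    flow_x = flow_y = 0.0
    for w, mu0, cov0, mu1, cov1 in gaussians:
        a11, a21, a22 = cholesky_2x2(cov0)
        b11, b21, b22 = cholesky_2x2(cov1)
        # 2D offset of the pixel from the source-frame mean.
        dx, dy = pixel[0] - mu0[0], pixel[1] - mu0[1]
        # Whiten the offset: solve L0 y = d by forward substitution.
        y1 = dx / a11
        y2 = (dy - a21 * y1) / a22
        # Re-apply the destination factor and shift to the destination mean.
        qx = b11 * y1 + mu1[0]
        qy = b21 * y1 + b22 * y2 + mu1[1]
        flow_x += w * (qx - pixel[0])
        flow_y += w * (qy - pixel[1])
    return flow_x, flow_y
```

Under this assumed mapping, Gaussians that do not change between frames contribute zero flow, and a pure 2D translation of every mean yields that translation scaled by the sum of the weights.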

2. Flow-Guided Optimization Framework

GaussianFlow SLAM alternates two tightly coupled, recurrently updated optimization modules:

  • Tracking: For new keyframes, pose estimation minimizes a composite objective that combines the 3DGS photometric+DSSIM error with the flow alignment loss.

  • Mapping: With recent poses held fixed, the mapping step minimizes an objective accumulated over a window of recent and strategically selected keyframes. The objective includes one term that regularizes Gaussian shape and another that encourages map compactness.
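One plausible way to write the two objectives, consistent with the description above, is sketched below. The symbol names, the scalar weights $\lambda_{\text{flow}}$, $\lambda_{\text{shape}}$, $\lambda_{\text{compact}}$, and the choice of per-keyframe terms in the mapping objective are assumptions introduced here for illustration; the paper's notation and weighting may differ.

```latex
\begin{aligned}
\mathcal L_{\text{track}}(\mathbf T_t)
  &= \mathcal L_{\text{photo}}(\mathbf T_t)
   + \lambda_{\text{flow}}\,\mathcal L_{\text{flow}}(\mathbf T_t), \\
\mathcal L_{\text{map}}(\{\mathbf G_i\})
  &= \sum_{k\in\mathcal W}\Bigl[\mathcal L_{\text{photo}}^{(k)}
   + \lambda_{\text{flow}}\,\mathcal L_{\text{flow}}^{(k)}\Bigr]
   + \lambda_{\text{shape}}\,\mathcal L_{\text{shape}}
   + \lambda_{\text{compact}}\,\mathcal L_{\text{compact}}.
\end{aligned}
```

Here $\mathcal W$ is the keyframe window, $\mathcal L_{\text{photo}}$ the photometric+DSSIM error, $\mathcal L_{\text{flow}}$ the flow alignment loss, and the last two terms the shape and compactness regularizers.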

A tightly integrated, flow-guided dense bundle adjustment (DBA) jointly refines the pose graph, using flow-induced correspondences as constraints. All loss gradients, including those for flow alignment, are computed analytically in the fused rasterizer kernel, which keeps backpropagation efficient on the GPU.

3. Flow Alignment Loss and Analytic Gradients

The core supervisory signal is the flow alignment loss, which encourages the projected motion of each Gaussian to match a learned dense optical flow field, estimated by a pretrained RAFT-like network with ConvGRU updates. The loss aggregates a per-pixel negative log-likelihood of the discrepancy between the two flows.

The likelihood combines a log-logistic inlier model with a uniform outlier component, which makes the loss robust to flow outliers and occlusions.

All derivatives of the flow alignment loss with respect to Gaussian parameters and pose are provided in closed form, leveraging the eigendecomposition of projected covariances and efficient matrix calculus.
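The robust likelihood described above can be sketched as a two-component mixture over the magnitude of the flow residual. The snippet uses the standard log-logistic density with scale `alpha` and shape `beta`; the mixture form, the parameter values, and the function names are assumptions chosen for illustration, not values from the paper.

```python
import math


def log_logistic_pdf(r, alpha, beta):
    """Log-logistic density with scale alpha > 0 and shape beta >= 1, for r >= 0."""
    if beta < 1.0:
        raise ValueError("this sketch assumes beta >= 1")
    if r < 0.0:
        return 0.0
    s = r / alpha
    return (beta / alpha) * s ** (beta - 1.0) / (1.0 + s ** beta) ** 2


def flow_nll(residual, alpha=1.0, beta=2.0, inlier_ratio=0.9, outlier_density=0.01):
    """Negative log-likelihood of a residual magnitude under an inlier/outlier mixture."""
    inlier = log_logistic_pdf(residual, alpha, beta)
    return -math.log(inlier_ratio * inlier + (1.0 - inlier_ratio) * outlier_density)


def flow_alignment_loss(pred_flow, target_flow, **kwargs):
    """Mean per-pixel NLL between predicted and target flow vectors (lists of (dx, dy))."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred_flow, target_flow):
        total += flow_nll(math.hypot(px - tx, py - ty), **kwargs)
    return total / len(pred_flow)
```

Because the uniform component puts a floor under the likelihood, the per-pixel loss saturates for very large residuals instead of growing without bound, which is what limits the influence of occluded or badly matched pixels.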

4. Map Densification and Pruning Mechanisms

To mitigate local minima and degeneracies in poorly observed or ambiguous regions, GaussianFlow SLAM incorporates normalized error-based densification and pruning:

  • Densification utilizes a scale-invariant per-Gaussian error metric: for each Gaussian, an error map (e.g., DSSIM) is accumulated over the pixels it covers, weighted by its blending weights, and the total is normalized by the Gaussian's silhouette mass.

Gaussians are marked for densification based on combinations of excess error, radius, and gradient magnitude exceeding empirical thresholds.

  • Pruning excises Gaussians with low support, high normalized error, or minimal opacity that contribute little to map quality or stability.

This recurrent diagnostic cycle ensures the map remains minimally redundant, well-densified in high-error regions, and structurally stable across frames.
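A minimal sketch of this diagnostic cycle is shown below. It normalizes each Gaussian's accumulated error by its silhouette mass and then applies simple threshold rules. All thresholds and names are hypothetical placeholders, and the gradient-magnitude criterion and error-based pruning mentioned above are left out to keep the example short.

```python
def normalized_errors(weighted_error, silhouette_mass, eps=1e-8):
    """Per-Gaussian error divided by silhouette mass (a scale-invariant ratio)."""
    return [e / (m + eps) for e, m in zip(weighted_error, silhouette_mass)]


def select_gaussians(weighted_error, silhouette_mass, opacity, radius,
                     err_thresh=0.2, radius_thresh=4.0,
                     min_mass=1.0, min_opacity=0.01):
    """Return (densify_ids, prune_ids) from per-Gaussian statistics.

    All thresholds are hypothetical placeholders, not values from the paper.
    """
    errs = normalized_errors(weighted_error, silhouette_mass)
    densify, prune = [], []
    for i, err in enumerate(errs):
        if opacity[i] < min_opacity or silhouette_mass[i] < min_mass:
            prune.append(i)    # nearly transparent or poorly supported
        elif err > err_thresh and radius[i] > radius_thresh:
            densify.append(i)  # large Gaussian that explains its pixels badly
    return densify, prune
```

Dividing by silhouette mass keeps a large Gaussian from being flagged only because it covers many pixels, so the error threshold means the same thing at every scale.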

5. Implementation and Performance Characteristics

GaussianFlow SLAM is implemented with Python orchestration, while the core computational graph (rasterization, alignment loss, and gradient computation) is realized in custom CUDA. Key system parameters:

Aspect | Details
Hardware | AMD Threadripper 5975WX, NVIDIA RTX A6000
Max Gaussians | ~500,000
Rendering speed | ~460 FPS (rasterizer)
End-to-end pipeline | ~0.17 FPS
Memory bottleneck | Gaussian + flow tensor storage

Standard datasets used in benchmarking include EuRoC MAV and TUM RGB-D, with ground truth from Vicon/MoCap or robot logs.

6. Empirical Evaluation and Comparative Analysis

GaussianFlow SLAM demonstrates improved trajectory estimation and rendering accuracy compared to monocular 3DGS-SLAM baselines (MonoGS, MM3DGS-SLAM, Photo-SLAM, HI-SLAM2, WildGS-SLAM):

  • On EuRoC, average RMSE ATE: 0.050 m (GaussianFlow SLAM) vs 0.059 m (next best), with GaussianFlow SLAM attaining best or next-best scores in 9/11 sequences and all TUM RGB-D sequences.
  • Rendering: EuRoC mean PSNR 25.2 dB (GaussianFlow SLAM) vs 24.3 dB (baseline).
  • Qualitative analysis: reconstructed depth maps are sharper, with novel views preserving fine edges and textures; projected Gaussian motions align with learned flow post-optimization.
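For reference, the two headline metrics can be computed as in the sketch below. It assumes the estimated trajectory has already been aligned to the ground truth (for monocular systems this normally includes a scale correction) and that images are flat lists of intensities in `[0, max_value]`. The function names are illustrative and are not taken from the paper's evaluation code.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of position errors between two already-aligned trajectories (lists of (x, y, z))."""
    sq = [sum((e - g) ** 2 for e, g in zip(p, q))
          for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(sq) / len(sq))


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A trajectory whose every position is off by 3 cm in x and 4 cm in y has an ATE RMSE of 0.05 m, the same magnitude as the EuRoC average reported above.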

7. Contributions, Limitations, and Future Scope

Key contributions of GaussianFlow SLAM are: the first monocular 3DGS-SLAM system to use dense optical-flow supervision with closed-form analytic gradients and a direct GPU implementation; a flow-coupled optimization pipeline for both map and pose; and normalized error-driven map densification and pruning modules.

Current limitations include sub-real-time performance (end-to-end ≈0.17 FPS, roughly 6 s per frame), reliance on first-order 3DGS optimization (necessitating iterative DBA for pose refinement), and reduced robustness under extreme blur or low light, both of which degrade flow quality. Promising directions include GPU-based second-order solvers, selective Gaussian updates, and extension to dynamic environments or sensor fusion with stereo/IMU (Seo et al., 17 Apr 2026).

GaussianFlow SLAM's architecture is also noted as a foundational design for integrating geometry-aware learning signals into 3DGS-based SLAM, serving as an adaptive, tightly coupled monocular SLAM solution with state-of-the-art quantitative performance.

