GaussianFlow SLAM: Monocular 3D Mapping

Updated 21 April 2026

GaussianFlow SLAM is a monocular SLAM framework that employs anisotropic 3D Gaussian splatting combined with dense optical flow supervision to resolve depth ambiguities.
It utilizes analytic flow alignment losses and a recurrent, flow-guided bundle adjustment to jointly optimize camera pose and scene mapping.
The approach achieves state-of-the-art trajectory estimation and rendering accuracy, demonstrating robust performance on standard datasets despite challenging imaging conditions.

GaussianFlow SLAM is a monocular simultaneous localization and mapping (SLAM) framework that employs 3D Gaussian splatting as its scene representation, with dense optical flow supervision—termed "GaussianFlow"—incorporated into both geometric map optimization and camera pose estimation. This approach addresses the inherent ambiguities of monocular input by providing consistent geometry-aware cues derived from optical flow alignment between frames, enabling state-of-the-art tracking fidelity and photorealistic scene rendering. GaussianFlow SLAM introduces analytic flow alignment losses, closed-form gradients, flow-guided bundle adjustment, and data-driven map densification and pruning modules, establishing a tightly coupled, recurrent optimization architecture for robust monocular SLAM (Seo et al., 17 Apr 2026).

1. Scene Representation and GaussianFlow Definition

The scene is modeled as a collection of anisotropic 3D Gaussians, each specified by position $\mathbf x_i\in\mathbb R^3$, covariance $\Sigma_i\succ 0$, RGB color $\mathbf c_i\in\mathbb R^3$, and opacity $o_i\in[0,1]$. Each camera pose $\mathbf T_t\in SE(3)$ projects these Gaussians onto the image plane, where a tile-based rasterizer computes per-pixel color and geometry using blending weights.
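As a rough illustration of the projection step, the sketch below uses the standard 3DGS (EWA splatting) linearization $\Sigma' = J R\,\Sigma\,R^\top J^\top$, where $J$ is the Jacobian of a pinhole projection. The function name `project_gaussian`, the world-to-camera convention for `R` and `t`, and the intrinsics `fx, fy, cx, cy` are assumptions made for the example, not details taken from the paper.

```python
def matmul(A, B):
    """Multiply two small dense matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def project_gaussian(x_world, cov_world, R, t, fx, fy, cx, cy):
    """Project a 3D Gaussian to a 2D mean and 2x2 covariance.

    Assumes (R, t) maps world to camera coordinates and a pinhole camera
    without distortion; both are illustrative assumptions.
    """
    # Camera-frame mean: x_cam = R x_world + t
    X, Y, Z = (sum(R[i][k] * x_world[k] for k in range(3)) + t[i]
               for i in range(3))
    mu = (fx * X / Z + cx, fy * Y / Z + cy)
    # Jacobian of the pinhole projection evaluated at the camera-frame mean
    J = [[fx / Z, 0.0, -fx * X / Z ** 2],
         [0.0, fy / Z, -fy * Y / Z ** 2]]
    JR = matmul(J, R)
    cov2d = matmul(matmul(JR, cov_world), transpose(JR))
    return mu, cov2d

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov = [[0.04, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.04]]
mu, cov2d = project_gaussian([0.0, 0.0, 2.0], cov, identity, [0.0, 0.0, 0.0],
                             fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

A Gaussian on the optical axis projects to the principal point, and its 2D covariance scales with $(f/Z)^2$, which is the depth dependence that monocular supervision has to disambiguate.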

For each image pixel $\mathbf p^j_t$ in frame $I_t$, the set of contributing Gaussians $\mathcal G^j_t$ is determined, and the projected 2D mean and covariance $(\mu_{i,t},\Sigma'_{i,t})$ are derived for each Gaussian $\mathbf G_i$. The GaussianFlow vector, denoting the predicted pixel-wise projection of underlying 3D scene motion induced by map and pose changes, is given by

$$\mathbf f^{G}(\mathbf p^j_t)=\sum_{i\in\mathcal G^j_t} w^j_{i,t}\left[\mathbf L_{i,t+1}\mathbf L_{i,t}^{-1}\,\mathbf d^j_{i,t}+\mu_{i,t+1}-\mathbf p^j_t\right]$$

where $\mathbf L_{i,t}$ is the Cholesky factor of the projected covariance ($\Sigma'_{i,t}=\mathbf L_{i,t}\mathbf L_{i,t}^\top$), $\mathbf d^j_{i,t}=\mathbf p^j_t-\mu_{i,t}$ is the 2D offset, and $w^j_{i,t}$ the compositing weight.
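The following sketch shows one way such a flow vector could be evaluated for a single pixel. It assumes that each contributing Gaussian carries the pixel's offset from its old 2D mean through the old and new Cholesky factors of its projected covariance, and that the per-Gaussian displacements are summed with compositing weights that are already normalized. The dictionary keys (`w`, `mu0`, `cov0`, `mu1`, `cov1`) and the function names are hypothetical.

```python
import math

def cholesky2(S):
    """Lower-triangular Cholesky factor L of a 2x2 SPD matrix S = L L^T."""
    l11 = math.sqrt(S[0][0])
    l21 = S[0][1] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def solve_lower2(L, d):
    """Solve L y = d by forward substitution for lower-triangular 2x2 L."""
    y0 = d[0] / L[0][0]
    y1 = (d[1] - L[1][0] * y0) / L[1][1]
    return (y0, y1)

def gaussian_flow(pixel, gaussians):
    """Weighted sum of per-Gaussian pixel displacements (illustrative)."""
    flow_x = flow_y = 0.0
    for g in gaussians:
        L0 = cholesky2(g["cov0"])   # factor at the source frame
        L1 = cholesky2(g["cov1"])   # factor at the target frame
        d = (pixel[0] - g["mu0"][0], pixel[1] - g["mu0"][1])  # 2D offset
        y = solve_lower2(L0, d)     # offset in the Gaussian's canonical frame
        moved_x = L1[0][0] * y[0] + g["mu1"][0]
        moved_y = L1[1][0] * y[0] + L1[1][1] * y[1] + g["mu1"][1]
        flow_x += g["w"] * (moved_x - pixel[0])
        flow_y += g["w"] * (moved_y - pixel[1])
    return (flow_x, flow_y)

# A Gaussian that only translates moves the pixel by the shift of its mean.
translated = [{"w": 1.0, "mu0": (10.0, 10.0), "cov0": [[4.0, 0.0], [0.0, 4.0]],
               "mu1": (12.0, 9.0), "cov1": [[4.0, 0.0], [0.0, 4.0]]}]
flow_translate = gaussian_flow((11.0, 10.0), translated)

# A Gaussian that doubles in size pushes off-center pixels outward.
scaled = [{"w": 1.0, "mu0": (10.0, 10.0), "cov0": [[4.0, 0.0], [0.0, 4.0]],
           "mu1": (10.0, 10.0), "cov1": [[16.0, 0.0], [0.0, 16.0]]}]
flow_scale = gaussian_flow((11.0, 10.0), scaled)
```

The two toy cases separate the contributions: the mean shift captures translation on the image plane, while the change in covariance captures apparent expansion or contraction, which is tied to depth change.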

2. Flow-Guided Optimization Framework

GaussianFlow SLAM alternates two tightly coupled, recurrently updated optimization modules:

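Tracking: For new keyframes, pose estimation minimizes a composite objective

$$\mathcal L_{\text{track}}=\mathcal L_{\text{photo}}+\lambda_{\text{flow}}\,\mathcal L_{\text{flow}}$$

where $\mathcal L_{\text{photo}}$ is the 3DGS photometric+DSSIM error, $\mathcal L_{\text{flow}}$ is the flow alignment loss, and $\lambda_{\text{flow}}$ is a weighting coefficient.

Mapping: Fixing recent poses, the mapping step minimizes

$$\mathcal L_{\text{map}}=\sum_{k\in\mathcal W}\left(\mathcal L^{k}_{\text{photo}}+\lambda_{\text{flow}}\,\mathcal L^{k}_{\text{flow}}\right)+\lambda_{\text{shape}}\,\mathcal L_{\text{shape}}+\lambda_{\text{comp}}\,\mathcal L_{\text{comp}}$$

The window $\mathcal W$ indexes recent and strategically selected keyframes. $\mathcal L_{\text{shape}}$ regularizes Gaussian shape and $\mathcal L_{\text{comp}}$ encourages map compactness.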

A tightly integrated, flow-guided dense bundle adjustment (DBA) jointly refines the pose graph, using flow-induced correspondences as constraints. All loss gradients, including those for flow alignment, are computed analytically in the fused rasterizer kernel to facilitate GPU-scale backpropagation.

3. Flow Alignment Loss and Analytic Gradients

The core supervisory signal is the flow alignment loss, which encourages the projected motion of each Gaussian to match the learned dense optical flow $\hat{\mathbf f}$ (estimated via a pretrained ConvGRU RAFT-like network):

$$\mathcal L_{\text{flow}}=\sum_{j}\ell\!\left(\mathbf f^{G}(\mathbf p^j_t),\,\hat{\mathbf f}(\mathbf p^j_t)\right)$$

with per-pixel negative log-likelihood loss

$$\ell\!\left(\mathbf f^{G},\hat{\mathbf f}\right)=-\log p\!\left(\mathbf f^{G}-\hat{\mathbf f}\right)$$

Here $\mathbf f^{G}(\mathbf p^j_t)$ is the GaussianFlow vector at pixel $\mathbf p^j_t$. The distribution $p$ combines a log-logistic inlier model $p_{\text{in}}$ with a uniform outlier component $p_{\text{out}}$, making the loss robust to flow outliers and occlusions.

All derivatives of $\mathcal L_{\text{flow}}$ with respect to Gaussian parameters and pose are provided in closed form, leveraging the eigendecomposition of projected covariances and efficient matrix calculus.
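To make the robust loss concrete, here is a minimal sketch of a negative log-likelihood under a two-component mixture: a log-logistic density on the flow residual magnitude for inliers and a uniform density for outliers. The mixture weight `inlier_prob`, the log-logistic `scale` and `shape`, and the uniform support `max_residual` are all assumed values chosen for illustration; the source does not state the parameterization.

```python
import math

def log_logistic_pdf(r, scale, shape):
    """Log-logistic density for r >= 0 (requires shape >= 1 so r = 0 is finite)."""
    z = (r / scale) ** shape
    return (shape / scale) * (r / scale) ** (shape - 1.0) / (1.0 + z) ** 2

def flow_nll(pred, target, inlier_prob=0.9, scale=1.0, shape=1.0,
             max_residual=100.0):
    """Per-pixel NLL of the flow residual under an inlier/outlier mixture."""
    r = math.hypot(pred[0] - target[0], pred[1] - target[1])
    p_in = log_logistic_pdf(r, scale, shape)
    p_out = 1.0 / max_residual  # uniform over [0, max_residual] (assumption)
    return -math.log(inlier_prob * p_in + (1.0 - inlier_prob) * p_out)

def flow_alignment_loss(pred_flow, target_flow, **kwargs):
    """Mean per-pixel NLL over paired lists of 2D flow vectors."""
    losses = [flow_nll(p, t, **kwargs) for p, t in zip(pred_flow, target_flow)]
    return sum(losses) / len(losses)

nll_exact = flow_nll((0.0, 0.0), (0.0, 0.0))
nll_small = flow_nll((5.0, 0.0), (0.0, 0.0))
nll_huge = flow_nll((1000.0, 0.0), (0.0, 0.0))
```

Because the outlier term puts a floor under the likelihood, the penalty saturates for very large residuals instead of growing without bound, so a few badly wrong flow vectors (for example at occlusions) cannot dominate the gradient.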

4. Map Densification and Pruning Mechanisms

To mitigate local minima and degeneracies in poorly observed or ambiguous regions, GaussianFlow SLAM incorporates normalized error-based densification and pruning:

Densification utilizes a scale-invariant per-Gaussian error metric: for each Gaussian $\mathbf G_i$,

$$\bar e_i=\frac{E_i}{S_i}$$

where $E_i$ sums the error map (e.g., DSSIM) weighted by the Gaussian's compositing weights, and $S_i$ is the Gaussian's silhouette mass.

Gaussians are marked for densification based on combinations of excess error, radius, and gradient magnitude exceeding empirical thresholds.

Pruning excises Gaussians with low support, high normalized error, or minimal opacity that contribute little to map quality or stability.

This recurrent diagnostic cycle ensures the map remains minimally redundant, well-densified in high-error regions, and structurally stable across frames.
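The selection logic described above can be sketched as follows. The example assumes the normalized error is the weighted error sum divided by the silhouette mass, and that the silhouette mass is the sum of a Gaussian's compositing weights over the pixels it touches. Every threshold value, the `GaussianStats` fields, and the exact way the criteria are combined are hypothetical placeholders rather than the paper's settings.

```python
from dataclasses import dataclass

@dataclass
class GaussianStats:
    weighted_error: float   # sum over pixels of weight * error (e.g., DSSIM)
    silhouette_mass: float  # sum over pixels of compositing weight
    radius: float           # projected radius in pixels
    grad_norm: float        # accumulated positional gradient magnitude
    opacity: float

def normalized_error(s, eps=1e-8):
    """Scale-invariant error: weighted error per unit of silhouette mass."""
    return s.weighted_error / (s.silhouette_mass + eps)

def select_gaussians(stats, err_thresh=0.3, radius_thresh=20.0,
                     grad_thresh=2e-4, min_mass=1.0, min_opacity=0.05,
                     prune_err=0.8):
    """Return (densify, prune) index lists; thresholds are illustrative."""
    densify, prune = [], []
    for i, s in enumerate(stats):
        err = normalized_error(s)
        # Prune on low support, minimal opacity, or very high normalized error.
        if s.silhouette_mass < min_mass or s.opacity < min_opacity or err > prune_err:
            prune.append(i)
        # Densify on excess error combined with a large radius or gradient.
        elif err > err_thresh and (s.radius > radius_thresh or s.grad_norm > grad_thresh):
            densify.append(i)
    return densify, prune

stats = [
    GaussianStats(5.0, 10.0, 30.0, 0.0, 0.9),   # high error, large radius
    GaussianStats(0.1, 0.2, 5.0, 0.0, 0.9),     # barely visible
    GaussianStats(1.0, 10.0, 5.0, 0.0, 0.9),    # well explained
    GaussianStats(9.0, 10.0, 5.0, 0.0, 0.9),    # very high error
    GaussianStats(1.0, 10.0, 5.0, 0.0, 0.01),   # nearly transparent
]
densify, prune = select_gaussians(stats)
```

Dividing by the silhouette mass keeps a large Gaussian covering many pixels from being flagged just because it accumulates more total error than a small one.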

5. Implementation and Performance Characteristics

GaussianFlow SLAM is implemented with Python orchestration, while the core computational graph (rasterization, alignment loss, and gradient computation) is realized in custom CUDA. Key system parameters:

Aspect	Details
Hardware	AMD Threadripper 5975WX, NVIDIA RTX A6000
Max Gaussians	~500,000
Rendering speed	~460 FPS (rasterizer)
End-to-end pipeline	~0.17 FPS
Memory bottleneck	Gaussian + flow tensor storage

Standard datasets used in benchmarking include EuRoC MAV and TUM RGB-D, with ground truth from Vicon/MoCap or robot logs.

6. Empirical Evaluation and Comparative Analysis

GaussianFlow SLAM demonstrates improved trajectory estimation and rendering accuracy compared to monocular 3DGS-SLAM baselines (MonoGS, MM3DGS-SLAM, Photo-SLAM, HI-SLAM2, WildGS-SLAM):

On EuRoC, average RMSE ATE: 0.050 m (GaussianFlow SLAM) vs 0.059 m (next best), with GaussianFlow SLAM attaining best or next-best scores in 9/11 sequences and all TUM RGB-D sequences.
Rendering: EuRoC mean PSNR 25.2 dB (GaussianFlow SLAM) vs 24.3 dB (baseline).
Qualitative analysis: reconstructed depth maps are sharper, with novel views preserving fine edges and textures; projected Gaussian motions align with learned flow post-optimization.
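For reference, the two headline metrics can be computed as in the sketch below. It assumes the estimated trajectory has already been aligned to ground truth (monocular trajectories are usually aligned with a similarity transform before scoring, which is omitted here) and that images are flattened lists of intensities in $[0, 1]$. The function names are illustrative.

```python
import math

def ate_rmse(estimated, ground_truth):
    """RMSE of absolute trajectory error over paired, pre-aligned 3D positions."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

def psnr(rendered, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB for flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

ate = ate_rmse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0)])
quality = psnr([0.5, 0.5], [0.6, 0.4])
```

Lower ATE is better and higher PSNR is better, so the reported 0.050 m versus 0.059 m and 25.2 dB versus 24.3 dB both favor GaussianFlow SLAM.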

7. Contributions, Limitations, and Future Scope

Key contributions of GaussianFlow SLAM include the first monocular 3DGS-SLAM to utilize dense optical-flow supervision with closed-form analytic gradients and direct GPU implementation, a flow-coupled optimization pipeline for both map and pose, and the introduction of normalized error-driven map densification/pruning modules.

Current limitations include sub-real-time performance (end-to-end ≈0.17 FPS), reliance on first-order 3DGS optimization (necessitating iterative DBA for pose refinement), and reduced robustness under extreme blur or low light, which degrade flow quality. Proposed future directions include GPU-based second-order solvers, selective Gaussian updates, and extension to dynamic environments or sensor fusion with stereo/IMU (Seo et al., 17 Apr 2026).

GaussianFlow SLAM's architecture is also noted as a foundational design for integrating geometry-aware learning signals into 3DGS-based SLAM, serving as an adaptive, tightly coupled monocular SLAM solution with state-of-the-art quantitative performance.

References (1)

GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow (2026)
