
GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow

Published 17 Apr 2026 in cs.RO and cs.CV | (2604.15612v1)

Abstract: Gaussian splatting has recently gained traction as a compelling map representation for SLAM systems, enabling dense and photo-realistic scene modeling. However, its application to monocular SLAM remains challenging due to the lack of reliable geometric cues from monocular input. Without geometric supervision, mapping or tracking could fall into local minima, resulting in structural degeneracies and inaccuracies. To address this challenge, we propose GaussianFlow SLAM, a monocular 3DGS-SLAM that leverages optical flow as a geometry-aware cue to guide the optimization of both the scene structure and camera poses. By encouraging the projected motion of Gaussians, termed GaussianFlow, to align with the optical flow, our method introduces consistent structural cues to regularize both map reconstruction and pose estimation. Furthermore, we introduce normalized error-based densification and pruning modules to refine inactive and unstable Gaussians, thereby contributing to improved map quality and pose accuracy. Experiments conducted on public datasets demonstrate that our method achieves superior rendering quality and tracking accuracy compared with state-of-the-art algorithms. The source code is available at: https://github.com/url-kaist/gaussianflow-slam.

Summary

  • The paper introduces a fully differentiable monocular SLAM pipeline that integrates dense optical flow with Gaussian splatting for robust pose estimation and dense 3D reconstruction.
  • It leverages closed-form analytic gradients and error-based Gaussian management to mitigate geometric ambiguities and improve photorealistic rendering quality.
  • Experimental results on EuRoC and TUM benchmarks demonstrate superior tracking accuracy and high visual fidelity, outperforming competing monocular SLAM methods.

Monocular 3D Gaussian Splatting SLAM with Optical Flow Guidance

Introduction and Motivation

Monocular SLAM methods built on differentiable scene representations have attracted significant interest because they enable dense, photorealistic reconstruction. 3D Gaussian Splatting (3DGS), which offers efficient differentiable rasterization and competitive rendering fidelity, nonetheless suffers from severe geometric ambiguity in the monocular setting, where reliable geometric cues such as depth are unavailable. The paper “GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow” (2604.15612) addresses this by using dense optical flow as a geometry-aware supervisory signal that regularizes both scene structure and camera trajectory estimation within a tightly coupled SLAM pipeline.

Methodology

Optical Flow–Guided Monocular 3DGS-SLAM

The core of GaussianFlow SLAM is an explicit alignment between the projected motion of the 3DGS scene (GaussianFlow) and the optical flow predicted by a network. The alignment loss admits closed-form analytic gradients, which makes it well suited to efficient GPU-based optimization. Enforcing this alignment constrains 3DGS geometry and improves pose tracking, mitigating the local minima that arise when reliable geometric cues are missing.
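
One way to write this alignment, assuming a per-pixel formulation in which each Gaussian carries the pixels it covers along with its own 2D affine motion (the paper's exact weighting, robust penalty ρ, and notation may differ). The symbols are our own labels: each Gaussian i has projected 2D mean μ and 2D covariance Σ in frames k and l, w_i(x) is its alpha-compositing weight at pixel x in frame k, and F_net is the network-predicted optical flow from k to l.

```latex
\begin{aligned}
w_i(\mathbf{x}) &= \alpha_i(\mathbf{x}) \prod_{j<i} \bigl(1 - \alpha_j(\mathbf{x})\bigr),
\qquad \mathbf{B}_i^{t} = \bigl(\boldsymbol{\Sigma}_i^{t}\bigr)^{1/2},\quad t \in \{k, l\} \\
\mathbf{F}_{\mathrm{GS}}^{k \to l}(\mathbf{x}) &= \sum_i w_i(\mathbf{x}) \Bigl[\boldsymbol{\mu}_i^{l} + \mathbf{B}_i^{l} \bigl(\mathbf{B}_i^{k}\bigr)^{-1} \bigl(\mathbf{x} - \boldsymbol{\mu}_i^{k}\bigr) - \mathbf{x}\Bigr] \\
\mathcal{L}_{\mathrm{flow}} &= \sum_{\mathbf{x}} \rho\Bigl(\bigl\lVert \mathbf{F}_{\mathrm{GS}}^{k \to l}(\mathbf{x}) - \mathbf{F}_{\mathrm{net}}^{k \to l}(\mathbf{x}) \bigr\rVert_2\Bigr)
\end{aligned}
```

Because μ and Σ in frame t depend on both the 3D Gaussian parameters and the camera pose of frame t, gradients of the flow loss reach the map and the poses alike; the matrix square root is a natural place for an eigendecomposition to appear.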

The operational loop involves:

  • GaussianFlow Computation: For each input frame, the image-space motion of each projected Gaussian is derived analytically from the scene and camera motion, so it is differentiable with respect to both scene and pose parameters.
  • Loss Formulation: The GaussianFlow loss compares rendered flow with network-predicted optical flow and uses a log-logistic residual to stay robust to noisy flow predictions.
  • Backpropagation Efficiency: Closed-form gradients for all GaussianFlow terms (mean, covariance, and opacity), derived with the help of an eigendecomposition, allow the loss to be differentiated efficiently inside the fused 3DGS CUDA kernel, so both Gaussian parameters and camera extrinsics can be optimized.

    Figure 1: GaussianFlow SLAM leverages optical flow to regularize 3DGS optimization and pose estimation by aligning projected Gaussian motions with image-domain optical flow.
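
The per-pixel GaussianFlow described above can be sketched in plain Python for a single pixel. This is a minimal illustration under stated assumptions, not the paper's CUDA kernel: Gaussians are assumed to be already projected to 2D and sorted front to back, the covariance square root is the symmetric one obtained from a closed-form 2x2 eigendecomposition, and the names (`ProjectedGaussian`, `sqrt_sym2x2`, `gaussian_flow_at_pixel`) and the 0.99 opacity clamp (a common 3DGS rasterizer convention) are our own choices.

```python
import math
from dataclasses import dataclass


@dataclass
class ProjectedGaussian:
    opacity: float
    mean_k: tuple  # projected 2D mean in frame k (pixels)
    cov_k: list    # 2x2 projected covariance in frame k
    mean_l: tuple  # projected 2D mean in frame l
    cov_l: list    # 2x2 projected covariance in frame l


def matmul2(p, q):
    return [[sum(p[i][t] * q[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def sqrt_sym2x2(m):
    """Symmetric square root of a 2x2 SPD matrix via its closed-form eigendecomposition."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    half_trace = 0.5 * (a + c)
    disc = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lam1, lam2 = half_trace + disc, max(half_trace - disc, 0.0)
    if abs(b) > 1e-12:
        vx, vy = b, lam1 - a  # eigenvector of the larger eigenvalue
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    ux, uy = -vy, vx  # orthogonal eigenvector (smaller eigenvalue)
    s1, s2 = math.sqrt(lam1), math.sqrt(lam2)
    return [[s1 * vx * vx + s2 * ux * ux, s1 * vx * vy + s2 * ux * uy],
            [s1 * vx * vy + s2 * ux * uy, s1 * vy * vy + s2 * uy * uy]]


def gaussian_flow_at_pixel(x, gaussians):
    """Alpha-composited flow (frame k -> l) at pixel x; gaussians sorted front to back."""
    flow = [0.0, 0.0]
    transmittance = 1.0
    for g in gaussians:
        d = (x[0] - g.mean_k[0], x[1] - g.mean_k[1])
        ik = inv2(g.cov_k)
        maha = d[0] * (ik[0][0] * d[0] + ik[0][1] * d[1]) + d[1] * (ik[1][0] * d[0] + ik[1][1] * d[1])
        alpha = min(0.99, g.opacity * math.exp(-0.5 * maha))
        weight = alpha * transmittance
        # Affine map carrying pixel offsets from the frame-k ellipse to the frame-l ellipse.
        a = matmul2(sqrt_sym2x2(g.cov_l), inv2(sqrt_sym2x2(g.cov_k)))
        moved = (g.mean_l[0] + a[0][0] * d[0] + a[0][1] * d[1],
                 g.mean_l[1] + a[1][0] * d[0] + a[1][1] * d[1])
        flow[0] += weight * (moved[0] - x[0])
        flow[1] += weight * (moved[1] - x[1])
        transmittance *= 1.0 - alpha
    return flow
```

For example, a single Gaussian with opacity 0.5 whose mean shifts by (2, -1) pixels yields a flow of (1.0, -0.5) at its center, because its compositing weight there is 0.5. Since every step is an elementary closed-form expression, each one has a hand-derivable derivative, which is what makes a fused backward kernel practical.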

Tracking and Mapping Loop

The framework tightly alternates between:

  • Pose Tracking: Initial poses for new keyframes are estimated by minimizing a joint photometric and GaussianFlow loss over multiple past and present frames.
  • Mapping (3DGS Optimization): With poses held fixed, the 3DGS map is optimized over multi-view windows using image, flow, isotropic, and opacity losses.
  • Dense Bundle Adjustment (DBA): Multiple poses are optimized jointly within windows; DBA takes GaussianFlow as input and is implemented with a ConvGRU-based recurrent flow estimation module.

This cyclical process facilitates geometric feedback between mapping and tracking: continuous 3DGS refinement reduces degenerate or biased trajectories, while improved camera poses enhance spatial consistency of the reconstructed scene.
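
To see why flow alignment constrains pose, here is a toy version of the pose-tracking step. It is a minimal sketch, not the paper's tracker, and it assumes that the 3D Gaussian means are known in frame k's camera coordinates, that the relative motion is a pure translation (no rotation), and that the flow residual is the only term (the paper also uses a photometric loss and estimates full poses). It runs Gauss-Newton with the closed-form pinhole Jacobian; the intrinsics and function names are hypothetical.

```python
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0  # hypothetical pinhole intrinsics


def project(p):
    x, y, z = p
    return (FX * x / z + CX, FY * y / z + CY)


def projection_jacobian(p):
    """Closed-form derivative of project() with respect to the 3D point (2x3)."""
    x, y, z = p
    return [[FX / z, 0.0, -FX * x / (z * z)],
            [0.0, FY / z, -FY * y / (z * z)]]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def track_translation(points, observed_flow, iters=10):
    """Find t so that project(p + t) - project(p) matches the observed flow (least squares)."""
    t = [0.0, 0.0, 0.0]
    for _ in range(iters):
        h = [[0.0] * 3 for _ in range(3)]  # Gauss-Newton approximation J^T J
        g = [0.0] * 3                      # gradient J^T r
        for p, f_obs in zip(points, observed_flow):
            q = [p[i] + t[i] for i in range(3)]
            u0, u1 = project(p), project(q)
            r = (u1[0] - u0[0] - f_obs[0], u1[1] - u0[1] - f_obs[1])
            jac = projection_jacobian(q)  # dq/dt is the identity
            for i in range(3):
                g[i] += jac[0][i] * r[0] + jac[1][i] * r[1]
                for j in range(3):
                    h[i][j] += jac[0][i] * jac[0][j] + jac[1][i] * jac[1][j]
        step = solve3(h, [-gi for gi in g])
        t = [t[i] + step[i] for i in range(3)]
    return t
```

With noise-free flow generated from a known translation, a few iterations recover that translation to numerical precision. With real network-predicted flow, each residual would need a robust weight, which is the role the robust residual plays in the GaussianFlow loss.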

Error-Based Gaussian Management

Careful Gaussian management is needed to counteract underutilized or unstable components (“floaters”) that can appear during iterative optimization.

  • Error-Based Densification: The usual gradient-based split heuristic is replaced by a normalized per-Gaussian error (accumulated error divided by the Gaussian's screen-space contribution), which remains sensitive to under-reconstructed regions that persist near local minima. The error maps include DSSIM and per-pixel flow losses, giving adaptive and consistent signals for selecting Gaussians to split.

    Figure 2: Normalized per-Gaussian error analysis enables selective densification and pruning, prioritizing geometrically salient errors over mere coverage.

  • Error-Based Pruning: Gaussians exhibiting both high normalized error and low opacity or coverage radius are pruned.
  • Global Management: Gaussian keyframe assignments and adaptive optimization windows provide scalability with growing map size, ensuring robust global consistency.
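
The normalized-error rule can be sketched as follows. This is a hedged illustration rather than the paper's implementation: the per-pixel error is whatever loss map is in use (for example DSSIM or flow error), each Gaussian's error is weighted by its blending contribution and then divided by its total contribution, and the class name, thresholds, and the choice to route each high-error Gaussian to either pruning or densification are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GaussianStats:
    opacity: float
    radius_px: float         # projected screen-space radius
    error_sum: float = 0.0   # per-pixel error weighted by this Gaussian's blend weight
    weight_sum: float = 0.0  # total blend weight (screen-space contribution)

    def accumulate(self, pixel_error, blend_weight):
        self.error_sum += blend_weight * pixel_error
        self.weight_sum += blend_weight

    def normalized_error(self):
        # Dividing by contribution keeps large Gaussians from looking bad just for covering many pixels.
        return self.error_sum / self.weight_sum if self.weight_sum > 0 else 0.0


def select_for_management(gaussians, split_thresh, prune_opacity, prune_radius):
    """Return (indices to densify, indices to prune) using hypothetical thresholds."""
    densify, prune = [], []
    for i, g in enumerate(gaussians):
        if g.normalized_error() <= split_thresh:
            continue  # well-fit Gaussian: leave it alone
        if g.opacity < prune_opacity or g.radius_px < prune_radius:
            prune.append(i)    # high error and weak or tiny: likely a floater
        else:
            densify.append(i)  # high error but substantial: split to add detail
    return densify, prune
```

In a small example, a wide Gaussian that covers 100 pixels with an error of 0.05 each accumulates ten times the raw error of a Gaussian with a single pixel at error 0.5, yet only the latter is selected for splitting, because its normalized error is higher.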

Experimental Results

Datasets and Setup

Experiments were conducted on the TUM RGB-D and EuRoC benchmarks, evaluating both camera tracking and 3DGS rendering fidelity. The full monocular pipeline was implemented with extensive kernel-level CUDA customizations for efficient backpropagation.

Tracking and Mapping Results

  • On EuRoC (large-scale UAV sequences with rapid motion), the proposed method consistently outperformed the baselines MonoGS, MM3DGS-SLAM, Photo-SLAM, HI-SLAM2, and WildGS-SLAM in absolute trajectory error (ATE) RMSE. It was notably more robust to local minima and erroneous pose trajectories, particularly when monocular depth priors were unavailable or misaligned.
  • On the smaller-scale TUM sequences, performance remained competitive, often matching or exceeding methods that rely on sparse features or network-predicted depth priors.

    Figure 3: For challenging scenes, GaussianFlow SLAM produces geometrically accurate rasterized depths and visual fidelity, outperforming monocular depth-prior-based methods when depth quality is poor.

  • Rendering quality, measured by PSNR, SSIM, and LPIPS, favored GaussianFlow SLAM, especially on large and structurally complex sequences. It achieved the highest perceptual quality (lowest LPIPS) in most test cases.
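
For reference, the headline metrics, absolute trajectory error (ATE) RMSE and PSNR, can be computed as below. This is a simplified sketch: ATE RMSE for monocular SLAM is normally reported after a full Sim(3) (Umeyama) alignment, whereas this version estimates only a least-squares scale and translation and ignores rotation, and PSNR assumes images flattened to lists with values in [0, max_val]. The function names are our own.

```python
import math


def align_scale_translation(est, gt):
    """Least-squares s, t with s * est + t close to gt (no rotation; simplified stand-in for Sim(3))."""
    n = len(est)
    me = [sum(p[i] for p in est) / n for i in range(3)]
    mg = [sum(p[i] for p in gt) / n for i in range(3)]
    num = sum((e[i] - me[i]) * (g[i] - mg[i]) for e, g in zip(est, gt) for i in range(3))
    den = sum((e[i] - me[i]) ** 2 for e in est for i in range(3))
    s = num / den  # optimal scale for centered trajectories
    return [[s * (e[i] - me[i]) + mg[i] for i in range(3)] for e in est]


def ate_rmse(est, gt):
    """Root-mean-square position error after alignment."""
    aligned = align_scale_translation(est, gt)
    sq = [sum((a[i] - g[i]) ** 2 for i in range(3)) for a, g in zip(aligned, gt)]
    return math.sqrt(sum(sq) / len(sq))


def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(img, ref)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
```

Scale alignment matters because a monocular trajectory is only recovered up to scale: an estimate that is a scaled and shifted copy of the ground truth scores an ATE RMSE of zero here, while a uniform pixel error of 0.1 on a [0, 1] image gives a PSNR of 20 dB.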

Ablation and Runtime

  • Ablation studies demonstrate that optical flow supervision for both pose and geometry updates is critical for accurate large-scale monocular SLAM; removing GaussianFlow guidance degraded trajectory and map quality, especially in EuRoC sequences.
  • Normalized error-based densification and pruning improved both trajectory RMSE and rendering metrics, supporting their role in stable long-horizon optimization.
  • Although 3DGS rasterization itself is efficient, the tightly coupled optimization loop keeps the system from running in real time (end-to-end processing at roughly 0.17 FPS), with the main computational bottleneck in multi-view 3DGS parameter updates.

    Figure 4: Qualitative reconstructions on EuRoC dataset illustrate the preservation of fine geometry and photometric consistency, particularly in comparison to sparse-feature and depth-prior-based methods.

Theoretical and Practical Implications

This study shows that combining dense optical flow with analytic gradient support is a viable way to bring high-fidelity, geometrically consistent mapping to monocular SLAM pipelines. Because differentiation is performed in closed form at the kernel level, the method avoids the memory and computational overhead of graph-based autodiff frameworks, which lets it scale to large maps and frequent joint updates.

The demonstrated superiority over depth-prior-guided and feature-tracking pipelines underscores the importance of dense, per-pixel geometric consistency. Practically, the proposed pipeline suits static or moderately dynamic environments where high rendering fidelity and robust pose estimation are required from monocular input alone.

However, real-time operation remains out of reach because of the cost of tightly coupled, first-order optimization; efficient second-order methods tailored to 3DGS parameter updates may address this limitation in future work.

Conclusion

GaussianFlow SLAM introduces a robust, fully differentiable monocular SLAM method for dense scene reconstruction, establishing dense optical flow as a geometric supervisory signal for both state estimation and map optimization. It supports efficient, closed-form kernel-level differentiation for scalability, integrates error-driven Gaussian management, and attains state-of-the-art tracking and rendering performance on public datasets. Future research may explore dynamic scene modeling and further optimization acceleration to enable broader deployment in real-time applications.
