Optical Flow Monocular 3DGS-SLAM

Updated 21 April 2026

Optical flow-guided monocular 3DGS-SLAM uses dense optical flow as both a supervisory cue and a geometric constraint for monocular SLAM.
It represents the map with 3D Gaussian splatting, which supports dynamic object masking and robust scene reconstruction.
Reported experiments show improved trajectory accuracy and map fidelity in low-texture, dynamic, and non-Lambertian environments.

Optical Flow–Guided Monocular 3DGS-SLAM is a class of Simultaneous Localization and Mapping (SLAM) frameworks that use dense optical flow fields to guide both camera trajectory estimation and 3D scene reconstruction from monocular (single-camera) RGB video. Optical flow, the image-plane displacement induced by camera and scene motion, is tied to the projected scene structure. By exploiting this correspondence, these systems mitigate the ill-posedness of monocular mapping and supply geometric cues for robust real-time scene modeling, particularly when the map is represented with 3D Gaussian Splatting (3DGS).

1. Principles of Optical Flow–Guided Monocular Mapping

Optical flow encodes per-pixel image displacements between consecutive frames, corresponding to the underlying 3D motion induced by the camera and dynamic scene elements. In monocular 3DGS-SLAM, optical flow is exploited as an additional geometric constraint to regularize both structure-from-motion and dense scene mapping in the absence of active depth sensing. The approach can be categorized as follows:

Optical flow as a supervisory cue: Synthetic optical flow is generated from single frames with depth maps and sampled poses to train deep visual odometry (VO) models (Slinko et al., 2019).
Flow integration for 3DGS optimization: The alignment between the projected motion of 3D Gaussians (termed “GaussianFlow”) and observed optical flow is used as a differentiable loss function for camera pose and scene structure optimization (Seo et al., 17 Apr 2026, Wu et al., 26 Jun 2025).
Masking for dynamic environments: Optical flow helps segment static and dynamic regions, enabling robust tracking and map-cleaning in dynamic scenes (Li et al., 6 Jun 2025).
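The first category relies on the fact that, for a static scene, optical flow is fully determined by depth, camera intrinsics, and relative pose. The following is a minimal NumPy sketch of that relationship, not the training pipeline of any cited work; the function name `rigid_flow` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow induced purely by camera motion over a static scene.

    depth: (H, W) depth map of frame 1.
    K:     (3, 3) pinhole intrinsics.
    R, t:  rotation (3, 3) and translation (3,) that map frame-1 camera
           coordinates to frame-2 camera coordinates (assumed convention).
    Returns an (H, W, 2) array of pixel displacements.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grids, (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T                       # back-project pixels
    pts1 = rays * depth[..., None]                        # 3D points, frame 1
    pts2 = pts1 @ R.T + t                                 # same points, frame 2
    proj = pts2 @ K.T                                     # homogeneous pixels
    uv2 = proj[..., :2] / proj[..., 2:3]                  # perspective divide
    return uv2 - pix[..., :2]
```

Sampling many poses `(R, t)` for a single frame with a depth map yields as many synthetic flow fields as needed, each paired with a known ground-truth motion.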

In monocular 3DGS-SLAM, the map is represented as a set of parameterized 3D Gaussians, with per-Gaussian position, shape, color, and opacity. Camera frames are rendered using differentiable alpha compositing of the projected Gaussians.

2. Map Representation and Rendering with Gaussian Splatting

3D Gaussian Splatting (3DGS) encodes the environment as a collection of ellipsoidal Gaussians $\mathcal{G} = \{\mathbf{G}_i\}_{i=1}^N$ with parameters $(\mathbf{x}_i, \boldsymbol{\Sigma}_i, o_i, \mathbf{c}_i)$:

$\mathbf{x}_i \in \mathbb{R}^3$: center of the $i$-th Gaussian in world coordinates
$\boldsymbol{\Sigma}_i \in \mathbb{R}^{3 \times 3}$: full covariance matrix
$o_i$: opacity
$\mathbf{c}_i \in \mathbb{R}^3$: RGB color vector

At time $t$, each Gaussian is transformed to camera coordinates by $\mathbf{T}_t \in SE(3)$, projected to the image using camera intrinsics $\mathbf{K}$, and represented as a 2D elliptical “splat.” For each image pixel $\mathbf{p}$, the final color and depth are obtained via a weighted compositional sum over contributing Gaussians; the weights depend on each Gaussian's projected parameters and the opacity blending (Seo et al., 17 Apr 2026, Wu et al., 26 Jun 2025).
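The weighted compositional sum can be illustrated for a single pixel. This is a minimal sketch assuming standard front-to-back alpha compositing with the Gaussians already sorted near to far; `composite_pixel` and its inputs are hypothetical names, and real 3DGS renderers perform this step in tiled GPU kernels.

```python
import numpy as np

def composite_pixel(alphas, colors, depths):
    """Front-to-back alpha compositing at one pixel.

    alphas: (N,) per-Gaussian alpha at this pixel, i.e. opacity times the
            projected 2D Gaussian falloff, sorted near to far.
    colors: (N, 3) RGB colors; depths: (N,) camera-space depths.
    """
    # Transmittance reaching Gaussian i is the product of (1 - alpha_j), j < i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * transmittance          # blending weights
    color = weights @ colors                  # composited RGB
    depth = weights @ depths                  # composited depth
    return color, depth, weights
```

The same blending weights reappear wherever a per-Gaussian quantity (color, depth, or 2D motion) has to be accumulated into a per-pixel value.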

3. Optical Flow–Based Geometric Constraints

Optical flow–guided SLAM utilizes the following geometric alignment loss, applied at each pixel:

GaussianFlow (projected Gaussian motion):

$$\mathbf{F}^{G}_{t \to t+1}(\mathbf{p}) = \sum_{i} w_i \left[ \mathbf{B}_{i,t+1} \mathbf{B}_{i,t}^{-1} (\mathbf{p} - \boldsymbol{\mu}_{i,t}) + \boldsymbol{\mu}_{i,t+1} - \mathbf{p} \right]$$

where $w_i$ is the blending weight, $\mathbf{B}_{i,t}$ is the square root of the projected 2D covariance, and $\boldsymbol{\mu}_{i,t}$ is the projected center.

Flow alignment loss:

$$\mathcal{L}_{\text{flow}} = \sum_{\mathbf{p}} \left\lVert \mathbf{F}^{G}_{t \to t+1}(\mathbf{p}) - \mathbf{F}^{\text{obs}}_{t \to t+1}(\mathbf{p}) \right\rVert$$

where $\mathbf{F}^{\text{obs}}_{t \to t+1}(\mathbf{p})$ is the observed optical flow at pixel $\mathbf{p}$ (Seo et al., 17 Apr 2026, Wu et al., 26 Jun 2025).

This loss encourages the projected displacement of Gaussians between frames to match the observed dense optical flow, providing a strong geometric signal for the joint optimization of camera pose and scene structure.
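To make the constraint concrete, the sketch below evaluates a Gaussian-induced flow at one pixel and compares flow fields. It assumes the common GaussianFlow form, in which each Gaussian carries a pixel along with the change of its projected center and covariance square root. The L1 norm in `flow_loss` is an assumption, since the cited systems may use a different penalty, and all function names are hypothetical.

```python
import numpy as np

def sqrtm_spd(S):
    """Symmetric square root of a 2x2 SPD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_flow(p, weights, mu1, cov1, mu2, cov2):
    """Flow at pixel p predicted by N projected Gaussians.

    weights: (N,) blending weights at p in frame 1.
    mu1, mu2:   (N, 2) projected centers in frames 1 and 2.
    cov1, cov2: (N, 2, 2) projected 2D covariances in frames 1 and 2.
    """
    flow = np.zeros(2)
    for w, m1, S1, m2, S2 in zip(weights, mu1, cov1, mu2, cov2):
        B1, B2 = sqrtm_spd(S1), sqrtm_spd(S2)
        # Normalize p in the frame-1 Gaussian, then map into the frame-2 one.
        moved = B2 @ np.linalg.solve(B1, p - m1) + m2
        flow += w * (moved - p)
    return flow

def flow_loss(pred, obs):
    """Mean L1 distance between two flow fields of shape (..., 2)."""
    return float(np.abs(pred - obs).sum(axis=-1).mean())
```

When a Gaussian only translates in the image (unchanged covariance), its contribution reduces to the shift of its center, which is the intuitive special case.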

In dynamic environments, pixel-level flow masks are incorporated to isolate static regions, and Bayesian fusion with monocular depth lets the system probabilistically segment dynamic pixels, which are suppressed during pose and map optimization (Li et al., 6 Jun 2025).
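One simple way to picture such a fusion is a naive-Bayes combination of two per-pixel "dynamic" probabilities, one derived from flow and one from depth. The sketch below assumes the two cues are conditionally independent and share a common prior; this is an illustrative choice, not the fusion rule of Dy3DGS-SLAM, and `fuse_dynamic_probs` is a hypothetical name.

```python
import numpy as np

def fuse_dynamic_probs(p_flow, p_depth, prior=0.5, eps=1e-6):
    """Fuse two per-pixel posteriors P(dynamic | cue) into one.

    Assumes conditionally independent cues with a shared prior P(dynamic).
    Inputs are arrays (or scalars) of probabilities in [0, 1].
    """
    p_flow = np.clip(p_flow, eps, 1.0 - eps)      # avoid division by zero
    p_depth = np.clip(p_depth, eps, 1.0 - eps)
    num = p_flow * p_depth * (1.0 - prior)
    den = num + (1.0 - p_flow) * (1.0 - p_depth) * prior
    return num / den

# Pixels whose fused probability stays below 0.5 are treated as static.
fused = fuse_dynamic_probs(np.array([0.9, 0.9, 0.1]), np.array([0.9, 0.1, 0.1]))
static_mask = fused < 0.5
```

Agreeing cues reinforce each other, while conflicting cues cancel out toward the prior, so a single noisy cue alone does not remove a pixel from the optimization.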

4. SLAM Pipeline: Tracking, Mapping, and Global Optimization

The canonical workflow for optical flow–guided monocular 3DGS-SLAM involves:

Initialization: The first frame (with monocular depth prior) initializes the 3DGS model and pose.
Tracking: For each new frame, dense optical flow is computed (e.g., using PWC-Net, GMFlow), and a joint optimization problem is solved over the current pose (and possibly local 3DGS parameters) using a weighted sum of photometric, depth, scale-invariant, and flow losses:

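$$\mathcal{L}_{\text{track}} = \lambda_{\text{photo}} \mathcal{L}_{\text{photo}} + \lambda_{\text{depth}} \mathcal{L}_{\text{depth}} + \lambda_{\text{si}} \mathcal{L}_{\text{si}} + \lambda_{\text{flow}} \mathcal{L}_{\text{flow}}$$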

Keyframing and Local BA: Keyframes are selected based on tracking quality or geometry; local bundle adjustment optimizes poses and Gaussians over a window of keyframes (Seo et al., 17 Apr 2026, Wu et al., 26 Jun 2025).
Global Refinement: After mapping, a two-stage refinement improves global consistency. It first prioritizes the keyframes with the highest rendering error, then samples keyframes at random to cover under-observed views (Wu et al., 26 Jun 2025).
Densification and Pruning: Gaussians are split or pruned based on normalized per-view error metrics, focusing map complexity on regions of high photometric or flow residual (Seo et al., 17 Apr 2026).

SLAM backbone frameworks (e.g., g2o for pose-graph optimization (Slinko et al., 2019), or windowed Gauss–Newton BA in 3DGS) are used to jointly optimize all parameters, taking advantage of per-edge or per-pixel uncertainty when available (Ng et al., 2021).
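The tracking objective described above is a weighted sum of photometric, depth, scale-invariant, and flow terms. A minimal sketch of how such an objective might be assembled follows; the term names and weight values are placeholders chosen for illustration, not values reported by the cited systems.

```python
def tracking_loss(terms, weights):
    """Weighted sum of scalar loss terms.

    terms and weights are dicts keyed by term name; every term must have
    a weight so that a misspelled key fails loudly instead of being dropped.
    """
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())

# Placeholder values only; real weights are tuned per system and dataset.
example_terms = {"photo": 0.2, "depth": 0.1, "si": 0.05, "flow": 0.4}
example_weights = {"photo": 1.0, "depth": 0.5, "si": 0.5, "flow": 0.1}
total = tracking_loss(example_terms, example_weights)
```

Keeping the weights in one place makes it easy to disable a term (weight 0) when comparing appearance-only tracking against flow-guided tracking.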

5. Dynamic and Non-Lambertian Scenes: Flow Guidance, Masking, and Robustness

Optical flow enables increased robustness to challenging conditions encountered in monocular SLAM:

Handling Non-Lambertian Surfaces: In endoscopic and surgical domains, appearance changes due to specularities cause pure photometric losses to drift; flow alignment adds a geometric constraint that does not compare pixel intensities directly and is therefore less sensitive to such effects (Wu et al., 26 Jun 2025).
Dynamic Environments: Combining optical flow–based motion segmentation with per-pixel depth priors enables accurate static/dynamic separation. Bayesian fusion yields a fused mask $\mathbf{M}_{\text{fused}}$, enabling masked pose and mapping optimization with depth- and flow-weighted penalties for dynamic regions (Li et al., 6 Jun 2025).
Occlusion Awareness: Self-supervised learning of occlusion-aware flow increases reliability, with per-pixel cross-weighting of rigid-predicted and flow-predicted losses, leading to improved depth, ego-motion, and robust SLAM under dynamic and occluded settings (Fang et al., 2021).

6. Experimental Results and Comparative Performance

Empirical validation demonstrates the efficacy of optical flow–guided monocular 3DGS-SLAM:

System	Scene Type	ATE / RMSE (Pose)	Rendering (PSNR/SSIM)	Datasets
GaussianFlow SLAM (Seo et al., 17 Apr 2026)	Indoor UAV, rooms	0.013–0.05 m (ATE)	24.56 dB / 0.871	TUM, EuRoC
EndoFlow-SLAM (Wu et al., 26 Jun 2025)	Endoscopy (dynamic)	0.23 mm / 15.47 mm (ATE)	25.18 / 0.82 (static)	C3VD, StereoMIS
Dy3DGS-SLAM (Li et al., 6 Jun 2025)	Indoor dynamic	4.5–4.7 cm (ATE RMSE)	(visual)	BONN, TUM
Deep Flow-VO/SLAM (Slinko et al., 2019)	Driving, MAV	3.37% transl. err. (SLAM)	–	KITTI, EuRoC

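The cited systems report improved trajectory accuracy and map fidelity over appearance-only or feature-based approaches, with particular gains in dynamic, low-texture, and non-Lambertian scenarios. The rows use different datasets, units, and error metrics, so the figures are not directly comparable across systems.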
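For reference, the two headline metrics in the table can be computed as follows. This is a minimal sketch that assumes the estimated trajectory has already been aligned to the ground truth (monocular results are typically aligned with a similarity transform first) and that images are floats in [0, 1]; the function names are illustrative.

```python
import numpy as np

def ate_rmse(est, gt):
    """Root-mean-square translational error between two aligned
    trajectories, each of shape (N, 3), in the units of the inputs."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB; undefined for identical images."""
    mse = np.mean((rendered - reference) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Because ATE inherits the units of the trajectory, values given in meters, centimeters, and millimeters have to be converted before any cross-system comparison.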

7. Methodological Advances and Limitations

Key advances enabled by optical flow–guided monocular 3DGS-SLAM are:

Geometric regularization without active depth: Flow constraints provide geometric cues for scale and structure recovery, essential for monocular pipelines (Seo et al., 17 Apr 2026, Slinko et al., 2019).
Uncertainty modeling: Dense per-pixel uncertainty (via cost-volume fitting) is propagated from front-end flow to Mahalanobis-weighted pose estimation and pose-graph optimization, increasing robustness in low-texture or ambiguous regions (Ng et al., 2021).
Dynamic-aware mapping: Dynamic pixel masking (using flow and depth priors) enables robust mapping and tracking in environments with moving objects, suppressing transient map elements (Li et al., 6 Jun 2025).
Computational trade-offs: While flow-guided optimization closes core degeneracies of photometric or sparse-feature monocular SLAM, it incurs significant computational cost due to dense flow estimation and large, differentiable map representations; real-time performance remains a challenge for full 3DGS pipelines (Seo et al., 17 Apr 2026, Wu et al., 26 Jun 2025).
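The uncertainty-weighting idea can be illustrated with a squared Mahalanobis norm on flow residuals, where each pixel's 2x2 covariance down-weights unreliable measurements. This is a generic sketch, not the implementation of Ng et al.; `mahalanobis_sq` and its array layout are assumptions.

```python
import numpy as np

def mahalanobis_sq(residuals, covs):
    """Squared Mahalanobis norm r^T C^{-1} r for each residual.

    residuals: (N, 2) flow residuals (predicted minus observed).
    covs:      (N, 2, 2) per-pixel flow covariance matrices.
    """
    # Solve C x = r per pixel instead of forming explicit inverses.
    solved = np.linalg.solve(covs, residuals[..., None])[..., 0]
    return np.sum(residuals * solved, axis=-1)
```

A residual of one pixel counts fully where the flow covariance is the identity, but only a quarter as much where the standard deviation is two pixels, so ambiguous low-texture regions contribute less to pose estimation.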

Plausible future directions include second-order solvers, more efficient flow computation, and explicit dynamic scene decomposition, each aimed at improving real-time performance and robustness.

References:

"GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow" (Seo et al., 17 Apr 2026)
"EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting" (Wu et al., 26 Jun 2025)
"Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments" (Li et al., 6 Jun 2025)
"Training Deep SLAM on Single Frames" (Slinko et al., 2019)
"Uncertainty Estimation of Dense Optical-Flow for Robust Visual Navigation" (Ng et al., 2021)
"Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos" (Fang et al., 2021)
"Learning monocular visual odometry with dense 3D mapping from dense 3D flow" (Zhao et al., 2018)