Unblur-SLAM: Dense Neural SLAM for Blurry Inputs

Published 26 Mar 2026 in cs.CV and eess.IV | (2603.26810v1)

Abstract: We propose Unblur-SLAM, a novel RGB SLAM pipeline for sharp 3D reconstruction from blurred image inputs. In contrast to previous work, our approach is able to handle different types of blur and demonstrates state-of-the-art performance in the presence of both motion blur and defocus blur. Moreover, we adjust the computation effort with the amount of blur in the input image. As a first stage, our method uses a feed-forward image deblurring model for which we propose a suitable training scheme that can improve both tracking and mapping modules. Frames that are successfully deblurred by the feed-forward network obtain refined poses and depth through local-global multi-view optimization and loop closure. Frames that fail the first stage deblurring are directly modeled through the global 3DGS representation and an additional blur network to model multiple blurred sub-frames and simulate the blur formation process in 3D space, thereby learning sharp details and refined sub-frame poses. Experiments on several real-world datasets demonstrate consistent improvements in both pose estimation and sharp reconstruction results of geometry and texture.


Authors (7)

Summary

The paper introduces a robust SLAM framework that leverages physics-constrained deblurring and blur-aware tracking to achieve high-fidelity 3D reconstructions.
Key innovations include adaptive blur classification, multi-scale per-pixel kernel modeling, and 3D Gaussian Splatting with hybrid bundle adjustment for consistent mapping.
Quantitative results on benchmarks such as ArchViz and TUM-RGBD show improvements in PSNR and trajectory accuracy over state-of-the-art methods; on the Deblur-NeRF defocus benchmark, PSNR reaches 27.45 dB.


Introduction and Motivation

Unblur-SLAM addresses the long-standing challenge in dense monocular RGB SLAM of robustly reconstructing geometrically accurate and photorealistic 3D scenes from input sequences with severe image blur. While classical SLAM methods catastrophically degrade under defocus or motion-blurred inputs due to their strong dependency on reliable image features, recent dense 3D neural scene representations—particularly frameworks leveraging 3D Gaussian Splatting (3DGS)—offer a higher degree of flexibility. However, prior 3DGS SLAM pipelines either ignore blur or treat all frames as equally degraded, resulting in sub-optimal trade-offs between performance and computational efficiency. Unblur-SLAM introduces a principled approach leveraging automatic blur quantification, physics-constrained, 3D-consistent deblurring, and adaptive computation to handle both motion and defocus blur, targeting high-fidelity online scene reconstruction under real-world conditions.

Figure 1: Unblur-SLAM overview; input frames are classified by blur severity, routed to appropriate deblurring and mapping modules, optimizing runtime and reconstruction fidelity.

Methodology Overview

The system architecture combines several complementary components:

Blur Classification and Pipeline Adaptation: Input images are automatically classified into sharp, blurry (deblurring successes), and heavily blurred (deblurring failures) via ARNIQA-based detection, allowing the pipeline to conditionally perform the minimal necessary computation. Sharp images are tracked and mapped directly, while blurry ones invoke deblurring networks and, if needed, multi-view 3D blur modeling.
Physics-Constrained Deblurring Network: Unlike previous single-frame deblurring approaches that lack multi-view consistency, Unblur-SLAM imposes blur formation physics during training, explicitly modeling both motion and defocus effects using real and semi-synthetic datasets with accurate ground truth. The network is supervised to extract physically plausible mid-exposure reference images, augmented with a two-phase training strategy (synthetic and real fine-tuning) to maximize consistency and generalizability.
3DGS Representation with Hybrid Bundle Adjustment: The mapping backend uses 3D Gaussian Splatting, with scene updates guided by photometric, geometric, and blur-aware losses. For deblurring failures, temporally distributed virtual sub-frames (along physically plausible motion trajectories) are rendered and optimized in 3D, while successfully deblurred frames rely on further residual blur kernel estimation (via the BAGS Blur Proposal Network) and photometric consistency.
Adaptive Multi-Scale Blur Modeling: For each blurry frame (with or without successful deblurring), adaptive per-pixel multi-scale kernels are learned for residual blur compensation, parameterized as a function of rendered depth—allowing for depth-aware, physically consistent detail enhancement and restoration.
Global Consistency and Loop Closure: Local and global bundle adjustments—including loop closure detection and joint map deformation—are performed in a sliding-window fashion, regularizing geometric and appearance parameters with sharp frame prioritization.
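
The three-way routing described in the first item above can be illustrated with a minimal sketch. It is not the authors' implementation: the `score` callable stands in for an image-quality estimate (the summary names ARNIQA, but no ARNIQA API is called here), `deblur` stands in for the feed-forward deblurring network, and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class BlurClass(Enum):
    SHARP = "sharp"
    DEBLURRED = "blurry, deblurring succeeded"
    FAILED = "heavily blurred, deblurring failed"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values on a [0, 1] quality scale (higher means sharper).
    sharp: float = 0.7
    accept_deblurred: float = 0.6


def route_frame(frame, score, deblur, t=Thresholds()):
    """Return (blur class, image handed on to tracking and mapping).

    `score` and `deblur` are caller-supplied callables (assumptions of
    this sketch), so the routing logic can be exercised without a model.
    """
    if score(frame) >= t.sharp:
        # Sharp frames skip deblurring entirely.
        return BlurClass.SHARP, frame
    restored = deblur(frame)
    if score(restored) >= t.accept_deblurred:
        return BlurClass.DEBLURRED, restored
    # Keep the blurry input so the blur can be modeled in 3D instead.
    return BlurClass.FAILED, frame


if __name__ == "__main__":
    # Toy stand-ins: a "frame" is just its own quality score.
    toy_score = lambda q: q
    toy_deblur = lambda q: min(1.0, q + 0.3)
    for q in (0.9, 0.4, 0.1):
        print(q, route_frame(q, toy_score, toy_deblur)[0].name)
```

Only frames that fall below the sharpness threshold pay for the deblurring pass, and only frames that still score poorly afterwards fall through to the more expensive 3D blur modeling path, which is how the pipeline scales computation with the amount of blur.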
Figure 2: Deblurring performance comparisons on synthetic and real benchmarks against state-of-the-art offline methods, highlighting significant quantitative advantages.

Quantitative and Qualitative Results

The effectiveness of Unblur-SLAM is substantiated with consistent improvements across both synthetic (ReplicaBlurry, ArchViz) and real-world (TUM-RGBD, IndoorMCD) datasets. For instance:

On extreme-blur, synthetic ArchViz sequences, Unblur-SLAM demonstrates clear improvements in both accuracy and photorealism, as measured by lower Absolute Trajectory Error (ATE) and higher PSNR, over specialized motion-blur-aware baselines (MBA-SLAM).
On the established Deblur-NeRF defocus blur benchmark, Unblur-SLAM achieves a substantial PSNR improvement over the previous state of the art (27.45 dB vs. 24.21 dB for Deblurring 3DGS), indicating that the online system can outperform offline deblurring solutions on this benchmark.
Across the TUM-RGBD and IndoorMCD datasets, both ATE and PSNR improve even in sequences with heterogeneous proportions of sharp and blurred frames, aided by robust blur detection and adaptive processing.
Qualitative evaluations document sharper detail recovery and texture faithfulness compared to MBA-SLAM and I2-SLAM, especially in sequences with defocus or mixed blur (Figures 3, 4, 5).
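
For readers less familiar with the two metrics quoted above, the following sketch shows how they are commonly computed. It is a simplified illustration, not the paper's evaluation code: images are flat lists of intensities in [0, 1], and the ATE function assumes the estimated trajectory has already been time-associated and aligned to the ground truth (real evaluations normally perform that alignment first).

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def ate_rmse(gt_positions, est_positions):
    """RMSE of translational error over pre-aligned 3D camera positions."""
    if not gt_positions or len(gt_positions) != len(est_positions):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((g - e) ** 2 for g, e in zip(p, q))
        for p, q in zip(gt_positions, est_positions)
    ]
    return math.sqrt(sum(squared) / len(squared))


def mse_ratio_from_psnr_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # The reported 27.45 dB vs. 24.21 dB gap, expressed as an MSE ratio.
    print(mse_ratio_from_psnr_gain(27.45 - 24.21))
```

Because PSNR is logarithmic in the mean squared error, the 3.24 dB gap reported on the Deblur-NeRF defocus benchmark corresponds to roughly a 2.1 times lower mean squared error.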

Figure 3: Comparison of camera trajectories on IndoorMCD; Unblur-SLAM achieves lower drift and alignment error than DROID-SLAM.

Figure 4: Qualitative comparison on TUM sequences; Unblur-SLAM preserves geometric structure and fine texture beyond MBA-SLAM and Deblur-SLAM outputs.

Analysis and Discussion

Unblur-SLAM’s core claim is that handling real-world blur in SLAM requires both explicit, physically motivated regularization of the deblurring network and a modular architecture that lets unaffected frames bypass deblurring quickly. The proposed hybrid of learning-based and analytic blur modeling avoids the key pitfalls of 2D deblurring methods (loss of 3D consistency, noise overfitting) while also avoiding the high complexity and inefficiency of fully model-based multi-frame approaches. Ablations confirm that the fallback mechanism for severe blur is essential for robustness: omitting it causes a PSNR drop of 0.56 dB under extreme conditions.

Notably, Unblur-SLAM demonstrates that a well-trained, physics-constrained deblurring module can substantially improve not only aesthetic realism but also metric localization and mapping accuracy in practical monocular online SLAM. Depth-aware, per-pixel kernel modeling is critical for detail preservation. Nevertheless, the system introduces runtime and memory trade-offs: going beyond three virtual sub-frames for extremely blurred inputs increases memory consumption, which points to the need for future research into more scalable 3D blur models.
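
The sub-frame idea can be made concrete with a toy sketch of the standard blur formation model, in which a blurred observation is the average of several sharp renderings taken along the camera path during the exposure. Everything here is an illustrative assumption: a 1D signal stands in for an image, an integer shift stands in for rendering from a sub-frame pose, and the function names are hypothetical.

```python
def subframe_offsets(n_subframes, total_shift):
    """Evenly spaced offsets across the exposure, centred on mid-exposure."""
    if n_subframes < 1:
        raise ValueError("need at least one sub-frame")
    if n_subframes == 1:
        return [0.0]
    step = total_shift / (n_subframes - 1)
    return [-total_shift / 2 + i * step for i in range(n_subframes)]


def render_shifted(signal, offset):
    """Toy stand-in for rendering: shift a 1D signal, clamping at the edges."""
    n = len(signal)
    k = round(offset)  # nearest integer shift
    return [signal[min(max(i - k, 0), n - 1)] for i in range(n)]


def synthesize_blur(signal, n_subframes, total_shift):
    """Average the sharp sub-frame renderings to form the blurred observation."""
    frames = [
        render_shifted(signal, o)
        for o in subframe_offsets(n_subframes, total_shift)
    ]
    # All sub-frames are held at once before averaging, so storage grows
    # linearly with n_subframes in this sketch.
    return [sum(column) / n_subframes for column in zip(*frames)]


if __name__ == "__main__":
    sharp = [0, 0, 1, 0, 0]
    print(synthesize_blur(sharp, n_subframes=3, total_shift=2))
```

In this toy version a single bright sample is spread evenly over three positions. In an optimization setting each sub-frame rendering typically has to be kept for the backward pass, which is one plausible reason the cost grows with the number of sub-frames.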

Figure 5: Comparison with ground truth sharp frames (top), illustrating that the method’s reconstructions nearly recover hidden clean details lost in input blur.

Implications and Future Directions

This framework points toward several theoretical and practical advances:

SLAM systems could operate in real time under unconstrained imaging conditions (e.g., handheld devices or mobile robots in low light or fast motion) thanks to automatic, adaptive computation.
Blur physics constraints in training and multi-view consistency priors may become standard for neural scene representations targeting deployment outside ideal laboratory settings.
There is strong evidence for favoring hybrid, modular design: combining physically interpretable models, learned priors, and dynamic control facilitates both accuracy and efficiency.
To achieve wide applicability, future research must address the remaining bottlenecks, especially the inference cost of the deblurring network and the memory footprint of heavy multi-frame blur scenarios.

Conclusion

Unblur-SLAM delivers a robust, efficient, and high-fidelity dense neural SLAM solution capable of reconstructing sharp, consistent 3D models from image sequences exhibiting significant motion and defocus blur. Key advancements include adaptive blur-aware tracking, physics-based deblurring training, and 3D-consistent multi-scale blur kernel modeling. The system achieves superior quantitative and qualitative performance against the compared state-of-the-art offline and online baselines, supporting the claim that online SLAM pipelines can outperform even specialized single-task offline methods under challenging real-world blur conditions. The remaining runtime and memory constraints highlight the need for future work on scalable neural SLAM and real-time, physically driven image restoration.

Figure 6: Comparison with manually labeled sharp frames from I2-SLAM; Unblur-SLAM approaches the fidelity of ground truth even in heavily blurred scenes.

Figure 7: Side-by-side input (blurred) and deblurred Unblur-SLAM output, indicating effective restoration of detail and color consistency.

Figure 8: Generalization across datasets; reconstructed RGB renderings (right) exhibit fine surface and texture details reconstructed from highly corrupted inputs (left).
