MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing (2412.20082v2)

Published 28 Dec 2024 in cs.CV

Abstract: Deep visual odometry has demonstrated great advancements by learning-to-optimize technology. This approach heavily relies on the visual matching across frames. However, ambiguous matching in challenging scenarios leads to significant errors in geometric modeling and bundle adjustment optimization, which undermines the accuracy and robustness of pose estimation. To address this challenge, this paper proposes MambaVO, which conducts robust initialization, Mamba-based sequential matching refinement, and smoothed training to enhance the matching quality and improve the pose estimation. Specifically, the new frame is matched with the closest keyframe in the maintained Point-Frame Graph (PFG) via the semi-dense based Geometric Initialization Module (GIM). Then the initialized PFG is processed by a proposed Geometric Mamba Module (GMM), which exploits the matching features to refine the overall inter-frame matching. The refined PFG is finally processed by differentiable BA to optimize the poses and the map. To deal with the gradient variance, a Trending-Aware Penalty (TAP) is proposed to smooth training and enhance convergence and stability. A loop closure module is finally applied to enable MambaVO++. On public benchmarks, MambaVO and MambaVO++ demonstrate SOTA performance, while ensuring real-time running.

Summary

The paper introduces MambaVO, a deep visual odometry system that enhances accuracy and robustness through novel geometric initialization, sequential matching refinement using a Mamba architecture, and training smoothing.
MambaVO achieves state-of-the-art performance on standard benchmarks, demonstrating notable improvements and a 19-22% error reduction on indoor datasets compared to previous deep VO methods.
The innovations in MambaVO significantly contribute to visual odometry and open avenues for future research in integrated SLAM systems and improved handling of challenging environments.

Overview of MambaVO: Deep Visual Odometry

The paper introduces MambaVO, a deep visual odometry (VO) system built on sequential matching refinement and training smoothing. The method targets more accurate and more robust pose estimation, a critical capability for applications such as autonomous navigation in robots and self-driving cars. MambaVO uses a Mamba-based architecture and introduces several new modules to address limitations of current state-of-the-art deep VO systems.

Methodology and Innovations

MambaVO addresses three core challenges in deep VO: unstable initialization, insufficient refinement in matching, and the training challenges posed by gradient variance in nested optimization frameworks. The proposed method introduces three critical components:

Geometric Initialization Module (GIM): This module uses a semi-dense matching network to provide robust pose initialization from geometric features. It combines pre-trained models, EfficientLoFTR for geometric features and DINOv2 for semantic context, to produce precise initial feature correspondences, which are further refined through Perspective-n-Point (PnP) pose estimation.
Geometric Mamba Module (GMM): After initialization, GMM performs sequential matching refinement with a modified Mamba architecture that captures long-range dependencies. It uses information from earlier frames to refine pixel-level correspondences in a temporally aware way, which is crucial for accurate VO.
Trending-Aware Penalty (TAP): This component addresses gradient variance by balancing the pose and matching losses during training, which improves convergence and stability. TAP weights the loss terms dynamically based on their historical trends, so that training stays smooth despite the variance across trajectories.
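To make the idea of carrying historical information through a sequence concrete, the following is a minimal sketch of the linear state-space recurrence that Mamba-style models build on. It is not the paper's GMM: the function name `diagonal_ssm_scan`, the fixed parameters `a`, `b`, `c`, and the single scalar input channel are illustrative assumptions. Real Mamba layers make these parameters depend on the input and use an optimized parallel scan.

```python
def diagonal_ssm_scan(xs, a, b, c):
    """Run a single-channel state-space model with a diagonal state matrix.

    Recurrence (per state dimension i):
        h_i[t] = a_i * h_i[t-1] + b_i * x[t]
        y[t]   = sum_i c_i * h_i[t]
    The hidden state h summarizes everything seen so far, which is how
    information from earlier frames can influence the current output.
    """
    h = [0.0] * len(a)  # hidden state starts empty
    ys = []
    for x_t in xs:
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Hypothetical parameters: one slowly decaying and one quickly decaying state.
a = [0.9, 0.5]
b = [1.0, 1.0]
c = [0.5, 0.5]

# An impulse at the first step keeps influencing later outputs.
signal = [1.0, 0.0, 0.0, 0.0]
response = diagonal_ssm_scan(signal, a, b, c)
print(response)
```

The output at each step depends only on the current and earlier inputs, and the influence of the first input fades gradually rather than vanishing, which is the property a sequential refinement module relies on.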
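The summary does not give the TAP formula, so the following is only a hypothetical sketch of what weighting losses by their historical trend could look like. The class `TrendWeighter`, the exponential moving average, and the normalization are all assumptions made for illustration and should not be read as the authors' definition.

```python
class TrendWeighter:
    """Hypothetical trend-based loss weighting (not the paper's TAP formula).

    Each loss is compared with its own exponential moving average (EMA).
    A loss that stays at or above its running average (stalling or rising)
    receives a larger weight than one that is falling quickly.
    """

    def __init__(self, momentum=0.9, eps=1e-8):
        self.momentum = momentum
        self.eps = eps
        self.ema = {}  # running average per loss name

    def weights(self, losses):
        trends = {}
        for name, value in losses.items():
            prev = self.ema.get(name, value)  # first call: no history yet
            trends[name] = value / (prev + self.eps)
            self.ema[name] = self.momentum * prev + (1 - self.momentum) * value
        total = sum(trends.values())
        if total == 0:  # all losses are zero: fall back to uniform weights
            return {name: 1.0 / len(losses) for name in losses}
        return {name: t / total for name, t in trends.items()}


def combined_loss(losses, weights):
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * losses[name] for name in losses)


weighter = TrendWeighter()
# Placeholder loss values, not taken from any experiment.
step1 = {"pose": 2.0, "match": 1.0}
w1 = weighter.weights(step1)
step2 = {"pose": 2.0, "match": 0.5}  # pose loss stalls, matching loss improves
w2 = weighter.weights(step2)
total2 = combined_loss(step2, w2)
```

In this toy run the weights start out equal and then shift toward the stalled pose loss on the second step. Any scheme of this kind trades responsiveness against smoothness through the `momentum` parameter.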

Adding a loop closure module yields MambaVO++, which reduces cumulative drift through global optimization.

Experimental Results

MambaVO and its enhanced version, MambaVO++, are evaluated on the EuRoC, TUM-RGBD, KITTI, and TartanAir benchmarks. The results show improved accuracy and robustness over previous state-of-the-art methods on these benchmarks, with a substantial reduction in pose estimation error in challenging scenarios, including low-texture environments. Notably, the reported error reduction on indoor datasets is 19-22% compared to competitors such as DROID-VO and DPVO.
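As a reading aid, a figure such as "19-22% error reduction" is normally a relative reduction with respect to a baseline error. The sketch below shows that arithmetic only. The function name `relative_reduction` and the numeric inputs are placeholders chosen for illustration, not results from the paper, and the summary does not state which error metric the percentages refer to.

```python
def relative_reduction(baseline_error, new_error):
    """Return the percentage by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Placeholder values: a baseline error of 0.100 and a new error of 0.080
# correspond to a reduction of roughly 20 percent.
example = relative_reduction(0.100, 0.080)
print(round(example, 1))
```

A negative return value would indicate that the new method has a larger error than the baseline.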

Implications and Future Work

The main contributions of MambaVO to visual odometry are the integration of a Mamba-based architecture for matching refinement and the training smoothing strategy. Future work could extend these mechanisms to broader SLAM systems and integrate dense reconstruction using modern methods such as 3D Gaussian Splatting. Better handling of large-scale environments and lower computational overhead also remain promising directions.

In summary, MambaVO sets a new benchmark for deep visual odometry by addressing the matching refinement and optimization challenges described above, improving the practical performance of VO systems.


Authors (8)
