MambaGlue: Fast and Robust Local Feature Matching With Mamba (2502.00462v1)

Published 1 Feb 2025 in cs.CV and cs.RO

Abstract: In recent years, robust matching methods using deep learning-based approaches have been actively studied and improved in computer vision tasks. However, there remains a persistent demand for both robust and fast matching techniques. To address this, we propose a novel Mamba-based local feature matching approach, called MambaGlue, where Mamba is an emerging state-of-the-art architecture rapidly gaining recognition for its superior speed in both training and inference, and promising performance compared with Transformer architectures. In particular, we propose two modules: a) MambaAttention mixer to simultaneously and selectively understand the local and global context through the Mamba-based self-attention structure and b) deep confidence score regressor, which is a multi-layer perceptron (MLP)-based architecture that evaluates a score indicating how confidently matching predictions correspond to the ground-truth correspondences. Consequently, our MambaGlue achieves a balance between robustness and efficiency in real-world applications. As verified on various public datasets, we demonstrate that our MambaGlue yields a substantial performance improvement over baseline approaches while maintaining fast inference speed. Our code will be available on https://github.com/url-kaist/MambaGlue

Summary

  • The paper presents a hybrid Mamba-Transformer framework that achieves fast, robust local feature matching through innovative attention and confidence modules.
  • The paper employs a MambaAttention Mixer to selectively fuse local and global features, significantly enhancing the capture of long-range dependencies.
  • The paper introduces a Deep Confidence Score Regressor using a multi-layer perceptron with an early-exit strategy, reducing inference time while maintaining high accuracy.

Exploring MambaGlue: A Fast and Robust Local Feature Matching Framework

The paper introduces MambaGlue, a framework for local feature matching in computer vision that aims to balance speed and robustness. It integrates the Mamba architecture with Transformer-based methods to address ongoing challenges in feature matching. Mamba, known for handling sequential data efficiently and with lower computational complexity than attention, is combined with established Transformer techniques in a hybrid design that speeds up both training and inference.

Methodological Contributions

MambaGlue introduces two significant modules to improve feature matching performance:

  1. MambaAttention Mixer: This module combines the Mamba architecture with self-attention. Its key contribution is attending to local and global feature context simultaneously and selectively, integrating a self-attention path with Mamba's selective state-space mechanism. This allows the network to capture long-range dependencies more efficiently than purely Transformer-based methods (a simplified sketch follows this list).
  2. Deep Confidence Score Regressor: In place of the single-layer confidence classifier used in existing solutions such as LightGlue, MambaGlue uses a multi-layer perceptron (MLP) to estimate how reliable each match prediction is. The deeper head gives a more hierarchical view of the features and helps separate reliable correspondences from unreliable ones.
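
To make the mixer's structure concrete, the following is a minimal PyTorch sketch of a block that runs a global self-attention path in parallel with a gated, Mamba-style sequential path and fuses the two. The class name, dimensions, and the simplified selective gate are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal, illustrative sketch of a hybrid "attention + selective state-space" block.
# Names and structure are assumptions for exposition, not the authors' code.
import torch
import torch.nn as nn


class MambaAttentionMixer(nn.Module):
    """Hypothetical mixer: a global self-attention path in parallel with a
    gated, Mamba-style sequential path, fused and fed through an MLP."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # Global context path: standard multi-head self-attention.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Selective path: a stand-in for a Mamba/SSM block. A depthwise 1-D
        # convolution plus an input-dependent gate is used here purely as a
        # placeholder for the selective state-space scan.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_keypoints, d_model) descriptor tokens
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)                     # global context
        sel = self.conv(h.transpose(1, 2)).transpose(1, 2)   # local sequential context
        sel = sel * torch.sigmoid(self.gate(h))              # input-dependent (selective) gating
        return x + self.mlp(torch.cat([attn_out, sel], dim=-1))  # residual fusion


# Usage: mix descriptor tokens for 512 keypoints from one image.
tokens = torch.randn(1, 512, 256)
mixed = MambaAttentionMixer()(tokens)  # -> (1, 512, 256)
```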

The framework also uses an iterative early-exit strategy: it monitors these confidence scores and halts the matching layers once enough keypoints are confidently resolved, further improving speed without sacrificing accuracy.
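
A rough sketch of how a deeper confidence head and the early-exit test could interact is shown below; the layer widths, exit ratio, and threshold are placeholder values chosen for illustration, not the values used in the paper.

```python
# Illustrative MLP confidence head plus a layer-wise early-exit test.
# Widths, threshold, and exit ratio are placeholders, not the paper's values.
import torch
import torch.nn as nn


class DeepConfidenceRegressor(nn.Module):
    """Maps each keypoint token to a confidence in [0, 1] that its current
    matching state already agrees with the final prediction."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Linear(d_model, d_model // 2), nn.GELU(),
            nn.Linear(d_model // 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_keypoints, d_model) -> (batch, num_keypoints)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)


def should_exit(conf_a: torch.Tensor, conf_b: torch.Tensor,
                threshold: float = 0.9, ratio: float = 0.95) -> bool:
    """Stop iterating once most keypoints in both images are confidently resolved."""
    confident = torch.cat([conf_a, conf_b], dim=-1) > threshold
    return bool(confident.float().mean() >= ratio)


# Schematic use inside the matcher's layer loop:
#   for layer in layers:
#       desc_a, desc_b = layer(desc_a, desc_b)
#       if should_exit(conf_head(desc_a), conf_head(desc_b)):
#           break  # early exit: skip the remaining layers
```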

Evaluation and Comparisons

The effectiveness of MambaGlue is evaluated on several public benchmarks:

  • Homography Estimation: On the HPatches dataset, MambaGlue achieves higher precision and area-under-the-curve (AUC) scores for homography estimation under both robust and non-robust solvers (LO-RANSAC and DLT, respectively). Among sparse matchers using SuperPoint features, it improves accuracy over existing solutions such as SuperGlue and LightGlue, and remains competitive even against dense matchers.
  • Relative Pose Estimation: On MegaDepth1500, MambaGlue generally outperforms other sparse methods at essential-matrix estimation, achieving higher AUC at multiple angular error thresholds (the AUC metric is sketched after this list). Notably, it maintains high accuracy with reduced inference time when its early-exit test is enabled.
  • Visual Localization: In outdoor localization on the Aachen Day-Night dataset, MambaGlue maintains high recall and mapping accuracy compared with baseline methods, underscoring its suitability for robust real-world use.
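
For reference, the pose AUC reported on such benchmarks is typically the area under the cumulative recall curve of per-pair pose errors, clipped at each angular threshold. The sketch below follows that common formulation (thresholds of 5/10/20 degrees are conventional); the exact evaluation protocol of the paper is not reproduced here.

```python
# Sketch of pose AUC at angular error thresholds, computed as the area under
# the cumulative recall curve clipped at each threshold (common practice for
# sparse-matcher evaluations; exact protocol details are assumptions).
import numpy as np


def pose_auc(errors_deg, thresholds=(5.0, 10.0, 20.0)):
    """errors_deg: per image pair, the pose error in degrees
    (commonly the max of rotation and translation angular error)."""
    errors = np.sort(np.asarray(errors_deg, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    # Prepend the origin so the curve starts at (error 0, recall 0).
    errors = np.r_[0.0, errors]
    recall = np.r_[0.0, recall]
    aucs = {}
    for t in thresholds:
        last = np.searchsorted(errors, t)           # pairs with error below t
        e = np.r_[errors[:last], t]                 # clip the curve at the threshold
        r = np.r_[recall[:last], recall[last - 1]]  # recall is flat past the last pair
        # Trapezoidal integration, normalized by the threshold.
        aucs[f"AUC@{t:g}deg"] = float(np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2) / t)
    return aucs


# Example: five image pairs with these pose errors (degrees).
print(pose_auc([1.2, 3.5, 8.0, 15.0, 40.0]))
```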

Implications and Future Outlook

Integrating Mamba with Transformer architectures opens a path toward more efficient feature matching. The hybrid design also adds to the discussion of how well Mamba adapts and scales to vision tasks beyond language modeling, suggesting substantial potential for autonomous systems where speed and precision are both paramount.

Future work could explore Mamba-only models to fully exploit the architecture's computational benefits, further reducing latency without compromising accuracy. The framework could also be adapted and evaluated on a wider range of datasets, extending its applicability beyond the conventional benchmarks used here.

Overall, the work represents a meaningful step forward in local feature matching, offering a framework that combines robustness and speed in a single package.
