- The paper introduces FeatureBooster, which leverages a two-stage neural network (MLP and Transformer) to enhance image feature descriptors.
- It demonstrates improved matching accuracy, visual localization performance, and 3D reconstruction completeness across multiple datasets.
- The lightweight, plug-and-play architecture streamlines integration into existing computer vision pipelines without significant computational overhead.
An Overview of FeatureBooster: Enhancing Feature Descriptors with Neural Networks
The paper "FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network" presents a novel approach to improving the effectiveness of feature descriptors in computer vision tasks. The proposed method, FeatureBooster, utilizes a lightweight neural network to enhance feature descriptors, which are pivotal in structure from motion (SfM), visual localization, and simultaneous localization and mapping (SLAM).
Methodology
FeatureBooster is engineered to refine the descriptors of keypoints within images by leveraging both their original descriptors and geometric properties. It adopts a hybrid architecture comprising an MLP-based self-boosting stage, which projects original descriptors into a new space while encoding geometric information, and a Transformer-based cross-boosting stage that incorporates global context.
- Self-Boosting Stage: This stage involves using an MLP to improve the initial descriptors by projecting them into a more suitable domain for matching, thus enhancing their discriminability using geometric characteristics such as 2D location, scale, and orientation.
- Cross-Boosting Stage: Building on recent advances in attention mechanisms, the cross-boosting stage employs a lightweight Transformer to refine the descriptors further, drawing on global context from the spatial layout and the descriptors of all features in the image. To keep this stage cheap, the network uses an attention-free Transformer variant, which replaces quadratic dot-product attention with an element-wise gating of a shared global context; a minimal sketch of both stages follows this list.
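To make the two stages concrete, here is a minimal PyTorch sketch rather than the authors' released implementation: the dimensions (128-D input descriptors, a 256-D boosted space), the layer counts, and the single AFT-Simple-style attention-free layer are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelfBoost(nn.Module):
    """Self-boosting stage: an MLP re-projects the raw descriptor and sums it
    with an encoding of the keypoint geometry (x, y, scale, orientation).
    Dimensions are illustrative, not the paper's exact configuration."""
    def __init__(self, desc_dim=128, geo_dim=4, out_dim=256):
        super().__init__()
        self.desc_mlp = nn.Sequential(
            nn.Linear(desc_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))
        self.geo_mlp = nn.Sequential(
            nn.Linear(geo_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, desc, geo):          # desc: (N, desc_dim), geo: (N, geo_dim)
        return self.desc_mlp(desc) + self.geo_mlp(geo)

class AFTSimple(nn.Module):
    """Cross-boosting stage: one attention-free layer (AFT-Simple style).
    All N keypoints share a single global context vector, so the cost is
    linear in N rather than quadratic as with dot-product attention."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (N, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        w = torch.softmax(k, dim=0)        # normalize keys over the N keypoints
        context = (w * v).sum(dim=0)       # shared global context, O(N * dim)
        return x + torch.sigmoid(q) * context  # gated residual update per keypoint

# Toy usage: boost 2000 hypothetical 128-D descriptors with their geometry.
desc = torch.randn(2000, 128)
geo = torch.randn(2000, 4)
boosted = AFTSimple()(SelfBoost()(desc, geo))   # (2000, 256)
```

The design point visible here is the cost profile: because every keypoint gates the same shared context vector, the cross-boosting update scales linearly with the number of keypoints, which is what keeps the Transformer stage lightweight.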
The method handles both binary and real-valued descriptors, maintaining versatility across different types of input features. The authors report that boosting 2000 features takes only about 3.2 ms on a desktop GPU and 27 ms on an embedded GPU, underscoring its efficiency.
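As an illustration of how the two output modes might be consumed downstream, here is a hedged sketch; the helper names (`finalize_descriptors`, `hamming_nn`) are hypothetical, and the paper's training-time handling of binarization (e.g., a differentiable surrogate) is omitted.

```python
import torch
import torch.nn.functional as F

def finalize_descriptors(boosted, binary=False):
    """Hypothetical post-processing of boosted descriptors. Real-valued
    descriptors are L2-normalized for cosine/L2 matching; binary ones are
    thresholded at zero so they can be matched by Hamming distance."""
    if binary:
        return boosted > 0                  # bool tensor, one bit per dimension
    return F.normalize(boosted, dim=-1)

def hamming_nn(d1, d2):
    """Brute-force nearest neighbor over binary descriptors (XOR + sum)."""
    dist = (d1[:, None, :] ^ d2[None, :, :]).sum(-1)   # (N1, N2) Hamming distances
    return dist.argmin(dim=1)               # index of nearest descriptor in d2
```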
Results and Evaluation
The effectiveness of FeatureBooster is validated across various datasets and computer vision tasks, such as:
- Image Matching: Extensive testing on the HPatches dataset shows a notable improvement in mean matching accuracy (MMA; a sketch of this metric follows the list) for both hand-crafted and learned descriptors, with boosted SIFT outperforming even strong learned alternatives such as SOSNet.
- Visual Localization: Robust performance was observed in challenging scenarios, including the Aachen Day-Night and InLoc datasets, where the method achieved strong day-time and night-time localization accuracy. Notably, the boosted binary ORB descriptors were competitive with far more sophisticated state-of-the-art descriptors.
- Structure-from-Motion: FeatureBooster increased the number of registered images and reconstructed points in SfM experiments, indicating more robust and complete 3D reconstruction.
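For reference, MMA on homography-annotated image pairs such as those in HPatches is typically computed as below; the function name and pixel thresholds are illustrative.

```python
import numpy as np

def mean_matching_accuracy(kpts1, kpts2, matches, H, thresholds=(1, 3, 5)):
    """MMA on one image pair: the fraction of putative matches whose
    reprojection error under the ground-truth homography H falls below
    each pixel threshold."""
    src = kpts1[matches[:, 0]]                         # (M, 2) points in image 1
    dst = kpts2[matches[:, 1]]                         # (M, 2) points in image 2
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    proj = (H @ src_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # back to pixel coordinates
    err = np.linalg.norm(proj - dst, axis=1)
    return {t: float((err < t).mean()) for t in thresholds}
```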
Implications and Future Work
FeatureBooster's lightweight, plug-and-play design makes it a promising tool for enhancing existing vision systems without extensive architectural changes or significant additional computational cost. Its ability to improve both traditional and learned descriptors suggests broad applicability, from strengthening current pipelines to enabling reliable matching on low-power devices.
A natural avenue for future work is integrating FeatureBooster with more diverse imaging platforms and scenarios. Further analysis of its applicability to dense features could also broaden its range of use, although its scalability with the number of keypoints remains a consideration.
In summary, FeatureBooster makes a significant contribution to feature matching: a lightweight neural network that improves descriptor quality, bridging the gap between computational feasibility and strong image matching performance.