
MD-Net: Multi-Detector for Local Feature Extraction (2208.05350v1)

Published 10 Aug 2022 in cs.CV

Abstract: Establishing a sparse set of keypoint correspondences between images is a fundamental task in many computer vision pipelines. Often, this translates into a computationally expensive nearest neighbor search, where every keypoint descriptor at one image must be compared with all the descriptors at the others. In order to lower the computational cost of the matching phase, we propose a deep feature extraction network capable of detecting a predefined number of complementary sets of keypoints at each image. Since only the descriptors within the same set need to be compared across the different images, the matching phase computational complexity decreases with the number of sets. We train our network to predict the keypoints and compute the corresponding descriptors jointly. In particular, in order to learn complementary sets of keypoints, we introduce a novel unsupervised loss which penalizes intersections among the different sets. Additionally, we propose a novel descriptor-based weighting scheme meant to penalize the detection of keypoints with non-discriminative descriptors. With extensive experiments we show that our feature extraction network, trained only on synthetically warped images and in a fully unsupervised manner, achieves competitive results on 3D reconstruction and re-localization tasks at a reduced matching complexity.

Citations (5)

Summary

  • The paper introduces a novel multi-detector architecture that generates complementary local feature sets to reduce matching complexity.
  • It employs a fully convolutional backbone with dilated convolutions and an unsupervised loss to enhance keypoint discrimination.
  • Evaluation on benchmarks shows competitive accuracy and reduced computational cost, balancing efficiency and performance with dual detectors.

Overview of MD-Net: Multi-Detector for Local Feature Extraction

The paper introduces MD-Net, a deep learning architecture designed to optimize the extraction of local features in computer vision applications. This approach addresses the computational burden of matching keypoints across multiple images, a critical task in various vision pipelines such as Structure from Motion (SfM), SLAM, and visual localization. The key innovation of MD-Net is its ability to detect multiple complementary sets of keypoints within an image, thereby significantly reducing the computational complexity of subsequent matching phases.

Methodology

MD-Net's core contribution lies in its multi-detector architecture, which allows the generation of several distinct sets of keypoints per image. Only descriptors within the same set are compared across images, reducing matching complexity by a factor equal to the number of sets. The paper introduces an unsupervised loss to ensure complementarity among keypoint sets, penalizing intersection between them. Furthermore, a descriptor-based weighting scheme is introduced to discourage the detection of keypoints with non-discriminative descriptors.
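To make the complexity reduction concrete, the sketch below (illustrative code, not from the paper) matches descriptors only within the same detector set: with K complementary sets, the number of pairwise descriptor comparisons drops by roughly a factor of K compared to exhaustive matching. The function name match_by_set and the cosine-similarity criterion are assumptions for the sake of the example.

```python
# Minimal sketch of set-wise nearest-neighbor matching (hypothetical, not the
# authors' code). Each descriptor carries the id of the detector set that
# produced it; only descriptors sharing a set id are compared.
import numpy as np

def match_by_set(desc_a, sets_a, desc_b, sets_b, num_sets):
    """desc_*: (N, D) L2-normalized descriptors; sets_*: (N,) detector-set ids."""
    matches = []
    for k in range(num_sets):
        idx_a = np.flatnonzero(sets_a == k)
        idx_b = np.flatnonzero(sets_b == k)
        if idx_a.size == 0 or idx_b.size == 0:
            continue
        # Cosine similarity restricted to set k (descriptors assumed normalized).
        sim = desc_a[idx_a] @ desc_b[idx_b].T
        nn = sim.argmax(axis=1)
        matches.extend(zip(idx_a.tolist(), idx_b[nn].tolist()))
    return matches
```

Splitting the search this way leaves each per-set match a standard nearest-neighbor problem, so existing matchers and ratio tests can be reused unchanged within each set.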

The network architecture employs a fully convolutional backbone with dilated convolutions to maintain resolution while increasing the effective field of view. A multi-detector branch then separates the output feature space into distinct detection zones, each corresponding to a different set of keypoints. The architecture is compact, comprising fewer than half a million parameters.
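The sketch below illustrates this layout under stated assumptions: a small fully convolutional backbone with dilated convolutions feeding one detection head per keypoint set and a shared dense descriptor head. Layer widths, depths, and the class name MultiDetectorNet are illustrative choices, not the published MD-Net configuration.

```python
# Hypothetical sketch of a compact fully convolutional extractor with dilated
# convolutions, K detector heads, and a dense descriptor head. Hyperparameters
# are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiDetectorNet(nn.Module):
    def __init__(self, num_sets=2, desc_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
        )
        # One score map per keypoint set; the paper's intersection-penalizing
        # loss (not shown here) is what drives the sets to be complementary.
        self.detectors = nn.Conv2d(64, num_sets, 1)
        self.descriptor = nn.Conv2d(64, desc_dim, 1)

    def forward(self, image):
        feats = self.backbone(image)
        scores = self.detectors(feats)   # (B, K, H, W) per-set detection maps
        descs = self.descriptor(feats)   # (B, D, H, W) dense descriptors
        descs = nn.functional.normalize(descs, dim=1)
        return scores, descs
```

Because the backbone is dilated rather than strided, the detection and descriptor maps stay at input resolution, which keeps keypoint localization sharp while the parameter count remains small.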

Performance Evaluation

MD-Net is evaluated on several benchmarks, including HPatches, Aachen Day-Night, and the Image Matching Benchmark. It demonstrates competitive matching accuracy, particularly in scenarios where computational efficiency is paramount. For instance, experiments on HPatches confirm that MD-Net achieves mean matching accuracy competitive with state-of-the-art methods such as R2D2 and ASLFeat. Additionally, on the Image Matching Benchmark, MD-Net achieves the best mean average accuracy across multiple tasks, underlining its effectiveness in both stereo and multiview setups.

The benefit of the multi-detector approach is most pronounced in the reduction of computational complexity, which is a critical consideration in practical implementations. Ablation studies further demonstrate that utilizing two detectors strikes an optimal balance between computational efficiency and feature matching performance.

Future Directions

While MD-Net provides a significant step forward in feature extraction for computer vision, the research suggests several avenues for future exploration. These include integrating MD-Net with deep learning-based matching architectures to enhance geometric consistency and further reduce matching computational costs. The paper also indicates potential benefits in refining keypoint selection strategies and extending the approach to more complex synthetic test environments.

MD-Net underscores the ongoing evolution of feature extraction methodologies within computer vision, leveraging deep learning to address long-standing challenges associated with computational efficiency and feature robustness. As the field continues to evolve, modular approaches like MD-Net that balance performance and efficiency will be increasingly integral to advanced computer vision applications.