DISK: Learning local features with policy gradient

Published 24 Jun 2020 in cs.CV and cs.LG | (2006.13566v2)

Abstract: Local feature frameworks are difficult to learn in an end-to-end fashion, due to the discreteness inherent to the selection and matching of sparse keypoints. We introduce DISK (DIScrete Keypoints), a novel method that overcomes these obstacles by leveraging principles from Reinforcement Learning (RL), optimizing end-to-end for a high number of correct feature matches. Our simple yet expressive probabilistic model lets us keep the training and inference regimes close, while maintaining good enough convergence properties to reliably train from scratch. Our features can be extracted very densely while remaining discriminative, challenging commonly held assumptions about what constitutes a good keypoint, as showcased in Fig. 1, and deliver state-of-the-art results on three public benchmarks.




Summary

The paper introduces DISK, an RL-based framework that optimizes local feature detection and matching using a policy gradient approach.
It employs a CNN to generate keypoint heatmaps and dense descriptors while leveraging geometric rewards to improve matching accuracy.
DISK achieves state-of-the-art results on three public benchmarks, improving mean Average Accuracy (mAA) over traditional methods such as SIFT and over learned approaches such as SuperPoint and R2D2.

Insights into DISK: Learning Local Features with Policy Gradient

The paper "DISK: Learning Local Features with Policy Gradient" addresses the end-to-end optimization of local feature frameworks. Local features have long been central to computer vision applications such as Structure-from-Motion (SfM) and Simultaneous Localization and Mapping (SLAM). Despite advances in deep learning, end-to-end learning of local feature extraction and matching has remained difficult because selecting and matching sparse keypoints are discrete, non-differentiable operations.

Methodology Overview

The authors introduce DISK (DIScrete Keypoints), a framework based on reinforcement learning. DISK tackles the discreteness inherent in keypoint selection and matching. Its key idea is to use a policy gradient approach, which lets the system be trained end-to-end to maximize the number of correct feature matches without requiring the discrete selection and matching steps to be differentiable.
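The policy gradient approach rests on the standard score-function (REINFORCE) identity. As a sketch, let $\theta$ be the network parameters, $M$ a sampled discrete outcome (here, a set of keypoints and matches), $p_\theta(M)$ its probability, and $R(M)$ a reward that does not depend on $\theta$. These symbols are chosen for illustration and need not match the paper's notation.

```latex
\nabla_\theta \, \mathbb{E}_{M \sim p_\theta}\big[R(M)\big]
  = \nabla_\theta \sum_{M} p_\theta(M)\, R(M)
  = \sum_{M} p_\theta(M)\, \nabla_\theta \log p_\theta(M)\, R(M)
  = \mathbb{E}_{M \sim p_\theta}\big[R(M)\, \nabla_\theta \log p_\theta(M)\big]
```

The reward itself is never differentiated, so it can be computed from discrete decisions. Only the log-probability of the sampled outcome needs to be differentiable with respect to $\theta$.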

The method uses a probabilistic model that keeps the training and inference regimes close, which makes it possible to train reliably from scratch. The backbone is a CNN that outputs a keypoint heatmap alongside dense descriptors, and discrete keypoints are sampled from the heatmap. Geometric ground truth is used to assign rewards, and training maximizes the expected reward with policy gradient methods.
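The following is a minimal sketch of how discrete keypoints might be sampled from a heatmap while recording the log-probabilities that a policy gradient update would need. It assumes one keypoint is drawn per square grid cell from a softmax over that cell's logits. The cell size, the helper name `sample_keypoints`, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sample_keypoints(heatmap, cell=8, rng=None):
    """Sample one keypoint per cell x cell block of a 2D logit heatmap.

    Returns (points, log_probs): integer (row, col) coordinates and the
    log-probability of each sampled location under its cell's softmax.
    Hypothetical helper, for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = heatmap.shape
    assert h % cell == 0 and w % cell == 0, "heatmap must tile into cells"
    gh, gw = h // cell, w // cell
    # Rearrange to (gh, gw, cell*cell): one vector of logits per grid cell.
    logits = (heatmap.reshape(gh, cell, gw, cell)
                     .transpose(0, 2, 1, 3)
                     .reshape(gh, gw, cell * cell))
    # Numerically stable log-softmax within each cell.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Gumbel-max trick: argmax(log_p + Gumbel noise) is a categorical sample.
    gumbel = rng.gumbel(size=log_p.shape)
    idx = (log_p + gumbel).argmax(axis=-1)  # shape (gh, gw)
    chosen_log_p = np.take_along_axis(log_p, idx[..., None], axis=-1)[..., 0]
    # Convert the within-cell index back to image coordinates.
    rows = np.arange(gh)[:, None] * cell + idx // cell
    cols = np.arange(gw)[None, :] * cell + idx % cell
    points = np.stack([rows, cols], axis=-1).reshape(-1, 2)
    return points, chosen_log_p.reshape(-1)

# Random logits stand in for a CNN output in this example.
heatmap = np.random.default_rng(0).normal(size=(32, 48))
points, log_probs = sample_keypoints(heatmap, cell=8, rng=np.random.default_rng(1))
print(points.shape, log_probs.shape)
```

In a training loop, the returned log-probabilities would be weighted by the reward of the resulting matches. At inference time the sampling step could be replaced by a deterministic choice such as the per-cell maximum, which is one way a model of this kind can keep training and inference close.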

Experimental Validation

DISK is evaluated on three public benchmarks, where it achieves state-of-the-art results, most notably on the 2020 Image Matching Challenge. The authors report that DISK outperforms traditional methods such as SIFT and its derivatives, as well as learned approaches such as SuperPoint and R2D2, in both the number of matches and pose accuracy.

A noteworthy result on the Image Matching Challenge is the much larger number of correct matches that DISK extracts. When limited to 2048 features per image, DISK ranked first with a mean Average Accuracy (mAA) of 0.5132 on the stereo task and 0.7271 on the multiview task, a substantial improvement over existing methods.
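As a rough sketch of how a metric of this kind can be computed, the snippet below assumes that mAA is the accuracy (the fraction of image pairs whose pose error is within a threshold) averaged over a range of angular thresholds. The 1 to 10 degree thresholds, the function name, and the sample errors are assumptions made for illustration. The benchmark's own evaluation code remains the authoritative definition.

```python
import numpy as np

def mean_average_accuracy(pose_errors_deg, thresholds_deg=range(1, 11)):
    """Average, over thresholds, of the fraction of errors within each threshold.

    Hypothetical helper: the thresholds are an assumption for illustration.
    """
    errors = np.asarray(pose_errors_deg, dtype=float)
    accuracies = [np.mean(errors <= t) for t in thresholds_deg]
    return float(np.mean(accuracies))

# Made-up pose errors in degrees for four image pairs.
errors = [0.5, 2.5, 7.0, 30.0]
print(mean_average_accuracy(errors))
```

Averaging over thresholds rewards methods that are accurate at tight tolerances, not only those that clear a single loose cutoff.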

Theoretical and Practical Implications

On the theoretical side, the work shows that RL techniques can sidestep the non-differentiability of keypoint selection and matching. By casting both steps in a probabilistic framework and optimizing with policy gradients, the authors close a gap in the end-to-end learning of local features.
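A toy numerical check, unrelated to DISK's actual code, can make this concrete: for a softmax policy over a few discrete choices, the score-function estimate of the gradient agrees with the analytically computed gradient of the expected reward, even though sampling itself is not differentiable. The logits, reward values, and sample count below are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def exact_gradient(logits, rewards):
    # For a softmax policy, d/dz_k sum_i p_i r_i = p_k * (r_k - E[r]).
    p = softmax(logits)
    return p * (rewards - p @ rewards)

def reinforce_gradient(logits, rewards, n_samples, rng):
    p = softmax(logits)
    samples = rng.choice(len(p), size=n_samples, p=p)
    # Gradient of log p_a with respect to the logits is onehot(a) - p.
    score = np.eye(len(p))[samples] - p  # shape (n_samples, K)
    return (rewards[samples][:, None] * score).mean(axis=0)

logits = np.array([0.2, -0.5, 1.0])
rewards = np.array([1.0, -0.25, 0.0])  # hypothetical reward per discrete choice
rng = np.random.default_rng(0)
estimate = reinforce_gradient(logits, rewards, 200_000, rng)
exact = exact_gradient(logits, rewards)
print(np.abs(estimate - exact).max())
```

The estimator only needs samples and their rewards, which is why the reward can depend on discrete events such as whether a match is geometrically correct.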

Practically, DISK could benefit applications that rely on precise feature extraction, particularly settings that demand robust and fast computation. Examples include augmented reality systems, photogrammetry, and autonomous navigation.

Future Directions

The authors note that the matching component of DISK could be enhanced with learned neural models, which may further improve match quality and robustness and could lead to better matching precision and computational efficiency.

In conclusion, the research presents a compelling case for the adoption of reinforcement learning strategies in the design of local feature frameworks, paving the way for advancements in computer vision tasks that hinge on accurate keypoint detection and descriptor matching.
