
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector (2404.08928v1)

Published 13 Apr 2024 in cs.CV

Abstract: In this paper, we analyze and improve upon the recently proposed DeDoDe keypoint detector. We focus our analysis on some key issues. First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training. Second, we address issues related to data augmentation. In particular, the DeDoDe detector is sensitive to large rotations. We fix this by including 90-degree rotations as well as horizontal flips. Finally, the decoupled nature of the DeDoDe detector makes evaluation of downstream usefulness problematic. We fix this by matching the keypoints with a pretrained dense matcher (RoMa) and evaluating two-view pose estimates. We find that the original long training is detrimental to performance, and therefore propose a much shorter training schedule. We integrate all these improvements into our proposed detector DeDoDe v2 and evaluate it with the original DeDoDe descriptor on the MegaDepth-1500 and IMC2022 benchmarks. Our proposed detector significantly increases pose estimation results, notably from 75.9 to 78.3 mAA on the IMC2022 challenge. Code and weights are available at https://github.com/Parskatt/DeDoDe

References (34)
  1. Three things everyone should know to improve object retrieval. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2911–2918. IEEE, 2012.
  2. Magsac++, a fast, reliable and accurate robust estimator. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1304–1312, 2020.
  3. Key.Net: Keypoint detection by handcrafted and learned cnn filters. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5836–5844, 2019.
  4. Harrisz+: Harris corner selection for next-gen image matching pipelines. Pattern Recognition Letters, 158:141–147, 2022.
  5. Improving harris corner selection strategy. IET Computer Vision, 5(2):87–96, 2011.
  6. A case for using rotation invariant features in state of the art feature matchers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5110–5119, 2022.
  7. Steerers: A framework for rotation equivariant keypoint descriptors. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2024.
  8. ASpanFormer: Detector-free image matching with adaptive span transformer. In Proc. European Conference on Computer Vision (ECCV), 2022.
  9. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018.
  10. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
  11. DKM: Dense kernelized feature matching for geometry estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2023.
  12. DeDoDe: Detect, Don’t Describe – Describe, Don’t Detect for Local Feature Matching. In 2024 International Conference on 3D Vision (3DV). IEEE, 2024a.
  13. RoMa: Robust dense feature matching. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2024b.
  14. SiLK: Simple Learned Keypoints. In ICCV, 2023.
  15. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988.
  16. Image matching challenge 2022, 2022.
  17. Megadepth: Learning single-view depth prediction from internet photos. In IEEE Conf. Computer Vision and Pattern Recognition (CVPR), pages 2041–2050, 2018.
  18. LightGlue: Local Feature Matching at Light Speed. In IEEE Int’l Conf. Computer Vision (ICCV), 2023.
  19. David G Lowe. Distinctive image features from scale-invariant keypoints. Int’l J. Computer Vision (IJCV), 60:91–110, 2004.
  20. Repeatability is not enough: Learning affine regions via discriminability. In European Conf. Computer Vision (ECCV), page 287–304, 2018.
  21. R2d2: Reliable and repeatable detector and descriptor. Advances in Neural Information Processing Systems (NeurIPS), 32, 2019.
  22. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.
  23. Shi and Tomasi. Good features to track. In 1994 Proceedings of IEEE conference on computer vision and pattern recognition, pages 593–600. IEEE, 1994.
  24. Clustergnn: Cluster-based coarse-to-fine graph neural network for efficient feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12517–12526, 2022.
  25. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8922–8931, 2021.
  26. Quadtree attention for vision transformers. In International Conference on Learning Representations, 2022.
  27. GLU-Net: Global-local universal network for dense flow and correspondences. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6258–6268, 2020.
  28. Warp Consistency for Unsupervised Learning of Dense Correspondences. In IEEE/CVF International Conference on Computer Vision, ICCV, 2021.
  29. PDC-Net+: Enhanced Probabilistic Dense Correspondence Network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  30. Disk: Learning local features with policy gradient. Advances in Neural Information Processing Systems (NeurIPS), 33:14254–14265, 2020.
  31. Tilde: A temporally invariant learned detector. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5279–5288, 2015.
  32. MatchFormer: Interleaving attention in transformers for feature matching. In Asian Conference on Computer Vision, 2022.
  33. Alike: Accurate and lightweight keypoint detection and descriptor extraction. IEEE Transactions on Multimedia, 2022.
  34. Aliked: A lighter keypoint and descriptor extraction network via deformable transformation. IEEE Transactions on Instrumentation & Measurement, 72:1–16, 2023.

Summary

  • The paper introduces DeDoDe v2, which incorporates training-time non-maximum suppression to mitigate keypoint clustering.
  • It leverages optimized data augmentation with 90° rotations and horizontal flips to boost the detector’s rotational robustness.
  • The evaluation using a pretrained dense matcher for two-view pose estimation shows that shorter training times can yield superior performance.

Enhancements and Evaluation of the DeDoDe Keypoint Detector

Introduction to Keypoint Detection in Computer Vision

Keypoint detection is a core task in computer vision: identifying distinctive local features in images that are robust to changes in viewpoint and illumination. Building on the DeDoDe detector, this paper introduces an improved version, DeDoDe v2. Despite its established utility, the original DeDoDe exhibited limitations such as keypoint clustering and sensitivity to large rotations. The paper details the analysis and experiments undertaken to address these issues, culminating in DeDoDe v2. The improvements include non-maximum suppression applied during training, refined data augmentation, and a new evaluation methodology that uses a pretrained dense matcher for two-view pose estimation. Notably, the paper presents a much shorter training schedule that significantly reduces computational cost while actually improving the detector's performance.

Key Contributions

  • Refinements in Training Methodology: The introduction of non-maximum suppression into the training process addresses the clustering issue observed with the original DeDoDe detector, fostering the generation of more evenly distributed and diverse keypoints across images.
  • Optimized Data Augmentation: By integrating 90-degree rotations and horizontal flips into the data augmentation strategy, the sensitivity of the DeDoDe detector to large in-plane rotations is mitigated, enhancing the rotational robustness of the keypoints detected.
  • Evaluation Strategy: Evaluation of the keypoint detector's effectiveness for downstream tasks, particularly two-view pose estimation, is refined through the use of a pretrained dense matcher (RoMa). This approach allows for a more comprehensive assessment than the repeatability metric alone.
  • Reduced Training Time: The paper outlines a significant reduction in training duration, from the protracted schedule of the original to approximately 20 minutes on a single A100 GPU, without compromising the quality of the generated keypoints.
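
The augmentation contribution above amounts to composing the four 90-degree rotations with an optional mirror. As a minimal NumPy sketch (the function name and the practice of returning the sampled parameters, so keypoint targets can be transformed consistently, are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def augment_rotations_flips(image, rng):
    """Apply a random 90-degree rotation and an optional horizontal flip
    to an (H, W, C) image array.

    Returns the augmented image together with the sampled parameters so
    that keypoint targets can be transformed with the same mapping.
    (Illustrative sketch, not the authors' code.)
    """
    k = int(rng.integers(0, 4))           # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))       # whether to mirror horizontally
    out = np.rot90(image, k=k, axes=(0, 1))
    if flip:
        out = out[:, ::-1]
    return out, k, flip
```

Because the same `(k, flip)` pair parameterizes an exact pixel permutation, the ground-truth keypoint coordinates can be remapped without interpolation error.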

Analysis and Improvements

The paper methodically addresses the deficiencies of the original DeDoDe detector through a combination of theoretical insights and empirical testing. For instance, applying non-maximum suppression to the target distribution during training effectively disperses the clustering of keypoints, a problem that previously degraded performance in downstream applications. Additionally, the targeted data augmentations make the detector robust to rotational variations in input imagery.
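
The idea of suppressing non-maximal entries of a score map before using it as a training target can be sketched in a few lines of NumPy. The window radius and the renormalization to a probability distribution are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def nms_target(scores, radius=2):
    """Zero out entries of a 2-D score map that are not the maximum of
    their (2*radius+1)^2 neighborhood, then renormalize so the result
    can serve as a target distribution during training.
    (Illustrative sketch of grid non-max suppression.)
    """
    h, w = scores.shape
    pad = np.pad(scores, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full_like(scores, -np.inf)
    win = 2 * radius + 1
    # Slide the window by taking the elementwise max over all offsets.
    for dy in range(win):
        for dx in range(win):
            np.maximum(local_max, pad[dy:dy + h, dx:dx + w], out=local_max)
    kept = np.where(scores == local_max, scores, 0.0)
    total = kept.sum()
    return kept / total if total > 0 else kept
```

A nearby weaker detection (within the window of a stronger one) is zeroed out, so the training target no longer rewards tight clusters of keypoints.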

A notable finding of this work is that prolonged training is counterproductive for keypoint detectors. By correlating training duration with performance on two-view pose estimation, the authors show that shorter training schedules yield better results. This insight both improves the accuracy of the DeDoDe v2 detector and substantially lowers the computational cost of training it.
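
The mAA figures quoted in the abstract (75.9 to 78.3 on IMC2022) are means of pose accuracy over a set of angular-error thresholds. A minimal sketch of the metric, assuming IMC-style thresholds of 1 to 10 degrees (the exact threshold set used in the paper's benchmarks may differ):

```python
import numpy as np

def mean_average_accuracy(pose_errors_deg, thresholds=range(1, 11)):
    """Mean Average Accuracy (mAA) over angular pose-error thresholds.

    For each threshold t, accuracy is the fraction of image pairs whose
    pose error (e.g. the max of rotation and translation angular error)
    falls below t degrees; mAA is the mean of these accuracies.
    (Illustrative sketch; threshold set is an assumption.)
    """
    errs = np.asarray(pose_errors_deg, dtype=float)
    accs = [(errs < t).mean() for t in thresholds]
    return float(np.mean(accs))
```

Averaging over several thresholds rewards both coarse robustness (few catastrophic failures) and fine precision (many near-exact poses), which is why it is preferred over a single-threshold accuracy.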

Future Directions and Limitations

While DeDoDe v2 marks a considerable advancement in keypoint detection, the authors acknowledge the ongoing challenges in fully reconciling the objectives of high repeatability with the demands of downstream tasks like pose estimation. The exploration of detector objectives that harmonize with these tasks presents a fertile avenue for future research. Moreover, the paper's findings underscore the necessity of both theoretical and empirical approaches to improve the understanding and development of keypoint detectors in computer vision.

Conclusion

In summary, the development of DeDoDe v2 represents a significant step forward in keypoint detection. By addressing the limitations of the original DeDoDe detector, the authors have not only improved its performance but also shed light on underexplored aspects of keypoint detector training and evaluation. The practical and theoretical insights offered by this paper are poised to inform and inspire future research in the field.