DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector (2404.08928v1)
Abstract: In this paper, we analyze and improve the recently proposed DeDoDe keypoint detector. We focus our analysis on some key issues. First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training. Second, we address issues related to data augmentation. In particular, the DeDoDe detector is sensitive to large rotations. We fix this by including 90-degree rotations as well as horizontal flips. Finally, the decoupled nature of the DeDoDe detector makes evaluation of downstream usefulness problematic. We fix this by matching the keypoints with a pretrained dense matcher (RoMa) and evaluating two-view pose estimates. We find that the original long training is detrimental to performance, and therefore propose a much shorter training schedule. We integrate all these improvements into our proposed detector DeDoDe v2 and evaluate it with the original DeDoDe descriptor on the MegaDepth-1500 and IMC2022 benchmarks. Our proposed detector significantly improves pose estimation results, notably from 75.9 to 78.3 mAA on the IMC2022 challenge. Code and weights are available at https://github.com/Parskatt/DeDoDe
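The first two fixes named in the abstract (non-max suppression on the training target, and augmentation with 90-degree rotations and horizontal flips) can be illustrated with a minimal NumPy sketch. This is not the released DeDoDe v2 code: the function names `nms_on_target` and `rot90_flip_augment`, the `radius` parameter, the renormalization step, and the uniform sampling of rotations and flips are all assumptions made for illustration.

```python
import numpy as np


def nms_on_target(target, radius=1):
    """Zero every entry that is not the maximum of its (2r+1)x(2r+1) window.

    `target` is a 2D non-negative score map standing in for the detector's
    target distribution. Renormalizing to sum 1 is an assumption of this
    sketch; tied maxima on a plateau are all kept.
    """
    target = np.asarray(target, dtype=float)
    h, w = target.shape
    # Pad with -inf so border pixels only compete with real neighbours.
    padded = np.pad(target, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full((h, w), -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    suppressed = np.where(target == local_max, target, 0.0)
    total = suppressed.sum()
    return suppressed / total if total > 0 else suppressed


def rot90_flip_augment(image, rng):
    """Apply a random multiple of 90 degrees and an optional horizontal flip.

    Returns the augmented array together with the sampled parameters, so that
    the same transform can be applied to any paired supervision (hypothetical
    here: in two-view training the warp or depth would need the same change).
    """
    k = int(rng.integers(0, 4))        # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))    # whether to mirror left-right
    out = np.rot90(image, k=k, axes=(0, 1))
    if flip:
        out = out[:, ::-1]
    return np.ascontiguousarray(out), k, flip


# Small demonstration on toy data.
scores = np.array([[0.1, 0.2, 0.1],
                   [0.2, 0.9, 0.3],
                   [0.1, 0.3, 0.8]])
sparse_target = nms_on_target(scores, radius=1)

rng = np.random.default_rng(0)
image = np.arange(12).reshape(3, 4)
augmented, k, flip = rot90_flip_augment(image, rng)
```

In the toy score map, the 0.8 entry sits next to the 0.9 peak and is suppressed, which is the de-clustering effect the abstract describes: neighbouring high-scoring pixels no longer all act as positive targets.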