HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching (2403.12543v1)

Published 19 Mar 2024 in cs.CV

Abstract: Deep learning-based image matching methods play a crucial role in computer vision, yet they often suffer from substantial computational demands. To tackle this challenge, we present HCPM, an efficient and detector-free local feature-matching method that employs hierarchical pruning to optimize the matching pipeline. In contrast to recent detector-free methods that depend on an exhaustive set of coarse-level candidates for matching, HCPM selectively concentrates on a concise subset of informative candidates, resulting in fewer computational candidates and enhanced matching efficiency. The method comprises a self-pruning stage for selecting reliable candidates and an interactive-pruning stage that identifies correlated patches at the coarse level. Our results reveal that HCPM significantly surpasses existing methods in terms of speed while maintaining high accuracy. The source code will be made available upon publication.

Summary

  • The paper introduces HCPM, a method that prunes redundant candidates to streamline detector-free feature matching.
  • It employs a two-stage approach with self-pruning based on confidence scores and interactive pruning using cross-attention transformers.
  • Experiments demonstrate that HCPM reduces computational cost by nearly 50% while achieving competitive matching accuracy.

Hierarchical Candidates Pruning for Efficient Detector-Free Matching

Introduction to HCPM

Detector-free methods have driven a significant stride in local feature matching for computer vision, pursuing higher accuracy while contending with substantial computational complexity. The paper introduces HCPM, an efficient detector-free local feature-matching method that uses hierarchical pruning to streamline the matching pipeline. In contrast to recent methods that rely on an exhaustive set of coarse-level candidates, HCPM concentrates on an informative subset of candidates, cutting computational overhead substantially while preserving the quality of matches.

Key Contributions

  • Detector-Free Efficiency: An efficient pipeline for detector-free local feature matching, built from a self-pruning stage and an interactive-pruning stage that together minimize redundancy and noise during matching.
  • Hierarchical Pruning Process: A two-stage pruning method in which self-pruning selects reliable candidates by confidence score and interactive-pruning then refines them using cross-attention within a transformer architecture.
  • Differentiable Selection Strategy: A differentiable selection scheme based on Gumbel-Softmax learned masks that automates pruning and removes the need for manually set thresholds (see the sketch after this list).
  • Reduced Computational Demands: Experimental validation that HCPM delivers competitive local feature matching at roughly half the computational cost of existing state-of-the-art methods.
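
As a concrete illustration of the differentiable selection strategy, the sketch below shows how a Gumbel-Softmax relaxation can turn a per-candidate keep/drop decision into a learned, threshold-free mask. This is a minimal PyTorch sketch under our own assumptions; the function name `gumbel_keep_mask` and the two-logit parameterization are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def gumbel_keep_mask(scores: torch.Tensor, tau: float = 1.0, hard: bool = True) -> torch.Tensor:
    """Differentiable keep/drop mask over candidates (illustrative).

    scores: (N,) unnormalized keep-logits, one per coarse-level candidate.
    Returns a (N,) mask; with hard=True the forward pass is binary while
    gradients flow through the soft relaxation (straight-through estimator).
    """
    # Two logits per candidate: (keep, drop). Using -scores for "drop" is
    # one simple parameterization; a learned drop-logit would also work.
    logits = torch.stack([scores, -scores], dim=-1)           # (N, 2)
    y = F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)  # (N, 2)
    return y[..., 0]  # mass assigned to "keep"

# Usage: zero out pruned candidates without breaking backpropagation.
feats = torch.randn(1024, 256)                  # coarse candidate features
scores = torch.randn(1024, requires_grad=True)  # confidence logits
mask = gumbel_keep_mask(scores, tau=0.5)
pruned = feats * mask.unsqueeze(-1)             # dropped candidates become zero
```

Because the mask is learned end-to-end, no manual confidence threshold is needed; the network decides how aggressively to prune during training.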

Efficient Matching Pipeline

HCPM tackles computational inefficiency with a hierarchical pruning technique divided into self-pruning and interactive-pruning phases. The self-pruning stage filters out uninformative candidates using a simple yet effective activation mechanism that selects the top-k candidates by a per-candidate confidence score, reducing the candidate set to a manageable subset. The interactive-pruning phase then applies multiple self- and cross-attention modules to aggregate and refine features, culminating in an automatic selection step, Differentiable Interactive Candidates Selection (DICS). This design captures and retains the co-visible features essential for high matching accuracy; a schematic implementation is sketched below.
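
To make the two phases concrete, the PyTorch sketch below shows one plausible end-to-end flow: self-pruning keeps the top-k candidates per image by confidence, then interactive pruning runs cross-attention over the survivors before a final selection. All class, method, and parameter names here (`TwoStagePruner`, `prune_topk`, `keep_self`, `keep_cross`) are our own assumptions, and the final top-k step is a simplified stand-in for the paper's DICS module.

```python
import torch
import torch.nn as nn

class TwoStagePruner(nn.Module):
    """Minimal sketch of hierarchical candidate pruning (illustrative names)."""

    def __init__(self, dim: int = 256, keep_self: int = 512, keep_cross: int = 256):
        super().__init__()
        self.conf_head = nn.Linear(dim, 1)  # per-candidate confidence score
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.keep_self, self.keep_cross = keep_self, keep_cross

    def prune_topk(self, feats: torch.Tensor, k: int) -> torch.Tensor:
        # feats: (B, N, D); keep the k highest-confidence candidates.
        conf = self.conf_head(feats).squeeze(-1)               # (B, N)
        idx = conf.topk(k, dim=-1).indices                     # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        return torch.gather(feats, 1, idx)                     # (B, k, D)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor):
        # Stage 1 (self-pruning): each image independently drops
        # low-confidence coarse-level candidates.
        a = self.prune_topk(feats_a, self.keep_self)
        b = self.prune_topk(feats_b, self.keep_self)
        # Stage 2 (interactive pruning): cross-attention lets each image's
        # survivors attend to the other's, emphasizing co-visible regions.
        a2, _ = self.attn(a, b, b)
        b2, _ = self.attn(b, a, a)
        # Simplified stand-in for DICS: a second top-k; the paper instead
        # uses a learned differentiable mask (see the Gumbel sketch above).
        return self.prune_topk(a2, self.keep_cross), self.prune_topk(b2, self.keep_cross)

# Usage on dummy coarse features from two images:
pruner = TwoStagePruner()
fa, fb = torch.randn(1, 4096, 256), torch.randn(1, 4096, 256)
pa, pb = pruner(fa, fb)  # each (1, 256, 256) after both pruning stages
```

Reusing one confidence head for both stages is a simplification for brevity; the point is the shape of the pipeline, where each stage shrinks the candidate set before the more expensive attention and matching steps run.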

Theoretical and Practical Implications

Theoretically, HCPM shows that hierarchical pruning can substantially reduce the computational complexity of detector-free local feature matching, opening the door to further work on pruning techniques that improve efficiency without compromising accuracy.

Practically, HCPM matters most for real-world applications that require real-time performance and low power consumption. By demonstrating that competitive matching accuracy can be achieved at substantially lower computational cost, HCPM sets an efficiency benchmark for local feature matching across applications such as autonomous driving, 3D reconstruction, and visual localization.

Looking Forward

While HCPM marks a significant step toward efficient detector-free matching, there is room to optimize these models further. Future work could integrate pruning mechanisms that adjust dynamically to contextual information within the images, and could extend the HCPM approach to other computer vision domains where computational efficiency is paramount, reinforcing the importance of balancing accuracy against computational overhead.

In essence, HCPM is an effective approach to cutting the computational demands of detector-free local feature matching, and it encourages broader exploration of efficient methods that adapt to the varying complexity of real-world scenarios.
