
ResMatch: Residual Attention Learning for Local Feature Matching (2307.05180v1)

Published 11 Jul 2023 in cs.CV

Abstract: Attention-based graph neural networks have made great progress in feature matching learning. However, the literature lacks insight into how the attention mechanism works for feature matching. In this paper, we rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering. To facilitate the learning of matching and filtering, we inject the similarity of descriptors into the cross-attention score and relative positions into the self-attention score. In this way, the attention can focus on learning residual matching and filtering functions with reference to the basic functions of measuring visual and spatial correlation. Moreover, we mine intra- and inter-neighbors according to the similarity of descriptors and relative positions, so that sparse attention for each point can be performed only within its neighborhoods to achieve higher computational efficiency. Feature matching networks equipped with our full and sparse residual attention learning strategies are termed ResMatch and sResMatch, respectively. Extensive experiments, including feature matching, pose estimation and visual localization, confirm the superiority of our networks.
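The core idea of injecting descriptor similarity into the cross-attention score can be sketched as follows. This is a minimal illustration of the residual-attention formulation described in the abstract, not the paper's actual implementation; all function and variable names here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_cross_attention(q, k, v, desc_a, desc_b):
    """Cross-attention whose score is offset by descriptor similarity.

    q: (n, d) queries from image A; k, v: (m, d) keys/values from image B.
    desc_a: (n, d) and desc_b: (m, d) are L2-normalized local descriptors.
    The descriptor similarity acts as a visual-correlation prior, so the
    learned attention only has to model the residual matching function
    on top of it (hypothetical formulation of the idea in the abstract).
    """
    d = q.shape[-1]
    learned = q @ k.T / np.sqrt(d)   # standard scaled dot-product score
    prior = desc_a @ desc_b.T        # descriptor cosine similarity
    weights = softmax(learned + prior, axis=-1)
    return weights @ v
```

The analogous self-attention variant would add a relative-position similarity term instead of the descriptor prior, serving as a spatial-correlation reference for the filtering function.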
