Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network (2307.01447v1)

Published 4 Jul 2023 in cs.CV

Abstract: Accurately matching local features between a pair of images is a challenging computer vision task. Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images for visual and geometric information reasoning. However, in the context of feature matching, a considerable number of keypoints are non-repeatable due to occlusion and detector failure, and are thus irrelevant for message passing. Connectivity with non-repeatable keypoints not only introduces redundancy, limiting efficiency, but also interferes with representation aggregation, limiting accuracy. Targeting high accuracy and efficiency, we propose MaKeGNN, a sparse attention-based GNN architecture that bypasses non-repeatable keypoints and leverages matchable ones to guide compact and meaningful message passing. More specifically, our Bilateral Context-Aware Sampling Module first dynamically samples two small sets of well-distributed keypoints with high matchability scores from the image pair. Then, our Matchable Keypoint-Assisted Context Aggregation (MKACA) Module treats the sampled informative keypoints as message bottlenecks, constraining each keypoint to retrieve contextual information only from matchable keypoints within and across images, thereby avoiding the interference of irrelevant and redundant connectivity with non-repeatable ones. Furthermore, considering the potential noise in the initial keypoints and the sampled matchable ones, the MKACA module adopts a matchability-guided attentional aggregation operation for purer, data-dependent context propagation. By these means, we achieve state-of-the-art performance on relative camera pose estimation, fundamental matrix estimation, and visual localization, while significantly reducing computational and memory complexity compared to typical attentional GNNs.
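
To make the bottleneck idea concrete, below is a minimal sketch (in PyTorch) of matchability-guided context aggregation through a small set of sampled keypoints. It is not the authors' implementation: the function names, the plain top-k sampler (the paper additionally encourages spatial spread), and the score-modulated softmax are illustrative assumptions.

```python
# Minimal sketch: keypoints exchange messages only through k sampled "matchable"
# bottleneck keypoints, with attention weights modulated by matchability scores.
# All names and design details here are assumptions for illustration.
import torch
import torch.nn.functional as F


def matchability_guided_aggregation(feats, scores, k=128):
    """feats: (N, d) keypoint descriptors; scores: (N,) matchability in [0, 1]."""
    # 1) Sample k keypoints with the highest matchability as message bottlenecks.
    idx = torch.topk(scores, k=min(k, feats.shape[0])).indices
    bottleneck_feats = feats[idx]        # (k, d)
    bottleneck_scores = scores[idx]      # (k,)

    # 2) Every keypoint attends only to the k bottlenecks: O(N*k) instead of O(N^2).
    d = feats.shape[1]
    logits = feats @ bottleneck_feats.t() / d ** 0.5           # (N, k)

    # 3) Matchability-guided attention: down-weight low-confidence bottlenecks
    #    before normalization, so potentially noisy samples contribute less context.
    attn = F.softmax(logits + torch.log(bottleneck_scores + 1e-8), dim=-1)

    # 4) Aggregate context from the bottlenecks and fuse it with the input features.
    context = attn @ bottleneck_feats                           # (N, d)
    return feats + context


if __name__ == "__main__":
    feats = torch.randn(2048, 256)
    scores = torch.rand(2048)
    out = matchability_guided_aggregation(feats, scores, k=128)
    print(out.shape)  # torch.Size([2048, 256])
```

Attending to k sampled bottlenecks instead of all N keypoints reduces the attention cost from O(N^2) to O(Nk), which is the source of the efficiency gain described in the abstract.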
