FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer (2310.13605v1)

Published 20 Oct 2023 in cs.CV

Abstract: Local feature matching, an essential component of several computer vision tasks (e.g., structure from motion and visual localization), has been addressed effectively by Transformer-based methods. However, these methods integrate long-range context among keypoints only at a fixed receptive field, which prevents the network from reconciling the importance of features with different receptive fields to achieve complete image perception, thereby limiting matching accuracy. In addition, these methods rely on a conventional handcrafted scheme to encode the positional information of keypoints into the visual descriptors, which limits the network's ability to extract reliable positional encodings. In this study, we propose Feature Matching with Reconciliatory Transformer (FMRT), a novel Transformer-based detector-free method that adaptively reconciles features with multiple receptive fields and uses parallel networks to obtain reliable positional encodings. Specifically, FMRT introduces a dedicated Reconciliatory Transformer (RecFormer) consisting of a Global Perception Attention Layer (GPAL), which extracts visual descriptors with different receptive fields and integrates global context information at multiple scales; a Perception Weight Layer (PWL), which adaptively measures the importance of the various receptive fields; and a Local Perception Feed-forward Network (LPFFN), which extracts deep aggregated multi-scale local feature representations. Extensive experiments demonstrate that FMRT achieves outstanding performance on multiple benchmarks, including pose estimation, visual localization, homography estimation, and image matching.
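
The reconciliation pattern the abstract describes lends itself to a compact illustration: per-receptive-field descriptor extraction (GPAL's role), adaptive weighting across scales (PWL's role), and a locally-aware feed-forward stage (LPFFN's role). Below is a minimal PyTorch-style sketch of that pattern, not the authors' implementation; the depthwise-convolution branches, the softmax branch weighting, and all module names and dimensions are illustrative assumptions, and full softmax attention stands in for the more efficient linear attention such matchers typically use at this token count.

```python
# Illustrative sketch of reconciling multiple receptive fields.
# All module names, branch counts, and dimensions are assumptions,
# not the FMRT authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconciliatoryBlock(nn.Module):
    def __init__(self, dim=256, num_heads=8, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One depthwise conv per receptive field (stand-in for GPAL's
        # multi-scale descriptor extraction).
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)
            for k in kernel_sizes
        )
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # PWL stand-in: predicts one weight per branch from pooled features.
        self.pwl = nn.Linear(dim, len(kernel_sizes))
        # LPFFN stand-in: pointwise -> depthwise -> pointwise feed-forward,
        # so the FFN itself sees local spatial structure.
        self.lpffn = nn.Sequential(
            nn.Conv2d(dim, dim * 2, 1),
            nn.Conv2d(dim * 2, dim * 2, 3, padding=1, groups=dim * 2),
            nn.GELU(),
            nn.Conv2d(dim * 2, dim, 1),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Global attention over each receptive-field branch separately.
        outs = []
        for conv in self.branches:
            tokens = conv(x).flatten(2).transpose(1, 2)   # (B, H*W, C)
            out, _ = self.attn(tokens, tokens, tokens)
            outs.append(out)
        stacked = torch.stack(outs, dim=1)       # (B, n_branches, H*W, C)
        # Adaptive branch weights; softmax makes the scales compete.
        w = F.softmax(self.pwl(x.mean(dim=(2, 3))), dim=-1)  # (B, n_branches)
        fused = (stacked * w[:, :, None, None]).sum(dim=1)   # (B, H*W, C)
        fused = fused.transpose(1, 2).reshape(B, C, H, W)
        return x + self.lpffn(fused)             # residual connection

feats = torch.randn(1, 256, 24, 32)              # coarse feature map
print(ReconciliatoryBlock()(feats).shape)        # torch.Size([1, 256, 24, 32])
```

The softmax over branch weights forces the receptive fields to compete for influence, which is one plausible reading of "reconciling the importance of features with different receptive fields"; the actual weighting mechanism in FMRT may differ.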

