GMTR: Graph Matching Transformers (2311.08141v2)
Abstract: Vision transformers (ViTs) have recently been applied to visual matching tasks beyond object detection and segmentation. However, the grid-based patch-division strategy of the original ViT neglects the spatial information of keypoints, limiting its sensitivity to local information. We therefore propose QueryTrans (Query Transformer), which adopts a cross-attention module and a keypoint-based center-crop strategy for better spatial-information extraction. We further integrate a graph attention module and devise a transformer-based graph matching approach, GMTR (Graph Matching TRansformers), in which the combinatorial nature of graph matching (GM) is addressed by a graph transformer neural GM solver. On standard GM benchmarks, GMTR shows competitive performance against state-of-the-art (SOTA) frameworks. Specifically, on Pascal VOC, GMTR achieves $\mathbf{83.6\%}$ accuracy, $\mathbf{0.9\%}$ higher than the SOTA framework. On SPair-71k, GMTR outperforms most previous works. Meanwhile, on Pascal VOC, QueryTrans improves the accuracy of NGMv2 from $80.1\%$ to $\mathbf{83.3\%}$, and BBGM from $79.0\%$ to $\mathbf{84.5\%}$. On SPair-71k, QueryTrans improves NGMv2 from $80.6\%$ to $\mathbf{82.5\%}$, and BBGM from $82.1\%$ to $\mathbf{83.9\%}$. Source code will be made publicly available.
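The abstract is terse about what the keypoint-based center crop and the cross-attention module look like in practice. Below is a minimal PyTorch sketch of the two ideas as we read them, not the paper's implementation: `keypoint_center_crop` recenters the input on the keypoints' centroid so the patch grid is aligned with the annotated region, and `cross_attention_readout` lets one query per keypoint attend over the backbone's patch tokens. All function and argument names are our own assumptions.

```python
import torch

def keypoint_center_crop(image, keypoints, crop_size=224):
    """Crop a window centered on the keypoints' centroid.
    image:     (C, H, W) tensor
    keypoints: (K, 2) tensor of (y, x) pixel coordinates
    Hypothetical reading of the keypoint-based center-crop strategy."""
    _, H, W = image.shape
    cy, cx = keypoints.float().mean(dim=0)  # centroid of the keypoints
    half = crop_size // 2
    top = int(min(max(cy.item() - half, 0), max(H - crop_size, 0)))
    left = int(min(max(cx.item() - half, 0), max(W - crop_size, 0)))
    return image[:, top:top + crop_size, left:left + crop_size]

def cross_attention_readout(kp_queries, patch_tokens):
    """Single-head cross-attention: one query per keypoint attends over
    the ViT patch tokens, pulling localized features for each keypoint.
    kp_queries:   (K, D) one query vector per keypoint
    patch_tokens: (N, D) patch embeddings from the ViT backbone."""
    d = kp_queries.shape[-1]
    attn = torch.softmax(kp_queries @ patch_tokens.T / d ** 0.5, dim=-1)  # (K, N)
    return attn @ patch_tokens  # (K, D) per-keypoint features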
- “An image is worth 16x16 words: Transformers for image recognition at scale,” in ICLR, 2021.
- “Swin transformer: Hierarchical vision transformer using shifted windows,” in ICCV, 2021, pp. 10012–10022.
- “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in CVPR, 2021.
- “End-to-end object detection with transformers,” in ECCV, 2020.
- “Deep learning of graph matching,” in CVPR, 2018, pp. 2684–2693.
- “Learning combinatorial embedding networks for deep graph matching,” in ICCV, 2019, pp. 3056–3065.
- “Neural graph matching network: Learning Lawler’s quadratic assignment problem with extension to hypergraph and multiple-graph matching,” IEEE TPAMI, 2022.
- “Learning deep graph matching with channel-independent embedding and Hungarian attention,” in ICLR, 2020.
- “A generalization of transformer networks to graphs,” arXiv preprint arXiv:2012.09699, 2020.
- “SuperGlue: Learning feature matching with graph neural networks,” in CVPR, 2020.
- “LoFTR: Detector-free local feature matching with transformers,” in CVPR, 2021.
- “Poselets: Body part detectors trained using 3d human pose annotations,” in ICCV, 2009, pp. 1365–1372.
- “SPair-71k: A large-scale benchmark for semantic correspondence,” arXiv preprint arXiv:1908.10543, 2019.
- “Deep graph matching via blackbox differentiation of combinatorial solvers,” in ECCV, 2020, pp. 407–424.
- “Appearance and structure aware robust deep visual graph matching: Attack, defense and beyond,” in CVPR, 2022, pp. 15263–15272.
- “Graph matching with bi-level noisy correspondence,” arXiv preprint arXiv:2212.04085, 2022.
- “Masked label prediction: Unified message passing model for semi-supervised classification,” in IJCAI, 2021.
- “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017.
- “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
- “How attentive are graph attention networks?,” arXiv preprint arXiv:2105.14491, 2021.
- “A relationship between arbitrary positive matrices and doubly stochastic matrices,” Ann. Math. Statist., 1964.
- “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
- “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
- “Incorporating convolution designs into visual transformers,” in ICCV, 2021, pp. 579–588.
- “XCiT: Cross-covariance image transformers,” in NeurIPS, 2021.
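Neural GM solvers such as NGMv2 (and, per the abstract, GMTR's graph transformer solver) typically handle the combinatorial assignment constraint by relaxing the matching matrix to a doubly-stochastic one via Sinkhorn normalization, which is why Sinkhorn's 1964 result is cited above. The following is a generic sketch of that step under those assumptions, not code from the paper:

```python
import torch

def sinkhorn(scores, n_iters=10, tau=0.05):
    """Alternating row/column normalization of exp(scores / tau); the
    result converges toward a doubly-stochastic matrix (Sinkhorn, 1964),
    a differentiable relaxation of a hard keypoint-to-keypoint assignment.
    scores: (n, n) raw matching affinities between two keypoint sets."""
    S = torch.exp(scores / tau)
    for _ in range(n_iters):
        S = S / S.sum(dim=1, keepdim=True)  # normalize rows
        S = S / S.sum(dim=0, keepdim=True)  # normalize columns
    return S
```

At inference time the relaxed matrix is typically discretized, e.g. with the Hungarian algorithm, to recover a hard matching.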