A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images (2404.19311v1)
Abstract: Matching visible and near-infrared (NIR) images remains a significant challenge in remote sensing image fusion. Nonlinear radiometric differences between heterogeneous remote sensing images make the matching task even more difficult. Deep learning has gained substantial attention in computer vision in recent years, but many methods rely on supervised learning and require large amounts of annotated data, which is frequently scarce in the field of remote sensing image matching. To address this challenge, this paper proposes a novel keypoint descriptor approach that obtains robust feature descriptors via a self-supervised matching network. A lightweight transformer network, termed LTFormer, is designed to generate deep-level feature descriptors. Furthermore, we introduce an innovative triplet loss function, LT Loss, to further enhance matching performance. Our approach outperforms conventional hand-crafted local feature descriptors and remains competitive with state-of-the-art deep learning-based methods, even amid the shortage of annotated data.
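LT Loss builds on the triplet formulation for descriptor learning. As a point of reference only, the sketch below shows the standard max-margin triplet loss in plain NumPy; the exact LT Loss formulation differs and is defined in the paper itself:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard max-margin triplet loss on L2 distances.

    Pulls the anchor-positive descriptor pair together while pushing
    the anchor-negative pair apart by at least `margin`. This is the
    generic formulation (cf. FaceNet), not the paper's LT Loss.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to matching patch
    d_neg = np.linalg.norm(anchor - negative)  # distance to non-matching patch
    return max(d_pos - d_neg + margin, 0.0)

# Toy 4-D descriptors: positive close to the anchor, negative far away,
# so the margin is satisfied and the loss vanishes.
a = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0, 0.0])
n = np.array([0.0, 0.0, 3.0, 0.0])
print(triplet_loss(a, p, n))  # → 0.0 (margin already met)
```

During training the loss is averaged over mined triplets, so gradients only flow through triplets that violate the margin.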