Unsupervised Visible-Infrared ReID via Pseudo-label Correction and Modality-level Alignment (2404.06683v1)
Abstract: Unsupervised visible-infrared person re-identification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intra-modality clustering and cross-modality feature matching to achieve UVI-ReID. However, two challenges remain: 1) noisy pseudo labels might be generated in the clustering process, and 2) cross-modality feature alignment that matches the marginal distributions of the visible and infrared modalities may misalign different identities across the two modalities. In this paper, we first conduct a theoretical analysis in which an interpretable generalization upper bound is introduced. Based on the analysis, we then propose a novel unsupervised cross-modality person re-identification framework (PRAISE). Specifically, to address the first challenge, we propose a pseudo-label correction strategy that utilizes a Beta Mixture Model to predict the probability of mis-clustering based on the network's memory effect and rectifies the correspondence by adding a perceptual term to contrastive learning. Next, to address the second challenge, we introduce a modality-level alignment strategy that generates paired visible-infrared latent features and reduces the modality gap by aligning the labeling functions of visible and infrared features, so as to learn identity-discriminative and modality-invariant features. Experimental results on two benchmark datasets demonstrate that our method achieves state-of-the-art performance compared with the unsupervised visible-ReID methods.
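To make the pseudo-label correction idea more concrete, the following is a minimal sketch of how a two-component Beta Mixture Model can turn a scalar per-sample signal into a mis-clustering probability. It is not the authors' implementation. It assumes the signal is a per-sample training loss (the abstract does not say what the model is fitted to), and it fits the mixture with EM using a weighted method-of-moments update, a common recipe in noisy-label learning. The function names `beta_pdf`, `responsibilities`, and `misclustering_probability` are hypothetical.

```python
import math
import numpy as np


def beta_pdf(x, a, b):
    """Beta density at x in (0, 1) for scalar shape parameters a and b."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return np.exp((a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x) - log_norm)


def responsibilities(x, weights, alphas, betas):
    """E-step: posterior probability of each mixture component per sample."""
    lik = np.stack([w * beta_pdf(x, a, b)
                    for w, a, b in zip(weights, alphas, betas)])
    return lik / (lik.sum(axis=0, keepdims=True) + 1e-12)


def misclustering_probability(losses, n_iter=30, eps=1e-4):
    """Fit a 2-component Beta mixture to per-sample losses (an assumption of
    this sketch) and return the posterior of the high-loss component."""
    losses = np.asarray(losses, dtype=float)
    # Min-max normalise into (0, 1), the support of the Beta distribution.
    x = (losses - losses.min()) / (losses.max() - losses.min() + 1e-12)
    x = np.clip(x, eps, 1.0 - eps)

    # Component 0 starts skewed toward low loss, component 1 toward high loss.
    weights = np.array([0.5, 0.5])
    alphas = np.array([1.0, 2.0])
    betas = np.array([2.0, 1.0])

    for _ in range(n_iter):
        resp = responsibilities(x, weights, alphas, betas)
        # M-step: weighted method of moments for each component.
        for k in range(2):
            r = resp[k] / (resp[k].sum() + 1e-12)
            mean = float(np.sum(r * x))
            var = float(np.sum(r * (x - mean) ** 2))
            common = mean * (1.0 - mean) / (var + 1e-12) - 1.0
            alphas[k] = max(mean * common, 1e-2)
            betas[k] = max((1.0 - mean) * common, 1e-2)
        weights = resp.sum(axis=1) / resp.shape[1]

    resp = responsibilities(x, weights, alphas, betas)
    # The component with the larger mean is treated as the "noisy" one.
    noisy = int(np.argmax(alphas / (alphas + betas)))
    return resp[noisy]
```

Under these assumptions, the returned vector could be used to down-weight or relabel samples whose pseudo labels are likely wrong. How PRAISE actually combines this probability with the perceptual term in its contrastive loss is described in the paper body, not in this sketch.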