Generalizable Metric Network for Cross-domain Person Re-identification (2306.11991v2)
Abstract: Person Re-identification (Re-ID) is a crucial technique for public security and has made significant progress in supervised settings. However, the cross-domain (i.e., domain generalization) setting remains challenging for Re-ID because the test domains are unseen and there is a domain shift between the training and test sets. To tackle this challenge, most existing methods aim to learn domain-invariant or robust features for all domains. In this paper, we observe that the data-distribution gap between the training and test sets is smaller in the sample-pair space than in the sample-instance space. Based on this observation, we propose a Generalizable Metric Network (GMN) to further explore sample similarity in the sample-pair space. Specifically, we add a Metric Network (M-Net) after the main network and train it on positive and negative sample-pair features; the trained M-Net is then used at the test stage. Additionally, we introduce the Dropout-based Perturbation (DP) module to enhance the generalization capability of the metric network by enriching the sample-pair diversity. Moreover, we develop a Pair-Identity Center (PIC) loss to enhance the model's discrimination by ensuring that sample-pair features with the same pair-identity are consistent. We validate the effectiveness of our proposed method through extensive experiments on multiple benchmark datasets and confirm the value of each module in our GMN.
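The move from the sample-instance space to the sample-pair space can be illustrated with a minimal sketch. It is not the paper's implementation: the pair representation (element-wise squared difference), the helper names `pair_feature`, `build_pairs`, and `dropout_perturb`, and the point at which the perturbation is applied are all assumptions made for illustration. The abstract only states that M-Net is trained on positive and negative sample-pair features and that dropout-based perturbation enriches pair diversity.

```python
import random


def pair_feature(f_a, f_b):
    """Combine two instance features into one pair feature.

    Assumption: element-wise squared difference; the abstract does not
    specify how pair features are formed.
    """
    return [(a - b) ** 2 for a, b in zip(f_a, f_b)]


def build_pairs(features, identities):
    """Return (pair_feature, label) for every unordered pair of samples.

    label is 1 for a positive pair (same identity) and 0 for a negative pair.
    """
    pairs = []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            label = 1 if identities[i] == identities[j] else 0
            pairs.append((pair_feature(features[i], features[j]), label))
    return pairs


def dropout_perturb(vec, p=0.5, rng=None):
    """Standard inverted dropout, used here as a stand-in perturbation.

    Each element is kept with probability 1 - p and rescaled by 1 / (1 - p);
    otherwise it is set to zero. Requires 0 <= p < 1.
    """
    rng = rng or random
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


# Toy instance features from a hypothetical backbone, with identity labels.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identities = [7, 7, 3]

pairs = build_pairs(features, identities)
# Perturbing instance features before pairing yields more varied pair features.
perturbed_pairs = build_pairs(
    [dropout_perturb(f, p=0.5, rng=random.Random(0)) for f in features],
    identities,
)
```

In this sketch a binary classifier trained on `pairs` would play the role of the metric network, and at test time its score for a query-gallery pair would replace a fixed distance such as Euclidean or cosine.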