Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification (2402.10435v1)
Abstract: Person re-identification (re-ID) remains a significant challenge, particularly under occlusion. Prior approaches to occlusion have predominantly focused on aligning visible body parts using external semantic cues, but such pipelines tend to be intricate and susceptible to noise. To address these challenges, we present an end-to-end solution, the Dynamic Patch-aware Enrichment Transformer (DPEFormer), which automatically and dynamically separates human-body information from occlusions without external detectors or precise image alignment. Specifically, we introduce a Dynamic Patch token Selection Module (DPSM) that uses a label-guided proxy token as an intermediary to identify informative, occlusion-free tokens, which are then selected to derive the subsequent local part features. To seamlessly integrate the global classification feature with the fine-grained local features selected by DPSM, we introduce a novel Feature Blending Module (FBM) that enriches the representation by exploiting complementary information and part diversity. Furthermore, to ensure that DPSM and the entire DPEFormer can learn effectively from identity labels alone, we propose a Realistic Occlusion Augmentation (ROA) strategy. ROA leverages recent advances in the Segment Anything Model (SAM) to generate occluded images that closely resemble real-world occlusions, greatly strengthening the subsequent contrastive learning. Experiments on occluded and holistic re-ID benchmarks show that DPEFormer substantially outperforms existing state-of-the-art approaches. The code will be made publicly available.
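The core DPSM idea described above, selecting occlusion-free patch tokens by their similarity to a label-guided proxy token, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class `ProxyTokenSelector`, the learnable per-identity proxy embedding, the cosine-similarity scoring, and the fixed `top_k` budget are all illustrative assumptions; the paper's exact proxy construction and selection rule may differ.

```python
# Minimal sketch of proxy-guided patch-token selection (illustrative only).
# Assumptions: ViT patch tokens of shape (B, N, D), one learnable proxy
# embedding per identity, cosine similarity as the token score, and a
# fixed top-k selection budget.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyTokenSelector(nn.Module):
    """Select the patch tokens most similar to a label-guided proxy token."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 751, top_k: int = 32):
        super().__init__()
        # One learnable proxy embedding per identity; during training the
        # ground-truth label picks the proxy, which acts as a clean anchor
        # for scoring which tokens carry body (not occluder) information.
        self.proxies = nn.Embedding(num_classes, embed_dim)
        self.top_k = top_k

    def forward(self, patch_tokens: torch.Tensor, labels: torch.Tensor):
        # patch_tokens: (B, N, D) ViT patch embeddings (class token excluded)
        # labels:       (B,)      identity labels
        proxy = self.proxies(labels)                          # (B, D)
        sim = F.cosine_similarity(                            # (B, N)
            patch_tokens, proxy.unsqueeze(1), dim=-1)
        idx = sim.topk(self.top_k, dim=1).indices             # (B, K)
        # Gather the K most proxy-like (presumably occlusion-free) tokens,
        # to be pooled into local part features downstream.
        selected = torch.gather(
            patch_tokens, 1,
            idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1)))
        return selected, idx
```

At inference no identity label is available, so a practical variant would need to score tokens against a predicted or learned proxy instead; the sketch covers only the training-time selection the abstract describes.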
Authors: Xin Zhang, Keren Fu, Qijun Zhao