
A Versatile Framework for Multi-scene Person Re-identification (2403.11121v1)

Published 17 Mar 2024 in cs.CV

Abstract: Person Re-identification (ReID) has been extensively developed over the past decade to learn the association between images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous variants of ReID models have been developed to address specific challenges such as resolution change, clothing change, occlusion, and modality change. Despite the impressive performance of many of these variants, each typically functions in isolation and cannot be applied to other challenges. To the best of our knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work presents the first attempt at learning such a versatile ReID model. Our main idea is a two-stage prompt-based twin modeling framework called VersReID. VersReID first leverages the scene label to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the need for scene labels during inference. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model that handles ReID tasks under multi-scene conditions, including general, low-resolution, clothing-change, occlusion, and cross-modality scenes, without manual assignment of scene labels at inference. Codes and models are available at https://github.com/iSEE-Laboratory/VersReID.
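The two-stage flow described in the abstract can be sketched in a few lines of toy Python. This is only an illustrative mock-up of the control flow, not the paper's implementation: the scene names follow the scenes listed in the abstract, but the class names, prompt strings, and token lists are hypothetical stand-ins for the real prompt embeddings and Transformer features.

```python
# Illustrative sketch of VersReID's two-stage idea (assumed structure, not
# the authors' code): stage 1 selects scene-specific prompts by scene label;
# stage 2 uses a single distilled set of versatile prompts, label-free.

SCENES = ["general", "low_resolution", "clothing_change",
          "occlusion", "cross_modality"]

class ReIDBank:
    """Stage 1: one prompt group per scene, chosen via the scene label."""
    def __init__(self):
        # Placeholder strings stand in for learned prompt embeddings.
        self.scene_prompts = {s: f"prompt[{s}]" for s in SCENES}

    def forward(self, image_tokens, scene_label):
        # The scene label routes the input to its scene-specific prompts.
        return image_tokens + [self.scene_prompts[scene_label]]

class VBranch:
    """Stage 2: versatile prompts distilled from the ReID Bank, so no
    scene label is required at inference time."""
    def __init__(self):
        self.versatile_prompts = ["prompt[versatile]"]

    def forward(self, image_tokens):
        # Same prompts serve every scene; no label argument needed.
        return image_tokens + self.versatile_prompts

# Teacher pass (training-time, scene label available):
bank = ReIDBank()
teacher_out = bank.forward(["img_tok"], "occlusion")

# Student pass (inference-time, label-free):
student = VBranch()
student_out = student.forward(["img_tok"])
```

In the actual framework, the student's features would be trained to match the teacher's via knowledge distillation; here the two forward passes merely show where the scene label is consumed (stage 1) and where it is no longer needed (stage 2).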

