Image Re-Identification: Where Self-supervision Meets Vision-Language Learning (2407.20647v1)

Published 30 Jul 2024 in cs.CV

Abstract: Recently, large-scale vision-language pre-trained models like CLIP have shown impressive performance in image re-identification (ReID). In this work, we explore whether self-supervision can aid the use of CLIP for image ReID tasks. Specifically, we propose SVLL-ReID, the first attempt to integrate self-supervision and pre-trained CLIP via two training stages to facilitate image ReID. We observe that: 1) incorporating language self-supervision in the first training stage makes the learnable text prompts more distinguishable, and 2) incorporating vision self-supervision in the second training stage makes the image features learned by the image encoder more discriminative. These observations imply that: 1) the text prompt learning in the first stage can benefit from language self-supervision, and 2) the image feature learning in the second stage can benefit from vision self-supervision. Together, these benefits yield the performance gain of the proposed SVLL-ReID. Through experiments on six image ReID benchmark datasets without any concrete text labels, we find that SVLL-ReID achieves the best overall performance compared with state-of-the-art methods. Code will be publicly available at https://github.com/BinWangGzhu/SVLL-ReID.
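The abstract outlines a two-stage recipe: stage one learns text prompts under a language self-supervision signal while the CLIP encoders stay frozen, and stage two fine-tunes the image encoder under a vision self-supervision signal while the learned prompts stay frozen. The sketch below illustrates that training skeleton in PyTorch. The prompt structure (identity-specific context vectors in the style of CLIP-ReID), the concrete self-supervision losses (feature consistency between perturbed prompts and between augmented views), and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal two-stage training skeleton in the spirit of SVLL-ReID's abstract.
# The loss choices and prompt design here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptLearner(nn.Module):
    """Learnable text prompt tokens, one context set per identity
    (assumption: CLIP-ReID-style identity-specific context vectors)."""
    def __init__(self, num_ids: int, num_ctx: int = 4, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(num_ids, num_ctx, dim) * 0.02)

    def forward(self, pids: torch.Tensor) -> torch.Tensor:
        return self.ctx[pids]  # (B, num_ctx, dim)

def contrastive_i2t(img_feat, txt_feat, temperature=0.07):
    """Symmetric CLIP-style image-text contrastive loss.
    Assumes each batch row pairs one image with its own prompt
    (i.e., distinct identities within a batch)."""
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    logits = img_feat @ txt_feat.t() / temperature
    labels = torch.arange(len(img_feat), device=img_feat.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Stage 1: learn the prompts; both CLIP encoders stay frozen.
# "Language self-supervision" is sketched as consistency between the
# prompt features and features of a lightly perturbed prompt (assumption).
def stage1_step(image_encoder, text_encoder, prompts, images, pids, opt):
    with torch.no_grad():
        img_feat = image_encoder(images)
    ctx = prompts(pids)
    txt_feat = text_encoder(ctx)
    txt_feat_aug = text_encoder(ctx + 0.01 * torch.randn_like(ctx))
    loss = contrastive_i2t(img_feat, txt_feat) \
         + (1 - F.cosine_similarity(txt_feat, txt_feat_aug).mean())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Stage 2: fine-tune the image encoder; prompts and text encoder frozen.
# "Vision self-supervision" is sketched as consistency between two
# augmented views of the same image (again an assumption).
def stage2_step(image_encoder, text_encoder, prompts, view1, view2, pids, opt):
    with torch.no_grad():
        txt_feat = text_encoder(prompts(pids))
    f1, f2 = image_encoder(view1), image_encoder(view2)
    loss = contrastive_i2t(f1, txt_feat) \
         + (1 - F.cosine_similarity(f1, f2).mean())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The structural point the abstract does state is the division of labor: language self-supervision shapes the prompts in stage one, and vision self-supervision shapes the image features in stage two, with each stage freezing what the other stage trains.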

Authors (5)
  1. Bin Wang
  2. Yuying Liang
  3. Lei Cai
  4. Huakun Huang
  5. Huanqiang Zeng