Image Re-Identification: Where Self-supervision Meets Vision-Language Learning (2407.20647v1)
Abstract: Recently, large-scale vision-language pre-trained models like CLIP have shown impressive performance in image re-identification (ReID). In this work, we explore whether self-supervision can aid the use of CLIP for image ReID tasks. Specifically, we propose SVLL-ReID, the first attempt to integrate self-supervision and pre-trained CLIP via two training stages to facilitate image ReID. We observe that: 1) incorporating language self-supervision in the first training stage makes the learnable text prompts more distinguishable, and 2) incorporating vision self-supervision in the second training stage makes the image features learned by the image encoder more discriminative. These observations imply that: 1) text prompt learning in the first stage benefits from language self-supervision, and 2) image feature learning in the second stage benefits from vision self-supervision. Together, these benefits drive the performance gain of the proposed SVLL-ReID. Experiments on six image ReID benchmark datasets without any concrete text labels show that SVLL-ReID achieves the overall best performance compared with state-of-the-art methods. Code will be publicly available at https://github.com/BinWangGzhu/SVLL-ReID.
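As a rough illustration of the two-stage scheme described in the abstract, the sketch below shows how a CLIP-style contrastive objective could be combined with a self-supervision term in each stage, with the non-trained side frozen. This is not the authors' code: the function names, the toy cosine-softmax loss, and the placeholder `lang_ssl`/`vis_ssl` terms are all assumptions for exposition.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(a * a for a in v)) or 1.0
    return dot / (nu * nv)

def contrastive_loss(text_feats, image_feats):
    # Toy image-to-text contrastive loss: each image should match
    # the text prompt with the same index (same identity).
    loss = 0.0
    for i, img in enumerate(image_feats):
        logits = [math.exp(cosine(img, t)) for t in text_feats]
        loss += -math.log(logits[i] / sum(logits))
    return loss / len(image_feats)

def stage1_loss(prompt_feats, frozen_image_feats, lang_ssl):
    # Stage 1: the image encoder is frozen; only the learnable text
    # prompts are optimized, with a language self-supervision term added.
    return contrastive_loss(prompt_feats, frozen_image_feats) + lang_ssl(prompt_feats)

def stage2_loss(frozen_prompt_feats, image_feats, vis_ssl):
    # Stage 2: the learned prompts are frozen; only the image encoder
    # is optimized, with a vision self-supervision term added.
    return contrastive_loss(frozen_prompt_feats, image_feats) + vis_ssl(image_feats)
```

In practice each stage would run gradient descent on its loss with the other component's parameters detached; here the `lang_ssl` and `vis_ssl` callables stand in for whatever self-supervised objectives (e.g. masked-prediction or augmentation-consistency losses) the method applies in each stage.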
Authors: Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, Huanqiang Zeng