
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions (2306.07520v4)

Published 13 Jun 2023 in cs.CV

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits real-world applications. This paper addresses the problem by proposing a new instruct-ReID task that requires the model to retrieve images according to given image or language instructions. Instruct-ReID is a more general ReID setting, in which six existing ReID tasks can be viewed as special cases obtained by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the proposed multi-purpose ReID model, trained on our OmniReID benchmark without fine-tuning, improves mAP by 0.5%, 0.6%, and 7.7% on Market1501, MSMT17, and CUHK03 for traditional ReID; by 6.4%, 7.1%, and 11.2% on PRCC, VC-Clothes, and LTCC for clothes-changing ReID; by 11.7% on COCAS+ real2 for clothes-template-based clothes-changing ReID when using only RGB images; and by 24.9% on COCAS+ real2 for our newly defined language-instructed ReID. It also improves by 4.3% on LLCM for visible-infrared ReID and by 2.6% on CUHK-PEDES for text-to-image ReID. The datasets, the model, and code will be available at https://github.com/hwz-zju/Instruct-ReID.

Authors (12)
  1. Weizhen He
  2. Yiheng Deng
  3. Shixiang Tang
  4. Qihao Chen
  5. Qingsong Xie
  6. Yizhou Wang
  7. Lei Bai
  8. Feng Zhu
  9. Rui Zhao
  10. Wanli Ouyang
  11. Donglian Qi
  12. Yunfeng Yan
