Divide and Conquer: Hybrid Pre-training for Person Search (2312.07970v1)
Abstract: Large-scale pre-training has proven to be an effective method for improving performance across different tasks. Current person search methods use ImageNet pre-trained models for feature extraction, yet this is not an optimal solution because of the gap between the pre-training task and person search as a downstream task. Therefore, in this paper, we focus on pre-training for person search, which involves detecting and re-identifying individuals simultaneously. Although labeled data for person search is scarce, datasets for its two sub-tasks, person detection and re-identification, are relatively abundant. To this end, we propose a hybrid pre-training framework specifically designed for person search that uses sub-task data only. It consists of a hybrid learning paradigm that handles data with different kinds of supervision, and an intra-task alignment module that alleviates domain discrepancy under limited resources. To the best of our knowledge, this is the first work that investigates how to support full-task pre-training using sub-task data. Extensive experiments demonstrate that our pre-trained model achieves significant improvements across diverse protocols, covering different person search methods, fine-tuning data, pre-training data and model backbones. For example, our model improves ResNet50-based NAE by 10.3% (relative) in mAP. Our code and pre-trained models are released for plug-and-play usage by the person search community.
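The abstract does not say how the hybrid learning paradigm combines data with different kinds of supervision, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each pre-training image comes from either a detection dataset (boxes only) or a re-ID dataset (identity labels only), and that each image should supervise only the head it has labels for. The `Sample` record, `hybrid_loss`, and the weights `w_det` and `w_reid` are hypothetical names for illustration. A small helper also shows what a "relative" gain means, as opposed to an absolute gain in mAP points.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical per-image record (an assumption, not the paper's data format).
    source: str                        # "det" = boxes only, "reid" = identity labels only
    det_loss: Optional[float] = None   # loss from box supervision, if this image has it
    reid_loss: Optional[float] = None  # loss from identity supervision, if this image has it


def hybrid_loss(batch: List[Sample], w_det: float = 1.0, w_reid: float = 1.0) -> float:
    """Average each task's loss over only the samples that carry that task's labels,
    then take a weighted sum; a task with no samples in the batch contributes 0."""
    det = [s.det_loss for s in batch if s.source == "det" and s.det_loss is not None]
    reid = [s.reid_loss for s in batch if s.source == "reid" and s.reid_loss is not None]
    total = 0.0
    if det:
        total += w_det * sum(det) / len(det)
    if reid:
        total += w_reid * sum(reid) / len(reid)
    return total


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain in percent: (improved - baseline) / baseline * 100."""
    return (improved - baseline) / baseline * 100.0
```

For example, with hypothetical losses, a batch of two detection images (losses 2.0 and 4.0) and one re-ID image (loss 1.0) gives `hybrid_loss(batch) == 4.0`: the detection term averages to 3.0 and the re-ID term is 1.0. With a hypothetical baseline of 50.0 mAP, a 10.3% relative improvement corresponds to about 55.15 mAP, an absolute gain of only 5.15 points. These values are illustrative and are not the paper's reported numbers.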
- DETReg: Unsupervised Pretraining with Region Priors for Object Detection. In Computer Vision and Pattern Recognition, 14605–14615.
- EuroCity Persons: A Novel Benchmark for Person Detection in Traffic Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8): 1844–1861.
- PSTR: End-to-End One-Step Person Search With Transformers. In Computer Vision and Pattern Recognition, 9458–9467.
- End-to-end object detection with transformers. In European Conference on Computer Vision, 213–229.
- Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Computer Vision and Pattern Recognition, 6299–6308.
- RCAA: Relational context-aware agents for person search. In European Conference on Computer Vision, 84–100.
- Hierarchical online instance matching for person search. In AAAI Conference on Artificial Intelligence, volume 34, 10518–10525.
- Person search by separated modeling and a mask-guided two-stream CNN model. IEEE Transactions on Image Processing, 29: 4669–4682.
- Norm-aware embedding for efficient person search. In Computer Vision and Pattern Recognition, 12615–12624.
- Norm-Aware Embedding for Efficient Person Search and Tracking. International Journal of Computer Vision, 129(11): 3154–3168.
- A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, 1597–1607.
- Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297.
- Exploring simple siamese representation learning. In Computer Vision and Pattern Recognition, 15750–15758.
- Domain adaptive Faster R-CNN for object detection in the wild. In Computer Vision and Pattern Recognition, 3339–3348.
- ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 248–255.
- PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking. In Computer Vision and Pattern Recognition, 20963–20972.
- Bi-directional interaction network for person search. In Computer Vision and Pattern Recognition, 2839–2848.
- Instance guided proposal network for person search. In Computer Vision and Pattern Recognition, 2585–2594.
- Unsupervised Pre-training for Person Re-identification. In Computer Vision and Pattern Recognition, 14750–14759.
- Large-Scale Pre-training for Person Re-identification with Noisy Labels. In Computer Vision and Pattern Recognition, 2476–2486.
- Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning, 1180–1189.
- End-to-end trainable trident person search network using adaptive gradient propagation. In International Conference on Computer Vision, 925–933.
- Weakly supervised person search with region siamese networks. In International Conference on Computer Vision, 12006–12015.
- Re-id driven localization refinement for person search. In International Conference on Computer Vision, 9814–9823.
- Rethinking ImageNet pre-training. In International Conference on Computer Vision, 4918–4927.
- Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, 770–778.
- Masked Autoencoders Are Scalable Vision Learners. In Computer Vision and Pattern Recognition, 16000–16009.
- Prototype-guided saliency feature learning for person search. In Computer Vision and Pattern Recognition, 4865–4874.
- Person search by multi-scale matching. In European Conference on Computer Vision, 536–552.
- Domain adaptive person search. In European Conference on Computer Vision, 302–318.
- DeepReID: Deep filter pairing neural network for person re-identification. In Computer Vision and Pattern Recognition, 152–159.
- Sequential end-to-end network for efficient person search. In AAAI Conference on Artificial Intelligence, volume 35, 2011–2019.
- Neural person search machines. In International Conference on Computer Vision, 493–501.
- Query-guided end-to-end person search. In Computer Vision and Pattern Recognition, 811–820.
- Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, volume 28.
- CrowdHuman: A Benchmark for Detecting Human in a Crowd. arXiv preprint arXiv:1805.00123.
- Id-Free Person Similarity Learning. In Computer Vision and Pattern Recognition, 14689–14699.
- Grouped Adaptive Loss Weighting for Person Search. In ACM International Conference on Multimedia, 6774–6782.
- TCTS: A task-consistent two-stage framework for person search. In Computer Vision and Pattern Recognition, 11952–11961.
- Masked Feature Prediction for Self-Supervised Visual Pre-Training. In Computer Vision and Pattern Recognition, 14668–14678.
- Person transfer GAN to bridge domain gap for person re-identification. In Computer Vision and Pattern Recognition, 79–88.
- Joint detection and identification feature learning for person search. In Computer Vision and Pattern Recognition, 3415–3424.
- Exploring visual context for weakly supervised person search. In AAAI Conference on Artificial Intelligence, volume 36, 3027–3035.
- Anchor-free person search. In Computer Vision and Pattern Recognition, 7690–7699.
- Efficient Person Search: An Anchor-Free Approach. International Journal of Computer Vision, 131(7): 1642–1661.
- Learning context graph for person search. In Computer Vision and Pattern Recognition, 2158–2167.
- Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification. In Computer Vision and Pattern Recognition, 14298–14307.
- Joint person objectness and repulsion for person search. IEEE Transactions on Image Processing, 30: 685–696.
- Cascade Transformers for End-to-End Person Search. In Computer Vision and Pattern Recognition, 7267–7276.
- ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Computer Vision and Pattern Recognition, 6848–6856.
- Person re-identification in the wild. In Computer Vision and Pattern Recognition, 1367–1376.
- Yanling Tian
- Di Chen
- Yunan Liu
- Jian Yang
- Shanshan Zhang