Efficient Transferability Assessment for Selection of Pre-trained Detectors (2403.09432v1)
Abstract: Large-scale pre-training followed by downstream fine-tuning is an effective solution for transferring deep-learning-based models. Since fine-tuning all possible pre-trained models is computationally costly, we aim to predict the transferability performance of these pre-trained models in a computationally efficient manner. Different from previous work that seeks out suitable models for downstream classification and segmentation tasks, this paper studies the efficient transferability assessment of pre-trained object detectors. To this end, we build a detector transferability benchmark which contains a large and diverse zoo of pre-trained detectors with various architectures, source datasets and training schemes. Given this zoo, we adopt 7 target datasets from 5 diverse domains as the downstream target tasks for evaluation. Further, we propose to assess the classification and regression sub-tasks simultaneously in a unified framework. Additionally, we design a complementary metric for evaluating tasks with varying objects. Experimental results demonstrate that our method outperforms other state-of-the-art approaches in assessing transferability under different target domains, while reducing wall-clock time by 32$\times$ and requiring only 5.2\% of the memory footprint compared to brute-force fine-tuning of all pre-trained detectors.
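To make the notion of "assessing transferability" concrete: a transferability score is useful when the ranking it induces over pre-trained detectors agrees with the ranking obtained by actually fine-tuning them. The sketch below illustrates one common way to quantify that agreement, assuming Kendall's rank correlation as the measure (the abstract does not name the measure, so this choice is an assumption). The function name `kendall_tau` and all numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(scores, performances):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(scores) != len(performances):
        raise ValueError("sequences must have the same length")
    if len(scores) < 2:
        raise ValueError("need at least two models to rank")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        # Positive product: both sequences order models i and j the same way.
        product = (scores[i] - scores[j]) * (performances[i] - performances[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for four pre-trained detectors (illustration only).
metric_scores = [0.62, 0.35, 0.80, 0.51]   # cheap transferability scores
finetuned_map = [41.0, 33.5, 44.2, 42.3]   # mAP after full fine-tuning

tau = kendall_tau(metric_scores, finetuned_map)
print(round(tau, 3))  # 0.667
```

A value near 1 means the cheap score ranks the detectors almost exactly as fine-tuning would; a value near 0 means the score carries little ranking information.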
- Zhao Wang
- Aoxue Li
- Zhenguo Li
- Qi Dou