Understanding the Transferability of Representations via Task-Relatedness (2307.00823v2)
Abstract: The growing popularity of transfer learning, due to the availability of models pre-trained on vast amounts of data, makes it imperative to understand when the knowledge of these pre-trained models can be transferred to obtain high-performing models on downstream target tasks. However, the exact conditions under which transfer learning succeeds in a cross-domain cross-task setting are still poorly understood. To bridge this gap, we propose a novel analysis of the transferability of the representations of pre-trained models to downstream tasks in terms of their relatedness to a given reference task. Our analysis leads to an upper bound on transferability in terms of task-relatedness, quantified using the difference between the class priors, label sets, and features of the two tasks. Our experiments using state-of-the-art pre-trained models show the effectiveness of task-relatedness in explaining transferability on various vision and language tasks. The efficient computability of task-relatedness even without labels of the target task, together with its high correlation with the model's accuracy after end-to-end fine-tuning on the target task, makes it a useful metric for transferability estimation. Our empirical results on using task-relatedness to select the best pre-trained model from a model zoo for a target task highlight its utility for practical problems.
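To make the flavor of such a metric concrete, the sketch below is a loose, illustrative proxy and not the paper's actual bound or estimator: it combines an optimal-transport distance between the two tasks' feature clouds (computed with the POT library cited below) with a total-variation distance between their empirical class priors. All function and variable names (`task_relatedness_proxy`, `ref_features`, `alpha`, `beta`, etc.) are hypothetical, and the weighting of the two terms is an assumption for illustration only.

```python
# Illustrative sketch (not the paper's method): a rough proxy for how related
# a reference task and a target task look in a pre-trained model's feature space.
import numpy as np
import ot  # POT: Python Optimal Transport


def feature_distance(ref_features, tgt_features):
    """Exact optimal-transport cost between the two tasks' pooled feature clouds."""
    M = ot.dist(ref_features, tgt_features)   # pairwise squared-Euclidean costs
    a = ot.unif(ref_features.shape[0])        # uniform weights on reference samples
    b = ot.unif(tgt_features.shape[0])        # uniform weights on target samples
    return ot.emd2(a, b, M)                   # scalar transport cost


def prior_distance(ref_labels, tgt_labels, num_classes):
    """Total-variation distance between the empirical class priors of the two tasks."""
    p = np.bincount(ref_labels, minlength=num_classes) / len(ref_labels)
    q = np.bincount(tgt_labels, minlength=num_classes) / len(tgt_labels)
    return 0.5 * np.abs(p - q).sum()


def task_relatedness_proxy(ref_features, ref_labels, tgt_features, tgt_labels,
                           num_classes, alpha=1.0, beta=1.0):
    """Smaller is 'more related'; alpha/beta are arbitrary illustrative weights."""
    return (alpha * feature_distance(ref_features, tgt_features)
            + beta * prior_distance(ref_labels, tgt_labels, num_classes))
```

Under this kind of proxy, model selection from a zoo would amount to extracting features of the reference and target data with each candidate pre-trained encoder, scoring each candidate with the proxy, and picking the model with the smallest score; the paper's actual task-relatedness measure additionally accounts for label-set differences and does not require target labels.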
- Akshay Mehra
- Yunbei Zhang
- Jihun Hamm