
Understanding the Transferability of Representations via Task-Relatedness (2307.00823v2)

Published 3 Jul 2023 in cs.LG

Abstract: The growing popularity of transfer learning, due to the availability of models pre-trained on vast amounts of data, makes it imperative to understand when the knowledge of these pre-trained models can be transferred to obtain high-performing models on downstream target tasks. However, the exact conditions under which transfer learning succeeds in a cross-domain cross-task setting are still poorly understood. To bridge this gap, we propose a novel analysis of the transferability of the representations of pre-trained models to downstream tasks in terms of their relatedness to a given reference task. Our analysis leads to an upper bound on transferability in terms of task-relatedness, quantified using the differences between the class priors, label sets, and features of the two tasks. Our experiments using state-of-the-art pre-trained models show the effectiveness of task-relatedness in explaining transferability on various vision and language tasks. The efficient computability of task-relatedness even without labels of the target task, and its high correlation with the model's accuracy after end-to-end fine-tuning on the target task, make it a useful metric for transferability estimation. Our empirical results on using task-relatedness to select the best pre-trained model from a model zoo for a target task highlight its utility for practical problems.
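The abstract describes task-relatedness as a quantity built from differences in the class priors, label sets, and features of a reference task and a target task. The snippet below is a minimal, hypothetical Python sketch in that spirit, not the paper's actual bound: it combines a total-variation distance between class-prior vectors with a sliced one-dimensional Wasserstein distance between extracted features. All function names, the projection count, and the equal weighting of the two terms are illustrative assumptions.

```python
# Heuristic task-relatedness-style score (illustrative sketch, not the paper's bound).
import numpy as np
from scipy.stats import wasserstein_distance


def prior_difference(priors_ref, priors_tgt):
    """Total-variation distance between two class-prior vectors (assumed proxy)."""
    return 0.5 * np.abs(np.asarray(priors_ref) - np.asarray(priors_tgt)).sum()


def feature_distance(feats_ref, feats_tgt, n_proj=16, seed=0):
    """Sliced 1-D Wasserstein distance between two feature sets, averaged over
    random unit projections (a cheap stand-in for full optimal transport)."""
    rng = np.random.default_rng(seed)
    d = feats_ref.shape[1]
    dists = []
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        dists.append(wasserstein_distance(feats_ref @ v, feats_tgt @ v))
    return float(np.mean(dists))


def task_relatedness_score(feats_ref, priors_ref, feats_tgt, priors_tgt):
    """Smaller means 'more related'; a rough proxy for ranking encoders."""
    return feature_distance(feats_ref, feats_tgt) + prior_difference(priors_ref, priors_tgt)
```

In a model-zoo setting, one would extract features for the reference and target data with each candidate encoder, compute this score per encoder, and prefer models with smaller scores; the paper's actual metric additionally accounts for label-set differences and is shown to upper-bound transferability.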

Authors (3)
  1. Akshay Mehra (15 papers)
  2. Yunbei Zhang (10 papers)
  3. Jihun Hamm (28 papers)
Citations (5)