Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning (2402.15017v1)

Published 22 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples. In this paper, we study the theoretical justification of this multitask finetuning approach. Our theoretical analysis reveals that with a diverse set of related tasks, this multitask finetuning leads to reduced error in the target task, in comparison to directly adapting the same pretrained model. We quantify the relationship between finetuning tasks and target tasks by diversity and consistency metrics, and further propose a practical task selection algorithm. We substantiate our theoretical claims with extensive empirical evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, providing advantages to the model performance on target tasks. We believe our study sheds new light on the effective adaptation of foundation models to new tasks that lack abundant labels. Our code is available at https://github.com/OliverXUZY/Foudation-Model_Multitask.
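
To make the recipe in the abstract concrete (finetune a pretrained model on a pool of related labeled tasks, then adapt it to a label-scarce target task), here is a minimal, hypothetical PyTorch sketch. It is not the authors' released implementation and omits the paper's diversity/consistency metrics and task selection algorithm; the names `encoder`, `tasks`, `out_dim`, and the per-task loaders are assumptions made for illustration, and the adaptation step uses a generic nearest-centroid (prototype) classifier as a stand-in for few-shot adaptation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def multitask_finetune(encoder, tasks, epochs=5, lr=1e-4):
    """Finetune a pretrained encoder jointly on a pool of labeled auxiliary tasks.

    `tasks` maps a task name to (dataloader, num_classes). Each task gets its own
    linear head, while the encoder is shared and updated across all tasks.
    """
    heads = nn.ModuleDict({
        name: nn.Linear(encoder.out_dim, n_classes)  # assumes encoder exposes out_dim
        for name, (_, n_classes) in tasks.items()
    })
    opt = torch.optim.AdamW(
        list(encoder.parameters()) + list(heads.parameters()), lr=lr
    )
    for _ in range(epochs):
        for name, (loader, _) in tasks.items():
            for x, y in loader:
                loss = F.cross_entropy(heads[name](encoder(x)), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return encoder


def adapt_few_shot(encoder, support_x, support_y, query_x):
    """Adapt to a new task from a few labeled examples via nearest-centroid matching."""
    with torch.no_grad():
        z_s = F.normalize(encoder(support_x), dim=-1)
        z_q = F.normalize(encoder(query_x), dim=-1)
    classes = support_y.unique()
    protos = torch.stack([z_s[support_y == c].mean(0) for c in classes])
    return classes[(z_q @ protos.T).argmax(dim=-1)]  # predicted labels for queries
```

The paper's contribution is the theory and the task-selection procedure around this loop: which auxiliary tasks to put in `tasks` so that the finetuned encoder provably reduces error on the target task relative to direct adaptation of the pretrained model.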
