
Feature Protection For Out-of-distribution Generalization (2405.16027v1)

Published 25 May 2024 in cs.LG

Abstract: With the availability of large pre-trained models, a modern workflow for building real-world machine learning solutions is to fine-tune such a model on a downstream task with a relatively small domain-specific dataset. In such applications, one major challenge is that the small fine-tuning dataset does not have sufficient coverage of the distribution encountered when the model is deployed. It is therefore important to design fine-tuning methods that are robust to out-of-distribution (OOD) data under-represented in the training data. This paper compares common fine-tuning methods to investigate their OOD performance and demonstrates that standard methods change the pre-trained model significantly, so that the fine-tuned features overfit the fine-tuning dataset; as a result, OOD performance deteriorates. To overcome this issue, we show that protecting pre-trained features yields a fine-tuned model that generalizes better out of distribution. We validate the feature protection methods with extensive experiments fine-tuning CLIP on ImageNet and DomainNet.
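The core idea of the abstract — penalizing the drift of fine-tuned features away from the frozen pre-trained features — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy stand-in, not the paper's actual method: a small linear "feature extractor" plays the role of a large backbone such as CLIP's image encoder, and the protection term is a simple squared distance between fine-tuned and pre-trained features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: W0 is a hypothetical "pre-trained" linear feature extractor.
d_in, d_feat, n = 8, 4, 64
W0 = rng.normal(size=(d_feat, d_in))
X = rng.normal(size=(n, d_in))          # fine-tuning inputs
y = rng.normal(size=(n, d_feat))        # toy regression targets

def loss_and_grad(W, lam):
    """Task loss plus a feature-protection penalty anchoring features to W0."""
    F = X @ W.T                          # fine-tuned features
    F0 = X @ W0.T                        # frozen pre-trained features
    task = F - y
    prot = F - F0
    loss = 0.5 * np.mean(task ** 2) + 0.5 * lam * np.mean(prot ** 2)
    grad = (task + lam * prot).T @ X / (n * d_feat)
    return loss, grad

def finetune(lam, steps=500, lr=0.1):
    W = W0.copy()
    for _ in range(steps):
        _, g = loss_and_grad(W, lam)
        W -= lr * g
    return W

W_plain = finetune(lam=0.0)              # unconstrained fine-tuning
W_prot = finetune(lam=10.0)              # feature-protected fine-tuning

def drift(W):
    """How far the features have moved from the pre-trained ones."""
    return np.linalg.norm(X @ (W - W0).T)

# The protected model stays much closer to the pre-trained features.
print(drift(W_plain) > drift(W_prot))
```

With `lam = 0` the features move freely to fit the fine-tuning targets; a positive `lam` trades task fit for proximity to the pre-trained features, which is the mechanism the paper argues preserves OOD generalization.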

Authors (5)
  1. Lu Tan
  2. Huei Zhou
  3. Yinxiang Huang
  4. Zeming Zheng
  5. Yujiu Yang