Feature Protection For Out-of-distribution Generalization (2405.16027v1)
Abstract: With the availability of large pre-trained models, a common workflow for building real-world machine learning solutions is to fine-tune such a model on a downstream task with a relatively small domain-specific dataset. A major challenge in such applications is that the small fine-tuning dataset does not sufficiently cover the distribution encountered when the model is deployed. It is therefore important to design fine-tuning methods that remain robust to out-of-distribution (OOD) data under-represented in the training set. This paper compares common fine-tuning methods with respect to their OOD performance and demonstrates that standard methods change the pre-trained model substantially, so that the fine-tuned features overfit the fine-tuning dataset, which in turn degrades OOD performance. To overcome this issue, we show that protecting pre-trained features yields a fine-tuned model that generalizes better to OOD data. We validate these feature-protection methods with extensive experiments fine-tuning CLIP on ImageNet and DomainNet.
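The abstract does not spell out a specific feature-protection mechanism, so the minimal PyTorch sketch below illustrates one common way such protection can be implemented: a feature-distillation penalty that keeps the fine-tuned encoder's features close to those of a frozen copy of the pre-trained encoder. The `make_encoder` stand-in, the loss weighting `lam`, and the MSE form of the penalty are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of feature protection during fine-tuning: regularize the
# trainable encoder toward the features of a frozen pre-trained copy.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(dim_in: int = 512, dim_feat: int = 256) -> nn.Module:
    # Placeholder for a pre-trained backbone (e.g., a CLIP image encoder).
    return nn.Sequential(
        nn.Linear(dim_in, dim_feat),
        nn.ReLU(),
        nn.Linear(dim_feat, dim_feat),
    )

encoder = make_encoder()                     # fine-tuned encoder (trainable)
frozen = copy.deepcopy(encoder).eval()       # frozen pre-trained reference
for p in frozen.parameters():
    p.requires_grad_(False)

head = nn.Linear(256, 10)                    # downstream classification head
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-4
)
lam = 1.0                                    # strength of the protection term (assumed)

def training_step(x: torch.Tensor, y: torch.Tensor) -> float:
    feat = encoder(x)
    with torch.no_grad():
        feat_ref = frozen(x)                 # pre-trained features to protect
    task_loss = F.cross_entropy(head(feat), y)
    protect_loss = F.mse_loss(feat, feat_ref)  # penalize drift from pre-trained features
    loss = task_loss + lam * protect_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage on random tensors standing in for a fine-tuning batch.
x = torch.randn(8, 512)
y = torch.randint(0, 10, (8,))
print(training_step(x, y))
```

With `lam = 0` this reduces to standard fine-tuning, which the paper argues overfits the fine-tuning distribution; increasing `lam` trades in-distribution fit for retention of the pre-trained features.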
- Lu Tan
- Huei Zhou
- Yinxiang Huang
- Zeming Zheng
- Yujiu Yang