AutoFT: Learning an Objective for Robust Fine-Tuning (2401.10220v2)
Abstract: Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. However, fine-tuning a model on one data distribution often degrades performance under distribution shifts. Current approaches to robust fine-tuning use hand-crafted regularization techniques to constrain the fine-tuning process towards the pretrained model. Yet, it is hard to specify how to adapt relevant characteristics of the foundation model during fine-tuning, as this depends on how the pre-training, fine-tuning, and test data distributions relate to each other. We propose AutoFT, a data-driven approach for robust fine-tuning. Given a task, AutoFT searches for a fine-tuning procedure that enhances out-of-distribution (OOD) generalization. Specifically, AutoFT uses bi-level optimization to search for an objective function and hyperparameters that maximize post-adaptation performance on a small OOD validation set. We evaluate AutoFT on nine natural distribution shifts. Our experiments show that AutoFT significantly improves generalization to OOD inputs, outperforming existing robust fine-tuning methods. Notably, AutoFT achieves a new state-of-the-art on the WILDS iWildCam and FMoW benchmarks, outperforming the previous best methods by $6.0\%$ and $1.5\%$, respectively.
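The abstract's core idea is a bi-level search: an outer loop proposes a candidate fine-tuning objective and hyperparameters, an inner loop fine-tunes the pretrained model with that candidate, and the candidate is scored by performance on a small OOD validation set. The sketch below illustrates this structure under stated assumptions; the loss parameterization (cross-entropy mixed with an L2-to-pretrained-weights penalty), the random search, and all function names are illustrative choices, not the paper's exact procedure.

```python
# Minimal sketch of a bi-level search for a fine-tuning objective, assuming a
# hypothetical loss parameterization and plain random search in the outer loop.
import copy
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


def fine_tune(pretrained, loader, loss_weights, lr, steps=100):
    """Inner loop: fine-tune a copy of the pretrained model with the candidate objective."""
    model = copy.deepcopy(pretrained)
    ref_params = [p.detach().clone() for p in pretrained.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(loader)
            x, y = next(it)
        ce = F.cross_entropy(model(x), y)
        # Assumed regularizer: pull fine-tuned weights toward the pretrained initialization.
        prox = sum(((p - r) ** 2).sum() for p, r in zip(model.parameters(), ref_params))
        loss = loss_weights["ce"] * ce + loss_weights["prox"] * prox
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(-1) == y).sum().item()
        total += y.numel()
    return correct / total


def autoft_search(pretrained, train_loader, ood_val_loader, n_trials=20):
    """Outer loop: search over objective weights and learning rate, scoring each
    candidate by post-fine-tuning accuracy on the small OOD validation set."""
    best_score, best_cfg = -1.0, None
    for _ in range(n_trials):
        cfg = {
            "loss_weights": {"ce": 1.0, "prox": 10 ** random.uniform(-6, -1)},
            "lr": 10 ** random.uniform(-5, -2),
        }
        model = fine_tune(pretrained, train_loader, cfg["loss_weights"], cfg["lr"])
        score = accuracy(model, ood_val_loader)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score


if __name__ == "__main__":
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(0)
    # Toy stand-ins for a pretrained backbone and in-distribution / OOD splits.
    pretrained = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    id_data = TensorDataset(torch.randn(256, 16), torch.randint(0, 4, (256,)))
    ood_data = TensorDataset(torch.randn(64, 16) + 0.5, torch.randint(0, 4, (64,)))
    cfg, score = autoft_search(
        pretrained,
        DataLoader(id_data, batch_size=32, shuffle=True),
        DataLoader(ood_data, batch_size=32),
        n_trials=5,
    )
    print("best config:", cfg, "OOD val acc:", round(score, 3))
```

The key design point the sketch preserves is that the outer objective is post-adaptation OOD validation accuracy, so the search selects whichever fine-tuning objective and hyperparameters generalize best under shift rather than those that fit the in-distribution training data.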