
AutoFT: Learning an Objective for Robust Fine-Tuning (2401.10220v2)

Published 18 Jan 2024 in cs.CV and cs.LG

Abstract: Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. However, fine-tuning a model on one data distribution often degrades performance under distribution shifts. Current approaches to robust fine-tuning use hand-crafted regularization techniques that constrain the fine-tuned model to stay close to the pretrained model. Yet, it is hard to specify which characteristics of the foundation model to preserve during fine-tuning, as this depends on how the pretraining, fine-tuning, and test data distributions relate to each other. We propose AutoFT, a data-driven approach for robust fine-tuning. Given a task, AutoFT searches for a fine-tuning procedure that enhances out-of-distribution (OOD) generalization. Specifically, AutoFT uses bi-level optimization to search for an objective function and hyperparameters that maximize post-adaptation performance on a small OOD validation set. We evaluate AutoFT on nine natural distribution shifts. Our experiments show that AutoFT significantly improves generalization to OOD inputs, outperforming existing robust fine-tuning methods. Notably, AutoFT achieves a new state-of-the-art on the WILDS iWildCam and FMoW benchmarks, outperforming the previous best methods by $6.0\%$ and $1.5\%$, respectively.
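The bi-level structure the abstract describes can be illustrated with a toy sketch (hypothetical code, not the paper's implementation): an inner loop fine-tunes a linear model under a parameterized objective that penalizes distance to the pretrained weights, and an outer loop selects the objective hyperparameter `lam` that minimizes error on a small OOD validation set. For this linear case the inner problem has a closed-form ridge-style solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def fine_tune(w_pre, X, y, lam):
    """Inner loop: minimize  mean((X w - y)^2) + lam * ||w - w_pre||^2.

    For a linear model this parameterized objective has a closed-form
    (ridge-toward-pretrained) solution."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n + lam * w_pre
    return np.linalg.solve(A, b)

def ood_error(w, X_ood, y_ood):
    return float(np.mean((X_ood @ w - y_ood) ** 2))

# Synthetic setup: pretrained weights are close to the true function,
# fine-tuning labels are noisy, and OOD inputs are distribution-shifted.
d = 5
w_true = rng.normal(size=d)
w_pre = w_true + 0.01 * rng.normal(size=d)   # pretrained model, near truth
X_ft = rng.normal(size=(50, d))
y_ft = X_ft @ w_true + rng.normal(size=50)   # noisy fine-tuning labels
X_ood = 2.0 * rng.normal(size=(20, d))       # shifted input distribution
y_ood = X_ood @ w_true

# Outer loop: search the objective hyperparameter to minimize error on the
# small OOD validation set (grid search stands in for bi-level optimization).
grid = [0.0, 0.01, 0.1, 1.0, 10.0]
best_err, best_lam = min(
    (ood_error(fine_tune(w_pre, X_ft, y_ft, lam), X_ood, y_ood), lam)
    for lam in grid
)
print(f"selected lam={best_lam}, OOD validation error={best_err:.4f}")
```

In AutoFT itself the learned objective has many more terms and hyperparameters, and the outer loop runs hyperparameter optimization over full fine-tuning runs; but the structure is the same: inner adaptation under a parameterized objective, outer selection on a small OOD validation set.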

