Function-Space Regularization in Neural Networks: A Probabilistic Perspective (2312.17162v1)
Abstract: Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that, by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. This method -- which we refer to as function-space empirical Bayes (FSEB) -- includes both parameter- and function-space regularization, is mathematically simple, easy to implement, and incurs only minimal computational overhead compared to standard regularization techniques. We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection, highly calibrated predictive uncertainty estimates, successful task adaptation from pre-trained models, and improved generalization under covariate shift.
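Since the abstract describes FSEB as combining a parameter-space (weight-decay-like) term with a function-space term, the sketch below illustrates what such a combined objective can look like in PyTorch. This is a minimal illustration under stated assumptions, not the paper's exact objective: the reference function `f_prior`, the context inputs `x_context`, and the scale hyperparameters `weight_scale` and `function_scale` are hypothetical names, and the squared-error function-space penalty stands in for whatever divergence the method actually uses.

```python
# Minimal sketch of an FSEB-style training objective (assumed form, not the
# paper's exact loss), for a PyTorch classifier whose forward pass returns logits.
import torch
import torch.nn.functional as F

def fseb_loss(model, f_prior, x, y, x_context,
              weight_scale=1.0, function_scale=1.0):
    # Data-fit term: negative log-likelihood on the training batch.
    nll = F.cross_entropy(model(x), y)

    # Parameter-space term: an isotropic Gaussian prior over the weights
    # reduces to a standard weight-decay penalty.
    l2 = sum((p ** 2).sum() for p in model.parameters())

    # Function-space term: penalize deviation of the model's outputs from a
    # desired reference function (e.g., a pre-trained model) on context points.
    # Assumes f_prior returns outputs with the same shape as model(x_context).
    with torch.no_grad():
        target = f_prior(x_context)
    fs = ((model(x_context) - target) ** 2).sum()

    return nll + l2 / (2 * weight_scale ** 2) + fs / (2 * function_scale ** 2)
```

In this sketch, letting `function_scale` grow large recovers ordinary weight-decay training, while a small `function_scale` pulls the network's predictive function toward the reference function on the context points, which is one concrete way to encode an explicit preference over predictive functions into training.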