Variational Bayesian Last Layers (2404.11599v1)
Abstract: We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard architectures. We experimentally investigate VBLLs, and show that they improve predictive accuracy, calibration, and out-of-distribution detection over baselines across both regression and classification. Finally, we investigate combining VBLL layers with variational Bayesian feature learning, yielding a lower variance collapsed variational inference method for Bayesian neural networks.
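The abstract describes a sampling-free, single-pass Bayesian last layer whose training and evaluation cost is quadratic in the last-layer width. As a rough illustration of that general idea only (not the paper's exact formulation or API), the sketch below implements a regression head with a Gaussian variational posterior over the last-layer weights, a closed-form predictive distribution, and an ELBO-style loss with no weight sampling; all names here (`BayesianLastLayer`, `prior_scale`, `noise_std`) are hypothetical.

```python
# Minimal sketch of a sampling-free Bayesian last layer for regression.
# This illustrates the general idea in the abstract, not the paper's exact method.
import torch
import torch.nn as nn


class BayesianLastLayer(nn.Module):
    """Gaussian variational posterior q(w) = N(m, S) over last-layer weights.

    For a Gaussian likelihood, the expected log-likelihood and the KL term are
    available in closed form, so training is deterministic (no weight sampling),
    single-pass, and quadratic in the feature width.
    """

    def __init__(self, feature_dim: int, prior_scale: float = 1.0, noise_std: float = 0.1):
        super().__init__()
        self.mean = nn.Parameter(torch.zeros(feature_dim))
        # Covariance parameterized by a lower-triangular Cholesky factor, S = L L^T.
        self.chol_raw = nn.Parameter(torch.eye(feature_dim) * 0.1)
        self.log_noise = nn.Parameter(torch.tensor(float(noise_std)).log())
        self.prior_scale = prior_scale

    def _chol(self) -> torch.Tensor:
        L = torch.tril(self.chol_raw)
        # Keep the diagonal positive so S = L L^T is a valid covariance.
        diag = torch.diagonal(L).clamp(min=1e-6)
        return L - torch.diag(torch.diagonal(L)) + torch.diag(diag)

    def predictive(self, phi: torch.Tensor):
        """Closed-form predictive mean and variance given backbone features phi (N, d)."""
        L = self._chol()
        mean = phi @ self.mean
        # Var[y | x] = phi^T S phi + sigma^2  (quadratic in the feature width).
        var = (phi @ L).pow(2).sum(-1) + self.log_noise.exp().pow(2)
        return mean, var

    def loss(self, phi: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Negative ELBO-style objective: expected NLL under q(w) plus a KL penalty."""
        mean, var = self.predictive(phi)
        sigma2 = self.log_noise.exp().pow(2)
        # E_q[-log p(y | x, w)] for a Gaussian likelihood, in closed form
        # (constants dropped); var - sigma2 recovers phi^T S phi.
        exp_nll = 0.5 * (((y - mean) ** 2 + var - sigma2) / sigma2 + sigma2.log()).mean()
        # KL( N(m, S) || N(0, prior_scale^2 I) ), also closed form.
        L = self._chol()
        d = self.mean.numel()
        kl = 0.5 * (
            (L.pow(2).sum() + self.mean.pow(2).sum()) / self.prior_scale ** 2
            - d
            + 2 * d * torch.log(torch.tensor(self.prior_scale))
            - 2 * torch.log(torch.diagonal(L)).sum()
        )
        return exp_nll + kl / max(len(y), 1)
```

In use, this head would sit on top of any feature extractor: compute `phi = backbone(x)`, minimize `head.loss(phi, y)` jointly with the backbone, and report `head.predictive(phi)` at test time. The single quadratic-cost step is the `phi @ L` product, which is why adding such a layer is nearly free relative to the backbone.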