Variational Bayesian Last Layers (2404.11599v1)

Published 17 Apr 2024 in cs.LG, cs.CV, and stat.ML

Abstract: We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard architectures. We experimentally investigate VBLLs, and show that they improve predictive accuracy, calibration, and out of distribution detection over baselines across both regression and classification. Finally, we investigate combining VBLL layers with variational Bayesian feature learning, yielding a lower variance collapsed variational inference method for Bayesian neural networks.


Summary

  • The paper introduces VBLL, a variational Bayesian method that achieves efficient uncertainty estimation in neural networks through a deterministic, sampling-free formulation.
  • It demonstrates improved predictive accuracy, calibration, and out-of-distribution detection while maintaining quadratic complexity in the last layer’s width.
  • An accessible PyTorch implementation and a solid theoretical foundation make it straightforward to integrate VBLLs into existing deep learning pipelines and to build on them in future research.

Variational Bayesian Last Layers: Efficient Uncertainty Estimation in Neural Networks

Introduction to Bayesian Last Layers

Bayesian last layers (BLLs) have garnered interest for their capacity to provide uncertainty estimates for neural network predictions without the substantial computational overhead typically associated with Bayesian methods. This paper presents a variational approach to Bayesian last layers (VBLL) that incorporates uncertainty quantification seamlessly into standard neural network frameworks. The proposed deterministic variational formulation allows training and evaluation with only quadratic complexity in the width of the last layer, rendering the layer nearly computationally free to add to standard architectures.
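For intuition about what a Bayesian last layer computes, the sketch below performs exact Bayesian linear regression on fixed penultimate-layer features. It is a minimal illustration of the general idea (a closed-form Gaussian posterior and predictive distribution, with cost quadratic in the last-layer width), not the paper's VBLL training procedure; the function names and hyperparameters are illustrative assumptions.

```python
import torch

def bayesian_last_layer_regression(Phi, y, prior_var=1.0, noise_var=0.1):
    """Closed-form posterior over last-layer weights for regression, treating
    the penultimate-layer features Phi (N x D) as fixed. The dominant costs are
    D x D matrix operations, i.e. quadratic in the last-layer width D."""
    N, D = Phi.shape
    # Posterior for w ~ N(0, prior_var * I) under the model y = Phi w + noise
    precision = torch.eye(D) / prior_var + Phi.T @ Phi / noise_var
    cov = torch.linalg.inv(precision)        # D x D posterior covariance
    mean = cov @ Phi.T @ y / noise_var       # D-dimensional posterior mean
    return mean, cov

def predictive(phi_star, mean, cov, noise_var=0.1):
    """Gaussian predictive distribution for a new feature vector phi_star."""
    mu = phi_star @ mean
    var = phi_star @ cov @ phi_star + noise_var
    return mu, var
```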

Novel Contributions of the VBLL Methodology

The core contributions of the introduced VBLL methodology include:

  • Implementation of variational Bayesian last layers (VBLLs) that integrate easily into existing neural network architectures and training pipelines, enhancing both deterministic and Bayesian models.
  • Development of principled, sampling-free Bayesian training objectives for VBLLs that ensure computational efficiency on par with standard training regimes.
  • Demonstrated improvements in predictive accuracy, likelihood estimates, calibration, and out-of-distribution detection across various settings through empirical evaluations.
  • Creation of an accessible implementation of VBLLs in PyTorch, designed for ease of use and integration into existing projects.
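To make the integration claim concrete, the sketch below shows the kind of drop-in replacement the contributions describe: keep the backbone unchanged and swap only the output head for a layer that maintains a Gaussian posterior over its weights and propagates mean and variance in closed form, with no sampling. The `GaussianLastLayer` class is a simplified, hypothetical stand-in (a diagonal-covariance toy), not the released VBLL API.

```python
import torch
import torch.nn as nn

class GaussianLastLayer(nn.Module):
    """Toy last layer with a factorized Gaussian posterior over its weights.
    Because the layer is linear, the predictive mean and variance of the
    output are available in closed form (no weight sampling):
        mean = x @ mu.T,   var = x**2 @ exp(log_var).T
    This is a simplified stand-in for the paper's VBLL head, not its API."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.log_var = nn.Parameter(torch.full((out_features, in_features), -5.0))

    def forward(self, x):
        mean = x @ self.mu.t()
        var = (x ** 2) @ self.log_var.exp().t()
        return mean, var

# Drop-in usage: keep the backbone, replace only the deterministic output head.
backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = GaussianLastLayer(64, 1)
x = torch.randn(8, 16)
pred_mean, pred_var = head(backbone(x))
```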

Key Technical Insights and Theoretical Foundations

Efficient Uncertainty Quantification

The paper shows how VBLLs handle uncertainty efficiently through direct variational inference, eschewing computationally expensive sampling methods. This efficiency comes from deterministic lower bounds on the marginal likelihood, which yield closed-form training objectives and simplify training significantly compared to sampling-based Bayesian approaches.
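As a concrete example of a sampling-free objective of this flavor, the sketch below computes a closed-form variational bound for a Gaussian last layer in regression, using a diagonal weight posterior for simplicity. It illustrates the general structure (closed-form expected log-likelihood plus a KL regularizer) under the stated assumptions and is not the paper's exact derivation.

```python
import math
import torch

def last_layer_variational_loss(Phi, y, mu, log_s, noise_var=0.1, prior_var=1.0):
    """Sampling-free variational objective for a Gaussian last layer in
    regression (simplified sketch, diagonal posterior): q(w) = N(mu, diag(exp(log_s))),
    p(w) = N(0, prior_var * I).  Because the layer is linear and the noise is
    Gaussian, the expected log-likelihood is available in closed form:
      E_q[log N(y | w^T phi, noise_var)]
        = log N(y | mu^T phi, noise_var) - phi^T S phi / (2 * noise_var)
    """
    s = log_s.exp()                         # posterior weight variances, shape (D,)
    resid = y - Phi @ mu                    # residuals under the posterior mean, (N,)
    quad = (Phi ** 2) @ s                   # phi_i^T S phi_i for each datapoint, (N,)
    exp_loglik = (-0.5 * math.log(2 * math.pi * noise_var)
                  - 0.5 * (resid ** 2 + quad) / noise_var).sum()
    # KL(q(w) || p(w)) between diagonal Gaussians, in closed form
    D = mu.numel()
    kl = 0.5 * ((s.sum() + mu.pow(2).sum()) / prior_var
                - D + D * math.log(prior_var) - log_s.sum())
    return -(exp_loglik - kl)               # negative bound, to be minimized
```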

Inferential Rigor with Theoretical Support

The authors offer a rigorous theoretical analysis supporting the implementation of VBLLs. Derived lower bounds on the marginal likelihood underpin the training objectives for regression and classification tasks, ensuring that these objectives are not only theoretically sound but also practical for real-world applications.

Implications and Theoretical Contributions

Practical Applicability

The method's practicality is twofold: it integrates with existing deep learning pipelines without substantial modification, and it provides a computationally feasible approach to uncertainty quantification that scales to large models and datasets.

Theoretical Impact

Theoretically, VBLLs contribute to a deeper understanding of how variational methods can be employed to enhance neural networks' uncertainty quantification capabilities. This work extends the Bayesian neural network literature by providing a scalable, efficient solution to a problem traditionally hindered by computational complexity.

Future Directions in Variational Bayesian Learning

Looking forward, the VBLL framework sets a foundational basis for future explorations into more complex models and broader applications. Potential research directions could involve refining the variational approaches to further reduce computational overhead or exploring the integration of VBLLs into other forms of deep learning architectures beyond the typical feedforward networks studied here. Additionally, expanding the theoretical foundations to encompass broader classes of distributions or more complex data structures could significantly widen the applicability and impact of VBLLs in the field of machine learning.

Summary

This paper's contributions offer significant practical tools and theoretical insights for integrating Bayesian principles into modern neural networks efficiently. The VBLL framework not only advances the field of Bayesian deep learning by making it accessible and practical but also opens up new avenues for research into efficient, scalable methods for uncertainty quantification in AI.
