
Variational Linearized Laplace Approximation for Bayesian Deep Learning (2302.12565v3)

Published 24 Feb 2023 in stat.ML and cs.LG

Abstract: The Linearized Laplace Approximation (LLA) has been recently used to perform uncertainty estimation on the predictions of pre-trained deep neural networks (DNNs). However, its widespread application is hindered by significant computational costs, particularly in scenarios with a large number of training points or DNN parameters. Consequently, additional approximations of LLA, such as Kronecker-factored or diagonal approximate GGN matrices, are utilized, potentially compromising the model's performance. To address these challenges, we propose a new method for approximating LLA using a variational sparse Gaussian Process (GP). Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN. Furthermore, it allows for efficient stochastic optimization, which results in sub-linear training time in the size of the training dataset. Specifically, its training cost is independent of the number of training points. We compare our proposed method against accelerated LLA (ELLA), which relies on the Nyström approximation, as well as other LLA variants employing the sample-then-optimize principle. Experimental results, both on regression and classification datasets, show that our method outperforms these existing efficient variants of LLA, both in terms of the quality of the predictive distribution and in terms of total computational time.
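The abstract outlines the core construction: under the linearized Laplace approximation, the predictive covariance of the network is that of a GP whose prior kernel is built from parameter Jacobians, and the proposed method replaces the exact GP posterior with a sparse variational one over a small set of inducing points, while the predictive mean stays equal to the pre-trained DNN's output. The sketch below is a minimal illustration of that predictive-variance computation; the toy Jacobian feature map, the prior precision, and the variational covariance S are illustrative stand-ins, not the paper's fitted quantities or training procedure.

```python
import numpy as np

# --- stand-in for the Jacobian J(x) of a pre-trained scalar-output DNN ----
D = 5  # number of "network parameters" in this toy model

def jacobian(x):
    """Toy Jacobian d f(x, theta)/d theta, evaluated at the MAP estimate."""
    return np.stack([x ** k for k in range(D)], axis=-1)          # (n, D)

prior_prec = 1.0  # isotropic Gaussian prior precision over the parameters

def kernel(a, b):
    """LLA prior kernel k(a, b) = (1 / prior_prec) * J(a) J(b)^T."""
    return jacobian(a) @ jacobian(b).T / prior_prec

# --- sparse variational GP over M inducing locations Z -------------------
# At prediction time only Z, K_zz and the variational covariance S are
# needed, so the cost does not grow with the number of training points.
Z = np.linspace(-2.0, 2.0, 10)                                    # M = 10
Kzz = kernel(Z, Z) + 1e-6 * np.eye(len(Z))                        # jitter
S = 0.1 * np.eye(len(Z))           # variational covariance (illustrative)

def predictive_variance(x_star):
    """Sparse-GP variance: k** - Kxz Kzz^-1 Kzx + Kxz Kzz^-1 S Kzz^-1 Kzx."""
    Kxz = kernel(x_star, Z)
    Kxx = kernel(x_star, x_star)
    A = np.linalg.solve(Kzz, Kxz.T)                               # Kzz^-1 Kzx
    cov = Kxx - Kxz @ A + A.T @ S @ A
    return np.diag(cov)

x_star = np.linspace(-2.0, 2.0, 5)
print(predictive_variance(x_star))
# The predictive mean is simply the pre-trained DNN's own output at x_star.
```

In the method described by the abstract, the inducing locations and variational parameters would be fit by stochastic optimization of a variational objective on mini-batches, which is what makes the training cost independent of the number of training points.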
