
Generalized Laplace Approximation (2405.13535v3)

Published 22 May 2024 in cs.LG and stat.ML

Abstract: In recent years, inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective remedy, but understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework that attributes Bayesian inconsistency to model misspecification and inadequate priors. We interpret generalizing the posterior with a temperature factor both as a correction for misspecified models, achieved by adjusting the joint probability model, and as a recalibration of priors that redistributes probability mass over models in the hypothesis space using data samples. Additionally, we highlight a distinctive feature of the Laplace approximation: the generalized normalizing constant can be treated as invariant, unlike the typical scenario in general Bayesian learning, where this constant varies with the model parameters after generalization. Building on this insight, we propose the generalized Laplace approximation, which involves a simple adjustment to the computation of the Hessian matrix of the regularized loss function. This method offers a flexible and scalable framework for obtaining high-quality posterior distributions. We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.
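
The abstract describes the generalized Laplace approximation as a simple adjustment to the Hessian of the regularized loss. The sketch below illustrates one plausible reading of that idea on logistic regression, where a hypothetical temperature factor T rescales the likelihood term in both the MAP objective and the Hessian before inverting it to form the Gaussian posterior covariance. The function names, the logistic-regression setting, and the specific placement of T are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a tempered ("generalized") Laplace approximation for
# logistic regression. Assumption: the generalization enters as a temperature
# factor T that rescales the log-likelihood term; the paper's exact adjustment
# may differ.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_estimate(X, y, prior_prec, T, steps=500, lr=0.1):
    """Gradient ascent on the tempered log-joint: (1/T) * log-likelihood + log-prior."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = (X.T @ (y - p)) / T - prior_prec * w
        w += lr * grad / len(y)
    return w

def generalized_laplace(X, y, prior_prec=1.0, T=1.0):
    """Gaussian posterior N(w_map, H^{-1}), with the likelihood Hessian tempered by 1/T."""
    w_map = map_estimate(X, y, prior_prec, T)
    p = sigmoid(X @ w_map)
    # Hessian of the negative tempered log-joint at the MAP:
    # (1/T) * X^T diag(p(1-p)) X  +  prior precision * I
    H = (X.T * (p * (1 - p))) @ X / T + prior_prec * np.eye(X.shape[1])
    cov = np.linalg.inv(H)
    return w_map, cov

# Toy usage: T < 1 plays the role of a "cold" posterior.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (sigmoid(X @ np.array([1.5, -2.0, 0.5])) > rng.uniform(size=200)).astype(float)
w_map, cov = generalized_laplace(X, y, prior_prec=1.0, T=0.5)
print(w_map, np.sqrt(np.diag(cov)))
```

In this reading, only the Hessian computation changes relative to the standard Laplace approximation, which matches the abstract's claim that the method amounts to a simple adjustment of the Hessian of the regularized loss.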
