
Gibbs Sampling the Posterior of Neural Networks (2306.02729v2)

Published 5 Jun 2023 in cs.LG and stat.ML

Abstract: In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model in which noise is added at every pre- and post-activation in the network, and we argue that the resulting posterior can be sampled with an efficient Gibbs sampler. For small models, the Gibbs sampler attains performance similar to state-of-the-art Markov chain Monte Carlo (MCMC) methods such as Hamiltonian Monte Carlo (HMC) or the Metropolis-adjusted Langevin algorithm (MALA), on both real and synthetic data. By framing our analysis in the teacher-student setting, we introduce a thermalization criterion that detects when an algorithm, run on data with synthetic labels, fails to sample from the posterior. The criterion relies on the fact that, in the teacher-student setting, an algorithm can be initialized directly at equilibrium.
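The noisy model and the equilibrium initialization behind the thermalization criterion can be made concrete with a small sketch. The Python/NumPy snippet below is illustrative only: the Gaussian noise scale, the tanh nonlinearity, the single hidden layer, and all variable names are assumptions made for the example, not details taken from the paper. It builds a network with noise at each pre- and post-activation and a teacher-student data generator that keeps the teacher's latent variables, which is what allows a sampler to be started directly at equilibrium.

import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, W1, W2, sigma=0.1):
    """One noisy forward pass, keeping the intermediate (latent) variables."""
    z1 = W1 @ x + sigma * rng.standard_normal(W1.shape[0])      # noisy pre-activation
    a1 = np.tanh(z1) + sigma * rng.standard_normal(W1.shape[0])  # noisy post-activation
    y = W2 @ a1 + sigma * rng.standard_normal(W2.shape[0])       # noisy output
    return z1, a1, y

# Teacher-student data: draw teacher weights from the prior and generate
# synthetic labels with the noisy model, keeping the teacher's latents.
d_in, d_hid, d_out, n = 5, 8, 1, 100
W1_t = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
W2_t = rng.standard_normal((d_out, d_hid)) / np.sqrt(d_hid)

X = rng.standard_normal((n, d_in))
teacher_latents, labels = [], []
for x in X:
    z1, a1, y = noisy_forward(x, W1_t, W2_t)
    teacher_latents.append((z1, a1))
    labels.append(y)

# A Gibbs sampler for this model would alternate between resampling the
# weights given the latent variables and the latents given the weights.
# Initializing the chain at (W1_t, W2_t, teacher_latents) places it directly
# at equilibrium, which is what the thermalization criterion exploits.

In this sketch, a chain started from the teacher's weights and latents is already at equilibrium, so any drift of its summary statistics away from those of a freshly initialized chain signals a failure to sample the posterior, which is the kind of check the abstract describes.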


