Posterior concentrations of fully-connected Bayesian neural networks with general priors on the weights (2403.14225v1)

Published 21 Mar 2024 in stat.ML and cs.LG

Abstract: Bayesian approaches for training deep neural networks (BNNs) have received significant interest and have been effectively utilized in a wide range of applications. There have been several studies on the posterior concentration properties of BNNs. However, most of these studies only demonstrate results for BNN models with sparse or heavy-tailed priors. Surprisingly, no theoretical results currently exist for BNNs using Gaussian priors, which are the most commonly used ones. The lack of theory arises from the absence of approximation results for Deep Neural Networks (DNNs) that are non-sparse and have bounded parameters. In this paper, we present a new approximation theory for non-sparse DNNs with bounded parameters. Additionally, based on the approximation theory, we show that BNNs with non-sparse general priors can achieve near-minimax optimal posterior concentration rates to the true model.
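
As context for the terminology above, a posterior concentration (contraction) result is conventionally stated in the following generic form from the nonparametric-Bayes literature (e.g., Ghosal and van der Vaart, 2017); it is given here only to illustrate what "near-minimax optimal posterior concentration rate" means, not as the paper's exact theorem. Writing \Pi(\cdot \mid \mathcal{D}_n) for the BNN posterior given n observations, f_0 for the true regression function, and \epsilon_n for the contraction rate,

\Pi\big( f : \|f - f_0\|_n > M_n \epsilon_n \;\big|\; \mathcal{D}_n \big) \;\longrightarrow\; 0 \quad \text{in probability as } n \to \infty,

for any sequence M_n \to \infty. For a d-dimensional, \beta-Hölder smooth f_0, the minimax estimation rate is \epsilon_n \asymp n^{-\beta/(2\beta+d)}, and "near-minimax optimal" means the posterior contracts at this rate up to a polylogarithmic factor in n.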

Authors (2)
  1. Insung Kong (12 papers)
  2. Yongdai Kim (31 papers)
Citations (1)
