Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes (2312.11737v1)

Published 18 Dec 2023 in math.ST, math.PR, stat.ML, and stat.TH

Abstract: We establish novel rates for the Gaussian approximation of random deep neural networks with Gaussian parameters (weights and biases) and Lipschitz activation functions, in the wide limit. Our bounds apply to the joint output of a network evaluated at any finite input set, provided a certain non-degeneracy condition on the infinite-width covariances holds. We demonstrate that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive heuristic suggested by the central limit theorem. We also apply our bounds to obtain theoretical approximations for the exact Bayesian posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set. This includes popular cases such as the Gaussian likelihood, i.e. the exponential of minus the mean squared error.
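
The setting in the abstract is easy to probe numerically: sample many independent networks with i.i.d. Gaussian parameters, evaluate them on a fixed finite input set, and watch the joint output distribution approach its wide-width Gaussian limit. The sketch below is not from the paper; the depth, the widths, the tanh activation, the 1/sqrt(fan-in) scaling, and the use of a very wide network as a proxy for the Gaussian limit are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): compare the joint output distribution of a
# finite-width random network with Gaussian weights/biases and a Lipschitz
# activation (tanh) against a much wider network used as a proxy for the Gaussian
# (infinite-width) limit. Widths, depth, and scalings are illustrative assumptions.

rng = np.random.default_rng(0)

def sample_outputs(X, width, depth=2, n_draws=500):
    """Network outputs f(x) for n_draws independent draws of the Gaussian parameters.

    Uses the usual 1/sqrt(fan_in) scaling so that the infinite-width limit is a
    Gaussian process. Returns an array of shape (n_draws, len(X)).
    """
    outs = np.empty((n_draws, X.shape[0]))
    for s in range(n_draws):
        h = X  # (num_inputs, input_dim)
        for _ in range(depth):
            fan_in = h.shape[1]
            W = rng.normal(size=(fan_in, width)) / np.sqrt(fan_in)
            b = rng.normal(size=width)
            h = np.tanh(h @ W + b)
        w_out = rng.normal(size=width) / np.sqrt(width)
        outs[s] = h @ w_out  # scalar readout at each input point
    return outs

# A fixed, finite input set, as in the paper's setting.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])

narrow = sample_outputs(X, width=32)
wide = sample_outputs(X, width=512)  # proxy for the Gaussian limit

gap = np.linalg.norm(np.cov(narrow, rowvar=False) - np.cov(wide, rowvar=False))
print("covariance gap (Frobenius norm):", gap)
```

Under these assumptions, rerunning with larger narrow widths shrinks the covariance gap toward the proxy limit, in the spirit of the paper's inverse-width rate; the finite number of parameter draws sets a Monte Carlo noise floor on how small the empirical gap can get.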

