Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes (2312.11737v1)
Abstract: We establish novel rates for the Gaussian approximation of random deep neural networks with Gaussian parameters (weights and biases) and Lipschitz activation functions, in the wide limit. Our bounds apply to the joint output of a network evaluated at any finite input set, provided a certain non-degeneracy condition on the infinite-width covariances holds. We demonstrate that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than the naive heuristic suggested by the central limit theorem. We also apply our bounds to obtain theoretical approximations of the exact Bayesian posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set. This includes popular cases such as the Gaussian likelihood, i.e., the exponential of minus the mean squared error.
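The following is a minimal numerical sketch of the phenomenon the abstract describes, not taken from the paper: it samples a wide random ReLU MLP with Gaussian weights and biases, and compares the empirical covariance of its output on a few inputs with the infinite-width (NNGP) covariance given by the standard arc-cosine kernel recursion. The network sizes, inputs, and variance parameters are illustrative assumptions.

```python
# Minimal sketch (not from the paper): empirically illustrating the Gaussian-process
# limit of a wide random ReLU MLP with Gaussian weights and biases.  All sizes and
# variance parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def random_mlp_output(x, widths, sigma_w=1.0, sigma_b=0.1):
    """One draw of the scalar output of a random ReLU MLP on the inputs x."""
    h, fan_in = x, x.shape[1]
    for width in widths:
        W = rng.normal(size=(fan_in, width))
        b = rng.normal(size=width)
        h = np.maximum(sigma_w / np.sqrt(fan_in) * h @ W + sigma_b * b, 0.0)
        fan_in = width
    W = rng.normal(size=(fan_in, 1))
    b = rng.normal()
    return (sigma_w / np.sqrt(fan_in) * h @ W + sigma_b * b).ravel()

def nngp_covariance(x, depth, sigma_w=1.0, sigma_b=0.1):
    """Infinite-width covariance via the standard ReLU (arc-cosine) kernel recursion."""
    K = sigma_w**2 * (x @ x.T) / x.shape[1] + sigma_b**2
    for _ in range(depth):
        d = np.sqrt(np.diag(K))
        theta = np.arccos(np.clip(K / np.outer(d, d), -1.0, 1.0))
        K = (sigma_w**2 / (2 * np.pi)) * np.outer(d, d) \
            * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) + sigma_b**2
    return K

x = rng.normal(size=(3, 5))          # three inputs in dimension 5
widths = [256, 256]                  # two wide hidden layers
samples = np.stack([random_mlp_output(x, widths) for _ in range(10_000)])
print("empirical covariance over draws:\n", np.cov(samples.T))
print("infinite-width (NNGP) covariance:\n", nngp_covariance(x, depth=len(widths)))
```

Increasing the hidden widths should shrink the gap between the two matrices, consistent with the 1/width rate stated in the abstract; the discrepancy that remains at fixed width also reflects Monte Carlo error from the finite number of draws.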