Gaussian random field approximation via Stein's method with applications to wide random neural networks (2306.16308v2)
Abstract: We derive upper bounds on the Wasserstein distance ($W_1$), with respect to the $\sup$-norm, between any continuous $\mathbb{R}^d$-valued random field indexed by the $n$-sphere and a Gaussian random field, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or reproducing kernel Hilbert space. This feature enables us to move beyond the one-dimensional interval-based index sets previously considered in the literature. Specializing our general result, we obtain the first bounds, at the random field level, on the Gaussian random field approximation of wide random neural networks of any depth with Lipschitz activation functions. Our bounds are expressed explicitly in terms of the widths of the network and the moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.
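For concreteness, the $W_1$ distance in question can be written in its standard Kantorovich-Rubinstein dual form (a well-known general fact, not a formula quoted from the paper): for a random field $F$ and a Gaussian field $Z$, both with continuous $\mathbb{R}^d$-valued paths on $\mathbb{S}^n$,

$$
W_1(F, Z) \;=\; \sup_{\substack{h\colon C(\mathbb{S}^n;\mathbb{R}^d)\to\mathbb{R} \\ \|h\|_{\mathrm{Lip}} \le 1}} \bigl|\mathbb{E}\,h(F) - \mathbb{E}\,h(Z)\bigr|,
$$

where the Lipschitz constant of $h$ is taken with respect to the $\sup$-norm $\|f\|_\infty = \sup_{x \in \mathbb{S}^n} |f(x)|$. Bounding this quantity is strictly stronger than bounding the distance between finite-dimensional marginals.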
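The neural-network application concerns the classical fact that a wide network with i.i.d. random weights, viewed as a random field over its inputs, is close to a Gaussian process; the paper quantifies this at the random field level. The following minimal Monte Carlo sketch (not code from the paper) illustrates the phenomenon for a one-hidden-layer ReLU network on the circle $\mathbb{S}^1$, comparing the empirical covariance of the field against the known arc-cosine kernel of its Gaussian limit. The architecture, the $1/\sqrt{\text{width}}$ scaling, and all parameter choices (`width`, `n_points`, `n_nets`, the seed) are standard NNGP conventions chosen here for illustration.

```python
# Sketch: empirical covariance of a wide random ReLU network on S^1
# versus the covariance of its limiting Gaussian random field.
import numpy as np

rng = np.random.default_rng(0)

n_points = 64    # grid points on S^1 (the index set of the field)
width = 4096     # hidden-layer width m
n_nets = 2000    # independent network draws

# Inputs: points on the unit circle in R^2.
theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # (n_points, 2)

def sample_network_outputs():
    """One draw of f(x) = (1/sqrt(m)) * v^T relu(W x) at all grid points."""
    W = rng.standard_normal((width, 2))   # input-to-hidden weights, i.i.d. N(0,1)
    v = rng.standard_normal(width)        # hidden-to-output weights, i.i.d. N(0,1)
    H = np.maximum(W @ X.T, 0.0)          # (width, n_points) ReLU features
    return (v @ H) / np.sqrt(width)       # (n_points,) field values

samples = np.array([sample_network_outputs() for _ in range(n_nets)])
emp_cov = np.cov(samples, rowvar=False)   # empirical covariance of the field

# Limiting covariance for ReLU on the unit sphere: the degree-1 arc-cosine
# kernel (Cho & Saul), K(x, y) = (sin a + (pi - a) cos a) / (2 pi),
# where a is the angle between x and y.
G = np.clip(X @ X.T, -1.0, 1.0)
A = np.arccos(G)
K = (np.sin(A) + (np.pi - A) * np.cos(A)) / (2.0 * np.pi)

print("max |empirical cov - limit kernel|:", np.abs(emp_cov - K).max())
```

Shrinking the discrepancy printed above as `width` grows is only a second-moment check at a finite grid; the paper's contribution is an explicit rate for the full $W_1$ distance between the random field and its Gaussian limit, uniformly over the sphere, for networks of any depth.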