Random ReLU Neural Networks as Non-Gaussian Processes (2405.10229v2)
Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent 3/2. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
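To make the non-asymptotic construction concrete, below is a minimal sketch (not the authors' code) of one way to draw realizations of such a random shallow ReLU network on a bounded one-dimensional domain: the width is Poisson with mean proportional to a density parameter, activation thresholds are placed uniformly in the domain, and each neuron gets an i.i.d. output weight and a random orientation. The specific parameterization, the restriction to one input dimension, the choice of Gaussian weights, and the names `density` and `weight_law` are all illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch, assuming a one-dimensional input domain and a particular
# parameterization (uniform activation thresholds, random ReLU orientations,
# i.i.d. Gaussian output weights); none of these specifics are taken from the
# paper, and the names `density` and `weight_law` are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sample_relu_process(x, density=10.0, domain=(-1.0, 1.0), weight_law=None):
    """Draw one realization of a shallow random ReLU network on `domain`,
    evaluated at the query points `x`.

    The width (number of neurons) is Poisson with mean `density * |domain|`,
    mirroring the non-asymptotic viewpoint described in the abstract.
    """
    a, b = domain
    if weight_law is None:
        # One possible choice of weight law; the paper allows a general law.
        weight_law = lambda size: rng.standard_normal(size)

    width = rng.poisson(density * (b - a))       # random width of the network
    thresholds = rng.uniform(a, b, size=width)   # activation thresholds in the domain
    signs = rng.choice([-1.0, 1.0], size=width)  # random orientation of each ReLU
    weights = weight_law(width)                  # output weights

    # f(x) = sum_k weights[k] * ReLU(signs[k] * (x - thresholds[k]))
    pre_activations = signs[None, :] * (x[:, None] - thresholds[None, :])
    return np.maximum(pre_activations, 0.0) @ weights

# Usage: draw many realizations and form an empirical autocovariance matrix,
# which could be compared against the closed-form expression derived in the
# paper (not reproduced here).
x = np.linspace(-1.0, 1.0, 51)
samples = np.stack([sample_relu_process(x, density=50.0) for _ in range(5000)])
empirical_cov = np.cov(samples, rowvar=False)    # (51, 51) autocovariance estimate
print(empirical_cov[25, 25], empirical_cov[25, 40])
```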