
Random ReLU Neural Networks as Non-Gaussian Processes (2405.10229v2)

Published 16 May 2024 in stat.ML, cs.LG, and math.PR

Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent 3/2. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).


Summary

  • The paper shows that shallow ReLU neural networks with random initialization are well-defined non-Gaussian processes with a closed-form autocovariance function.
  • It employs stochastic differential equations and Poisson modeling to reveal the networks' isotropy and self-similar scaling characteristics.
  • The findings challenge traditional Gaussian assumptions, providing new insights for improved modeling, Bayesian inference, and real-world AI applications.

Random ReLU Neural Networks as Non-Gaussian Processes

Introduction

This paper explores the behavior of shallow neural networks with randomly initialized parameters and ReLU (rectified linear unit) activation functions. Unlike the Gaussian processes that arise in classical infinite-width analyses, these random ReLU neural networks are shown to be well-defined non-Gaussian processes. The paper examines their properties from a non-asymptotic viewpoint, rather than the typical infinite-width one, and presents both theoretical and practical implications for AI research and development.

Key Findings

ReLU Networks and Well-Defined Processes

The researchers prove that shallow ReLU neural networks with random parameters are well-defined non-Gaussian processes. These networks, referred to as random ReLU neural networks, have properties determined by the law of their weights and biases and by the density of activation thresholds in each bounded region of the input domain. The main properties are:

  • Isotropic: the processes are statistically invariant under rotations of the input domain.
  • Wide-Sense Self-Similar: rescaling the input rescales the second-order statistics according to a Hurst exponent of 3/2 (the standard wide-sense definitions are recalled below).
  • Closed-Form Autocovariance: the autocovariance function admits a remarkably simple closed-form expression.
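
For reference, here is a minimal statement of the standard wide-sense (second-order) definitions behind the first two bullets, written for a random field s on R^d with autocovariance C_s(x, y) = E[s(x) s(y)]; the symbols s, C_s, and H are generic notation rather than the paper's.

```latex
% Wide-sense isotropy: the autocovariance is unchanged when both arguments
% are rotated by the same rotation R of the input domain.
C_s(Rx, Ry) = C_s(x, y) \qquad \text{for every rotation } R.

% Wide-sense self-similarity with Hurst exponent H (here H = 3/2): rescaling
% the inputs by a > 0 rescales the autocovariance by a^{2H}.
C_s(ax, ay) = a^{2H}\, C_s(x, y) \qquad \text{for every } a > 0.
```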

Technical Details

The processes are solutions of stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). In contrast with the usual fixed-width setting, the number of neurons whose activation thresholds fall in a bounded region of the input domain is itself a random variable with a Poisson law whose mean is proportional to the density parameter.
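
To make the non-asymptotic construction concrete, the following is a minimal one-dimensional simulation sketch, not code from the paper: the number of neurons whose thresholds fall in a bounded interval is Poisson with mean proportional to a density parameter lam, and the thresholds, signs, and output weights are drawn i.i.d. The specific choices here (uniform thresholds, Rademacher signs, standard Gaussian output weights) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_relu_network(lam, domain=(-1.0, 1.0)):
    """Draw one realization of a shallow random ReLU network on an interval.

    The number of neurons is Poisson(lam * |domain|), i.e. the activation
    thresholds form a Poisson point process of density lam on the interval.
    This is an illustrative 1-D toy version, not the paper's exact construction.
    """
    a, b = domain
    width = rng.poisson(lam * (b - a))            # random width
    thresholds = rng.uniform(a, b, size=width)    # activation thresholds (biases)
    signs = rng.choice([-1.0, 1.0], size=width)   # 1-D "directions" (assumed Rademacher)
    weights = rng.normal(0.0, 1.0, size=width)    # output weights (assumed Gaussian)

    def network(x):
        # s(x) = sum_k w_k * max(eps_k * x - t_k, 0), evaluated pointwise
        x = np.asarray(x, dtype=float)[..., None]
        return np.sum(weights * np.maximum(signs * x - thresholds, 0.0), axis=-1)

    return network

# One realization evaluated on a few input points.
s = sample_random_relu_network(lam=50.0)
print(s(np.linspace(-1.0, 1.0, 5)))
```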

Asymptotic Behaviors

A significant contribution is showing that, as the expected width tends to infinity, these networks can converge in law to either Gaussian or non-Gaussian processes, depending on the law of the weights. Specifically:

  • For Gaussian-distributed weights, the processes converge in law to Gaussian processes.
  • For weights following a symmetric alpha-stable law, the processes remain non-Gaussian even as the expected width grows; a small simulation sketch of this contrast follows.
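
As a rough illustration of this dichotomy, here is a hedged Monte-Carlo sketch (again a 1-D toy version with uniform thresholds and Rademacher signs, none of which is the paper's exact construction). It compares the marginal distribution of the network value at a fixed point under Gaussian versus symmetric alpha-stable output weights as the expected width grows; the 1/sqrt(lam) and 1/lam^(1/alpha) normalizations are the standard scalings that keep the marginals non-degenerate and are assumptions here, not the paper's normalization.

```python
import numpy as np
from scipy.stats import kurtosis, levy_stable

rng = np.random.default_rng(1)

def field_value(lam, weight_sampler, x0=0.5, domain=(-1.0, 1.0)):
    """One sample of the network value s(x0) with expected width lam * |domain|."""
    a, b = domain
    width = rng.poisson(lam * (b - a))
    thresholds = rng.uniform(a, b, size=width)
    signs = rng.choice([-1.0, 1.0], size=width)
    weights = weight_sampler(width)
    return np.sum(weights * np.maximum(signs * x0 - thresholds, 0.0))

def gaussian_weights(n, lam):
    # 1/sqrt(lam) scaling: a CLT-style normalization (an illustrative choice).
    return rng.normal(0.0, 1.0, size=n) / np.sqrt(lam)

def stable_weights(n, lam, alpha=1.5):
    # Symmetric alpha-stable weights with the matching 1/lam**(1/alpha) scaling.
    return levy_stable.rvs(alpha, 0.0, size=n, random_state=rng) / lam ** (1.0 / alpha)

for lam in (10.0, 100.0, 1000.0):
    g = [field_value(lam, lambda n: gaussian_weights(n, lam)) for _ in range(1000)]
    h = [field_value(lam, lambda n: stable_weights(n, lam)) for _ in range(1000)]
    # Excess kurtosis near 0 indicates Gaussian-like marginals; the alpha-stable
    # column stays heavy-tailed and erratic however large lam becomes.
    print(f"lam={lam:6.0f}  gaussian kurtosis={kurtosis(g):6.2f}  "
          f"alpha-stable kurtosis={kurtosis(h):9.2f}")
```

Under this scaling, the Gaussian-weight kurtosis shrinks toward zero as lam grows (the central-limit regime), while the alpha-stable column remains heavy-tailed, mirroring the dichotomy described above.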

Implications

This insight means that the behavior of random neural networks depends heavily on how their parameters are initialized, challenging the common assumption that infinite-width networks are inherently Gaussian. In practical terms:

  • Modeling: an exact, non-asymptotic characterization of these random networks supports more faithful probabilistic and generative modeling with neural networks.
  • Bayesian Inference: infinite-width analyses traditionally rely on Gaussianity; recognizing non-Gaussian behavior opens the door to richer priors and more precise inference techniques.

Future Considerations

The findings pave the way for further exploration:

  1. Broader Activations and Architectures: Extending these insights to more complex network structures and different activation functions could uncover richer behaviors.
  2. Real-World Applications: Testing these theoretical results in real-world applications, particularly in reinforcement learning and Bayesian optimization, might significantly impact practical AI designs.
  3. Advanced Statistical Techniques: Developing new methods to better deal with non-Gaussian data arising from these neural network models.

Notable Results

The central results are a remarkably simple closed-form expression for the autocovariance function and rigorous proofs that wide networks do not necessarily converge to Gaussian processes, a clear departure from traditional assumptions.

In summary, this research offers a new lens on the behavior of shallow random ReLU neural networks, emphasizing their non-Gaussian nature under specific conditions. The implications for both theory and practice in AI are substantial, suggesting further avenues for exploration and innovation.