
Random ReLU Neural Networks as Non-Gaussian Processes (2405.10229v2)

Published 16 May 2024 in stat.ML, cs.LG, and math.PR

Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent 3/2. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).


Summary

  • The paper shows that shallow ReLU neural networks with random initialization are well-defined non-Gaussian processes with a closed-form autocovariance function.
  • It employs stochastic differential equations and Poisson modeling to reveal the networks' isotropy and self-similar scaling characteristics.
  • The findings challenge traditional Gaussian assumptions, providing new insights for improved modeling, Bayesian inference, and real-world AI applications.

Random ReLU Neural Networks as Non-Gaussian Processes

Introduction

This paper explores the behavior of shallow neural networks with randomly initialized parameters and ReLU (rectified linear unit) activation functions. Unlike the Gaussian processes that arise in classical infinite-width analyses, these random ReLU neural networks are shown to be well-defined non-Gaussian processes. The paper examines their properties from a non-asymptotic viewpoint, rather than the typical infinite-width one, and presents both theoretical and practical implications for AI research and development.

Key Findings

ReLU Networks and Well-Defined Processes

The researchers prove that shallow ReLU neural networks with random parameters are well-defined non-Gaussian processes. These networks, referred to as random ReLU neural networks, have properties determined by the law of their weights and biases and by the density of activation thresholds in each bounded region of the input domain. The main properties are:

  • Isotropic: the processes are statistically invariant under rotations of the input domain.
  • Wide-Sense Self-Similar: rescaling the input rescales the second-order statistics according to a Hurst exponent of 3/2 (the standard wide-sense definitions are recalled below).
  • Closed-Form Autocovariance: the autocovariance function admits a remarkably simple closed-form expression.
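
For reference, here is a minimal statement of the standard wide-sense (second-order) definitions behind the first two bullets, written for a random field s on R^d with autocovariance C_s(x, y) = E[s(x) s(y)]; the symbols s, C_s, and H are generic notation rather than the paper's.

```latex
% Wide-sense isotropy: the autocovariance is unchanged when both arguments
% are rotated by the same rotation R of the input domain.
C_s(Rx, Ry) = C_s(x, y) \qquad \text{for every rotation } R.

% Wide-sense self-similarity with Hurst exponent H (here H = 3/2): rescaling
% the inputs by a > 0 rescales the autocovariance by a^{2H}.
C_s(ax, ay) = a^{2H}\, C_s(x, y) \qquad \text{for every } a > 0.
```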

Technical Details

The processes are solutions of stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). In contrast with the usual fixed-width setting, the number of neurons whose activation thresholds fall in a bounded region of the input domain is itself a random variable with a Poisson law whose mean is proportional to the density parameter.
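
To make the non-asymptotic construction concrete, the following is a minimal one-dimensional simulation sketch, not code from the paper: the number of neurons whose thresholds fall in a bounded interval is Poisson with mean proportional to a density parameter lam, and the thresholds, signs, and output weights are drawn i.i.d. The specific choices here (uniform thresholds, Rademacher signs, standard Gaussian output weights) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_relu_network(lam, domain=(-1.0, 1.0)):
    """Draw one realization of a shallow random ReLU network on an interval.

    The number of neurons is Poisson(lam * |domain|), i.e. the activation
    thresholds form a Poisson point process of density lam on the interval.
    This is an illustrative 1-D toy version, not the paper's exact construction.
    """
    a, b = domain
    width = rng.poisson(lam * (b - a))            # random width
    thresholds = rng.uniform(a, b, size=width)    # activation thresholds (biases)
    signs = rng.choice([-1.0, 1.0], size=width)   # 1-D "directions" (assumed Rademacher)
    weights = rng.normal(0.0, 1.0, size=width)    # output weights (assumed Gaussian)

    def network(x):
        # s(x) = sum_k w_k * max(eps_k * x - t_k, 0), evaluated pointwise
        x = np.asarray(x, dtype=float)[..., None]
        return np.sum(weights * np.maximum(signs * x - thresholds, 0.0), axis=-1)

    return network

# One realization evaluated on a few input points.
s = sample_random_relu_network(lam=50.0)
print(s(np.linspace(-1.0, 1.0, 5)))
```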

Asymptotic Behaviors

A significant contribution is showing that, as the expected width tends to infinity, these networks can converge in law to either Gaussian or non-Gaussian processes, depending on the law of the weights. Specifically:

  • For Gaussian-distributed weights, the processes converge in law to Gaussian processes.
  • For weights following a symmetric alpha-stable law, the processes remain non-Gaussian even as the expected width grows; a small simulation sketch of this contrast follows.
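
As a rough illustration of this dichotomy, here is a hedged Monte-Carlo sketch (again a 1-D toy version with uniform thresholds and Rademacher signs, none of which is the paper's exact construction). It compares the marginal distribution of the network value at a fixed point under Gaussian versus symmetric alpha-stable output weights as the expected width grows; the 1/sqrt(lam) and 1/lam^(1/alpha) normalizations are the standard scalings that keep the marginals non-degenerate and are assumptions here, not the paper's normalization.

```python
import numpy as np
from scipy.stats import kurtosis, levy_stable

rng = np.random.default_rng(1)

def field_value(lam, weight_sampler, x0=0.5, domain=(-1.0, 1.0)):
    """One sample of the network value s(x0) with expected width lam * |domain|."""
    a, b = domain
    width = rng.poisson(lam * (b - a))
    thresholds = rng.uniform(a, b, size=width)
    signs = rng.choice([-1.0, 1.0], size=width)
    weights = weight_sampler(width)
    return np.sum(weights * np.maximum(signs * x0 - thresholds, 0.0))

def gaussian_weights(n, lam):
    # 1/sqrt(lam) scaling: a CLT-style normalization (an illustrative choice).
    return rng.normal(0.0, 1.0, size=n) / np.sqrt(lam)

def stable_weights(n, lam, alpha=1.5):
    # Symmetric alpha-stable weights with the matching 1/lam**(1/alpha) scaling.
    return levy_stable.rvs(alpha, 0.0, size=n, random_state=rng) / lam ** (1.0 / alpha)

for lam in (10.0, 100.0, 1000.0):
    g = [field_value(lam, lambda n: gaussian_weights(n, lam)) for _ in range(1000)]
    h = [field_value(lam, lambda n: stable_weights(n, lam)) for _ in range(1000)]
    # Excess kurtosis near 0 indicates Gaussian-like marginals; the alpha-stable
    # column stays heavy-tailed and erratic however large lam becomes.
    print(f"lam={lam:6.0f}  gaussian kurtosis={kurtosis(g):6.2f}  "
          f"alpha-stable kurtosis={kurtosis(h):9.2f}")
```

Under this scaling, the Gaussian-weight kurtosis shrinks toward zero as lam grows (the central-limit regime), while the alpha-stable column remains heavy-tailed, mirroring the dichotomy described above.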

Implications

This insight means that the behavior of random neural networks depends heavily on how their parameters are initialized, challenging the common assumption that infinite-width networks are inherently Gaussian. In practical terms:

  • Modeling: an exact, non-asymptotic characterization of these random networks supports more faithful probabilistic and generative modeling with neural networks.
  • Bayesian Inference: infinite-width analyses traditionally rely on Gaussianity; recognizing non-Gaussian behavior opens the door to richer priors and more precise inference techniques.

Future Considerations

The findings pave the way for further exploration:

  1. Broader Activations and Architectures: Extending these insights to more complex network structures and different activation functions could uncover richer behaviors.
  2. Real-World Applications: Testing these theoretical results in real-world applications, particularly in reinforcement learning and Bayesian optimization, might significantly impact practical AI designs.
  3. Advanced Statistical Techniques: Developing new methods to better deal with non-Gaussian data arising from these neural network models.

Notable Results

The central results are a remarkably simple closed-form expression for the autocovariance function and rigorous proofs that wide networks do not necessarily converge to Gaussian processes, a clear departure from traditional assumptions.

In summary, this research offers a new lens on the behavior of shallow random ReLU neural networks, emphasizing their non-Gaussian nature under specific conditions. The implications for both theory and practice in AI are substantial, suggesting further avenues for exploration and innovation.