Spectral complexity of deep neural networks (2405.09541v4)

Published 15 May 2024 in stat.ML, cs.LG, and math.PR

Abstract: It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.

Summary

  • The paper presents a novel spectral framework classifying deep networks into low-disorder, sparse, and high-disorder regimes based on asymptotic spectral moments.
  • It uses the angular power spectrum of the limiting Gaussian process to show that spectral moments decay exponentially in the low-disorder regime, remain bounded in the sparse regime, and grow exponentially in the high-disorder regime.
  • These findings provide theoretical tools for analyzing network complexity and practical insights for designing more robust and efficient deep learning architectures.

Spectral Complexity of Deep Neural Networks

The paper "Spectral Complexity of Deep Neural Networks" by Di Lillo, Marinucci, Salvi, and Vigogna investigates the complexity of neural network architectures through the lens of the spectral analysis of their associated Gaussian processes. The authors study the angular power spectrum of the isotropic random fields that emerge from randomly initialized, fully connected networks in the limit where the width of every layer tends to infinity. Within this framework, they classify neural networks into three distinct regimes, each characterized by its own asymptotic behavior: low-disorder, sparse, and high-disorder.
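
To make these limiting objects concrete, the sketch below iterates the normalized arc-cosine (ReLU) kernel to approximate the covariance of the limiting field on the sphere S^2 and projects it onto Legendre polynomials to obtain an approximate angular power spectrum. This is a minimal illustration under simplifying assumptions (He-style unit-variance scaling, sphere dimension 2, the standard NNGP kernel recursion); it is not the paper's exact construction or normalization.

```python
import numpy as np
from scipy.special import eval_legendre

def relu_kernel(rho):
    """Normalized single-layer ReLU (arc-cosine, degree-1) kernel."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi

def deep_kernel(cos_theta, depth):
    """Depth-fold iterate of the single-layer kernel: the (normalized)
    covariance of the limiting Gaussian field at angular distance theta."""
    rho = np.asarray(cos_theta, dtype=float)
    for _ in range(depth):
        rho = relu_kernel(rho)
    return rho

def angular_power_spectrum(depth, ell_max=20, n_quad=400):
    """Legendre projection on S^2: C_ell = 2*pi * int_{-1}^{1} Gamma(t) P_ell(t) dt."""
    t, w = np.polynomial.legendre.leggauss(n_quad)  # Gauss-Legendre nodes/weights
    gamma = deep_kernel(t, depth)
    return np.array([2.0 * np.pi * np.sum(w * gamma * eval_legendre(ell, t))
                     for ell in range(ell_max + 1)])

for depth in (1, 5, 25):
    C = angular_power_spectrum(depth)
    print(f"depth {depth:2d}: C_0..C_4 =", np.round(C[:5], 4))
```

With this normalization, the iterated ReLU covariance flattens only gradually as depth grows, consistent with the bounded, sparse behavior discussed below.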

Summary of Results

The paper's central premise is the weak convergence of neural networks to Gaussian processes as all layers become infinitely wide, which allows the functional properties of a network to be read off the power spectrum of the limiting field on the sphere. The classification rests on the asymptotic behavior, as depth diverges, of the moments of random sequences associated with the angular power spectrum (a numerical sketch of the classification criterion follows the list):

  1. Low-Disorder Regime: When the derivative at one of the first layer's kernel function is strictly less than one, the network exhibits low-disorder behavior. The moments of the spectral law decay exponentially with depth, so these networks degenerate towards trivial constant functions.
  2. Sparse Regime: This regime, in which the kernel's first derivative at one equals one, includes commonly used activations such as ReLU. The low-order moments remain bounded and the spectral sequences converge in measure, while moments beyond the second order diverge. This behavior suggests a self-regularization capacity and accounts for the sparsity that emerges in deep ReLU networks.
  3. High-Disorder Regime: When the derivative exceeds one, the moments of the angular power spectrum grow exponentially with depth, reflecting increasing complexity. Such networks, exemplified by activations like the hyperbolic tangent, capture progressively higher-frequency components as depth grows, leading to more chaotic outputs.
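
As a rough illustration of the criterion behind this classification, the sketch below estimates the derivative at one of the normalized single-layer kernel via the Gaussian identity kappa'(1) = E[sigma'(Z)^2] / E[sigma(Z)^2] for standard normal Z (valid under unit pre-activation variance, which is an assumption here), and maps the estimate to the three regimes. The function names and the tolerance are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # standard Gaussian samples

def kappa_prime_at_one(act, act_prime):
    """Monte Carlo estimate of the derivative at one of the normalized
    single-layer kernel: E[act'(Z)^2] / E[act(Z)^2] with Z ~ N(0, 1)."""
    return np.mean(act_prime(z) ** 2) / np.mean(act(z) ** 2)

def regime(xi, tol=1e-2):
    """Map the estimated derivative to the paper's three regimes."""
    if xi < 1.0 - tol:
        return "low-disorder"
    if xi > 1.0 + tol:
        return "high-disorder"
    return "sparse"

activations = {
    "ReLU": (lambda x: np.maximum(x, 0.0), lambda x: (x > 0).astype(float)),
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
}

for name, (act, act_prime) in activations.items():
    xi = kappa_prime_at_one(act, act_prime)
    print(f"{name}: kappa'(1) ~ {xi:.3f} -> {regime(xi)}")
```

ReLU lands on the sparse boundary (the ratio is exactly one in closed form), while under this normalization the estimate for tanh comes out above one, in line with the regimes described above.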

Implications

These findings have broad theoretical and practical implications. From a theoretical standpoint, the paper provides a methodological tool for analyzing the inherent complexity of neural networks in terms of their spectral behavior. The classification into three regimes raises important questions about architectural choices in deep learning, highlighting the distinct stability and robustness properties associated with different activation functions.

Practically, this approach can inform future developments in neural network design. The insights into sparsity, particularly in ReLU networks, suggest treating depth as a resource that does not always translate directly into increased functional complexity. This perspective could improve model efficiency, leading to networks that are more stable and less prone to overfitting without sacrificing performance.

Future Directions

The paper opens multiple avenues for further investigation. Exploring the geometrical properties of the limiting random fields could provide a deeper understanding of neural networks' robustness beyond simple functional analysis. Moreover, generalizing these results beyond fully connected structures to convolutional or recurrent architectures could elucidate their behavior in more specific tasks and scenarios.

Collaboration across mathematical fields, leveraging the paper's intersection with random field theory, could further refine or extend these findings into more generalized theoretical frameworks applicable across different neural architectures and activation functions.

Overall, this paper extends the foundational understanding of neural network behavior through spectral analysis, posing critical questions that may redefine how researchers view depth and complexity in machine learning architecture design.