Spectral complexity of deep neural networks (2405.09541v4)

Published 15 May 2024 in stat.ML, cs.LG, and math.PR

Abstract: It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.

Summary

  • The paper presents a novel spectral framework classifying deep networks into low-disorder, sparse, and high-disorder regimes based on asymptotic spectral moments.
  • It uses the angular power spectrum of the limiting Gaussian process to show that spectral moments decay exponentially in the low-disorder regime, remain bounded in the sparse regime, and grow exponentially in the high-disorder regime.
  • These findings provide theoretical tools for analyzing network complexity and practical insights for designing more robust and efficient deep learning architectures.

Spectral Complexity of Deep Neural Networks

The paper "Spectral Complexity of Deep Neural Networks" by Di Lillo, Marinucci, Salvi, and Vigogna investigates the complexity of neural network architectures through the lens of the spectral analysis of their associated Gaussian processes. The authors study the angular power spectrum of the isotropic random fields that emerge from randomly initialized, fully connected networks in the limit where the width of every layer tends to infinity. Within this framework, they classify neural networks into three distinct regimes, each characterized by its own asymptotic behavior: low-disorder, sparse, and high-disorder.
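
To make these limiting objects concrete, the sketch below iterates the normalized arc-cosine (ReLU) kernel to approximate the covariance of the limiting field on the sphere S^2 and projects it onto Legendre polynomials to obtain an approximate angular power spectrum. This is a minimal illustration under simplifying assumptions (He-style unit-variance scaling, sphere dimension 2, the standard NNGP kernel recursion); it is not the paper's exact construction or normalization.

```python
import numpy as np
from scipy.special import eval_legendre

def relu_kernel(rho):
    """Normalized single-layer ReLU (arc-cosine, degree-1) kernel."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi

def deep_kernel(cos_theta, depth):
    """Depth-fold iterate of the single-layer kernel: the (normalized)
    covariance of the limiting Gaussian field at angular distance theta."""
    rho = np.asarray(cos_theta, dtype=float)
    for _ in range(depth):
        rho = relu_kernel(rho)
    return rho

def angular_power_spectrum(depth, ell_max=20, n_quad=400):
    """Legendre projection on S^2: C_ell = 2*pi * int_{-1}^{1} Gamma(t) P_ell(t) dt."""
    t, w = np.polynomial.legendre.leggauss(n_quad)  # Gauss-Legendre nodes/weights
    gamma = deep_kernel(t, depth)
    return np.array([2.0 * np.pi * np.sum(w * gamma * eval_legendre(ell, t))
                     for ell in range(ell_max + 1)])

for depth in (1, 5, 25):
    C = angular_power_spectrum(depth)
    print(f"depth {depth:2d}: C_0..C_4 =", np.round(C[:5], 4))
```

With this normalization, the iterated ReLU covariance flattens only gradually as depth grows, consistent with the bounded, sparse behavior discussed below.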

Summary of Results

The paper's central premise is the weak convergence of neural networks to Gaussian processes as all layers become infinitely wide, which allows the functional properties of a network to be read off the power spectrum of the limiting field on the sphere. The classification rests on the asymptotic behavior, as depth diverges, of the moments of random sequences associated with the angular power spectrum (a numerical sketch of the classification criterion follows the list):

  1. Low-Disorder Regime: When the derivative at one of the first layer's kernel function is strictly less than one, the network exhibits low-disorder behavior. The moments of the spectral law decay exponentially with depth, so these networks degenerate towards trivial constant functions.
  2. Sparse Regime: This regime, in which the kernel's first derivative at one equals one, includes commonly used activations such as ReLU. The low-order moments remain bounded and the spectral sequences converge in measure, while moments beyond the second order diverge. This behavior suggests a self-regularization capacity and accounts for the sparsity that emerges in deep ReLU networks.
  3. High-Disorder Regime: When the derivative exceeds one, the moments of the angular power spectrum grow exponentially with depth, reflecting increasing complexity. Such networks, exemplified by activations like the hyperbolic tangent, capture progressively higher-frequency components as depth grows, leading to more chaotic outputs.
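
As a rough illustration of the criterion behind this classification, the sketch below estimates the derivative at one of the normalized single-layer kernel via the Gaussian identity kappa'(1) = E[sigma'(Z)^2] / E[sigma(Z)^2] for standard normal Z (valid under unit pre-activation variance, which is an assumption here), and maps the estimate to the three regimes. The function names and the tolerance are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # standard Gaussian samples

def kappa_prime_at_one(act, act_prime):
    """Monte Carlo estimate of the derivative at one of the normalized
    single-layer kernel: E[act'(Z)^2] / E[act(Z)^2] with Z ~ N(0, 1)."""
    return np.mean(act_prime(z) ** 2) / np.mean(act(z) ** 2)

def regime(xi, tol=1e-2):
    """Map the estimated derivative to the paper's three regimes."""
    if xi < 1.0 - tol:
        return "low-disorder"
    if xi > 1.0 + tol:
        return "high-disorder"
    return "sparse"

activations = {
    "ReLU": (lambda x: np.maximum(x, 0.0), lambda x: (x > 0).astype(float)),
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
}

for name, (act, act_prime) in activations.items():
    xi = kappa_prime_at_one(act, act_prime)
    print(f"{name}: kappa'(1) ~ {xi:.3f} -> {regime(xi)}")
```

ReLU lands on the sparse boundary (the ratio is exactly one in closed form), while under this normalization the estimate for tanh comes out above one, in line with the regimes described above.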

Implications

These findings have broad theoretical and practical implications. From a theoretical standpoint, the paper provides a methodological tool for analyzing the inherent complexity of neural networks in terms of their spectral behavior. The classification into three regimes raises important questions about architectural choices in deep learning, highlighting the distinct stability and robustness properties associated with different activation functions.

Practically, this approach can inform future developments in neural network design. The insights into sparsity, particularly in ReLU networks, suggest treating depth as a resource that does not always translate directly into increased functional complexity. This perspective could improve model efficiency, leading to networks that are more stable and less prone to overfitting without sacrificing performance.

Future Directions

The paper opens multiple avenues for further investigation. Exploring the geometrical properties of the limiting random fields could provide a deeper understanding of neural networks' robustness beyond simple functional analysis. Moreover, generalizing these results beyond fully connected structures to convolutional or recurrent architectures could elucidate their behavior in more specific tasks and scenarios.

Collaboration across mathematical fields, leveraging the paper's intersection with random field theory, could further refine or extend these findings into more generalized theoretical frameworks applicable across different neural architectures and activation functions.

Overall, this paper extends the foundational understanding of neural network behavior through spectral analysis, posing critical questions that may redefine how researchers view depth and complexity in machine learning architecture design.