Spectral complexity of deep neural networks (2405.09541v4)
Abstract: It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.
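To make the construction described in the abstract concrete, here is a minimal numerical sketch of the kind of computation involved: it iterates the standard infinite-width (NNGP / arc-cosine) covariance recursion for a deep ReLU network with inputs on the sphere S^2, and then projects the limiting isotropic covariance onto Legendre polynomials to obtain an angular power spectrum. The normalization (C_W = 2, C_b = 0), the truncation `ell_max`, the quadrature size, and all function names are illustrative assumptions, not the paper's definitions; the sequences of random variables studied in the paper are built on top of spectra of this kind.

```python
# Sketch (assumed setup, not the paper's exact construction):
# limiting ReLU kernel on S^2 via the Cho-Saul arc-cosine recursion,
# then angular power spectrum C_ell by Legendre projection.
import numpy as np
from scipy.special import eval_legendre


def relu_kernel_recursion(rho, depth, c_w=2.0, c_b=0.0):
    """Limiting covariance after `depth` ReLU layers, as a function of the
    input correlation rho = <x, y> for unit-norm inputs (the choice
    c_w = 2, c_b = 0 keeps the diagonal K(x, x) = 1 at every layer)."""
    k_xy = np.asarray(rho, dtype=float)
    k_xx = np.ones_like(k_xy)  # K(x, x), constant on the sphere by isotropy
    for _ in range(depth):
        corr = np.clip(k_xy / k_xx, -1.0, 1.0)
        theta = np.arccos(corr)
        # E[ReLU(u) ReLU(v)] for jointly Gaussian (u, v) with variance k_xx
        j1 = np.sin(theta) + (np.pi - theta) * np.cos(theta)
        k_xy = c_b + c_w * k_xx * j1 / (2.0 * np.pi)
        k_xx = c_b + c_w * k_xx / 2.0
    return k_xy, k_xx


def angular_power_spectrum(depth, ell_max=50, n_quad=400):
    """C_ell, ell = 0..ell_max, from the S^2 expansion
    C(t) = sum_ell (2 ell + 1) / (4 pi) * C_ell * P_ell(t),
    i.e. C_ell = 2 pi * int_{-1}^{1} C(t) P_ell(t) dt (Gauss-Legendre)."""
    t, w = np.polynomial.legendre.leggauss(n_quad)
    cov, _ = relu_kernel_recursion(t, depth)
    return np.array([2.0 * np.pi * np.sum(w * cov * eval_legendre(ell, t))
                     for ell in range(ell_max + 1)])


if __name__ == "__main__":
    for L in (1, 5, 20):
        c_ell = angular_power_spectrum(L)
        print(f"depth {L:2d}: C_0..C_4 =", np.round(c_ell[:5], 4))
```

Printing the leading coefficients for a few depths gives a quick qualitative picture of how the spectrum of the limiting field evolves with depth under these assumed normalizations.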