
Commutative Width and Depth Scaling in Deep Neural Networks (2310.01683v1)

Published 2 Oct 2023 in stat.ML and cs.LG

Abstract: This paper is the second in the series Commutative Scaling of Width and Depth (WD) about commutativity of infinite width and depth limits in deep neural networks. Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually identify settings under which commutativity holds, i.e. the neural function tends to the same limit no matter how width and depth limits are taken. In this paper, we formally introduce and define the commutativity framework, and discuss its implications for neural network design and scaling. We study commutativity for the neural covariance kernel, which reflects how network layers separate data. Our findings extend previous results established in [55] by showing that taking the width and depth to infinity in a deep neural network with skip connections, when branches are suitably scaled to avoid exploding behaviour, results in the same covariance structure no matter how that limit is taken. This has a number of theoretical and practical implications that we discuss in the paper. The proof techniques in this paper are novel and rely on tools that are more accessible to readers who are not familiar with stochastic calculus (used in the proofs of WD(I)).
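
As a rough numerical illustration of the commutativity statement above, the sketch below (not the authors' code) builds a ReLU residual network at initialization whose branches are scaled by 1/sqrt(depth), and estimates an empirical covariance kernel between two inputs at the last hidden layer. The specific activation, the 1/sqrt(depth) branch scaling, the kernel estimator, and the `covariance_kernel` helper are assumptions made for illustration only. Commutativity predicts that the estimate should approach the same limiting value whether width is grown much faster than depth or the other way around.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): a ReLU
# residual network at initialization with branches scaled by 1/sqrt(depth),
# and the empirical covariance <h_L(x), h_L(x')> / width at the last layer.
import numpy as np

def covariance_kernel(x, x_prime, width, depth, seed=0):
    """Empirical last-layer covariance of two inputs for one random network."""
    rng = np.random.default_rng(seed)
    # Shared read-in projection into the width-dimensional hidden space.
    w_in = rng.normal(size=(width, x.size)) / np.sqrt(x.size)
    h, h_prime = w_in @ x, w_in @ x_prime
    for _ in range(depth):
        w = rng.normal(size=(width, width)) / np.sqrt(width)
        # Residual branch scaled by 1/sqrt(depth) to avoid exploding behaviour.
        h = h + np.maximum(w @ h, 0.0) / np.sqrt(depth)
        h_prime = h_prime + np.maximum(w @ h_prime, 0.0) / np.sqrt(depth)
    return float(h @ h_prime) / width

# Two unit-norm inputs with a nontrivial overlap.
x = np.ones(16); x = x / np.linalg.norm(x)
x_prime = np.linspace(0.1, 1.0, 16); x_prime = x_prime / np.linalg.norm(x_prime)
seeds = range(10)

# Commutativity: growing width much faster than depth, or depth much faster
# than width, should give (approximately) the same kernel value.
wide_regime = np.mean([covariance_kernel(x, x_prime, 1024, 16, s) for s in seeds])
deep_regime = np.mean([covariance_kernel(x, x_prime, 64, 512, s) for s in seeds])
print(wide_regime, deep_regime)
```

Averaging over a few seeds reduces the Monte Carlo noise at small width; with a single random draw the two estimates agree only roughly.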

References (57)
  1. E. Fehlberg “Classical Fifth-, Sixth-, Seventh-, and Eighth-Order Runge-Kutta Formulas with Stepsize Control” In NASA Technical Report, 1968
  2. R.M. Neal “Bayesian Learning for Neural Networks” Springer Science & Business Media, 1995
  3. John Butcher “Numerical Methods for Ordinary Differential Equations”, 2003
  4. “Kernel Methods for Deep Learning” In Advances in Neural Information Processing Systems, 2009
  5. “Exponential expressivity in deep neural networks through transient chaos” In 30th Conference on Neural Information Processing Systems, 2016
  6. “Deep Information Propagation” In International Conference on Learning Representations, 2017
  7. “Mean field residual networks: On the edge of chaos” In Advances in neural information processing systems, 2017, pp. 7103–7114
  8. “Out-of-equilibrium dynamical mean-field equations for the perceptron model” In Journal of Physics A: Mathematical and Theoretical 51.8 IOP Publishing, 2018, pp. 085002
  9. “Deep Neural Networks as Gaussian Processes” In International Conference on Learning Representations, 2018
  10. “Gaussian Process Behaviour in Wide Deep Neural Networks” In International Conference on Learning Representations, 2018
  11. Dyego Araújo, Roberto I Oliveira and Daniel Yukimura “A mean-field limit for certain deep neural networks” In arXiv preprint arXiv:1906.00193, 2019
  12. “Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks” In Proceedings of the 36th International Conference on Machine Learning 97, Proceedings of Machine Learning Research PMLR, 2019, pp. 322–332
  13. Boris Hanin “Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations” In Mathematics 7.10, 2019
  14. “Products of Many Large Random Matrices and Gradients in Deep Neural Networks” In Communications in Mathematical Physics 376.1 Springer Science+Business Media LLC, 2019, pp. 287–322
  15. S. Hayou, A. Doucet and J. Rousseau “On the Impact of the Activation Function on Deep Neural Networks Training” In International Conference on Machine Learning, 2019
  16. Soufiane Hayou, Arnaud Doucet and Judith Rousseau “Training dynamics of deep networks using stochastic gradient descent via neural tangent kernel” arXiv, 2019
  17. Song Mei, Theodor Misiakiewicz and Andrea Montanari “Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit” In Conference on Learning Theory, 2019, pp. 2388–2464 PMLR
  18. “Understanding Priors in Bayesian Neural Networks at the Unit Level” In Proceedings of the 36th International Conference on Machine Learning 97, Proceedings of Machine Learning Research PMLR, 2019, pp. 6458–6467
  19. G. Yang “Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation” In arXiv preprint arXiv:1902.04760, 2019
  20. G. Yang “Tensor Programs I: Wide feedforward or recurrent neural networks of any architecture are Gaussian processes” In arXiv preprint arXiv:1910.12478, 2019
  21. “A fine-grained spectral perspective on neural networks” In arXiv preprint arXiv:1907.10599, 2019
  22. “Finite Depth and Width Corrections to the Neural Tangent Kernel” In International Conference on Learning Representations, 2020
  23. S. Hayou, A. Doucet and J. Rousseau “Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks” In arXiv preprint arXiv:1905.13654, 2020
  24. Bobby He, Balaji Lakshminarayanan and Yee Whye Teh “Bayesian Deep Ensembles via the Neural Tangent Kernel” In Advances in Neural Information Processing Systems 33 Curran Associates, Inc., 2020, pp. 1010–1022
  25. “Infinite attention: NNGP and NTK for deep attention networks” In Proceedings of the 37th International Conference on Machine Learning 119, Proceedings of Machine Learning Research PMLR, 2020, pp. 4376–4386
  26. Phan-Minh Nguyen and Huy Tuan Pham “A rigorous framework for the mean field limit of multilayer neural networks” In arXiv preprint arXiv:2001.11443, 2020
  27. “Infinitely deep neural networks as diffusion processes” In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics 108, Proceedings of Machine Learning Research PMLR, 2020, pp. 1126–1136
  28. Lechao Xiao, Jeffrey Pennington and Samuel Schoenholz “Disentangling Trainability and Generalization in Deep Neural Networks” In Proceedings of the 37th International Conference on Machine Learning 119, Proceedings of Machine Learning Research PMLR, 2020, pp. 10462–10472
  29. G. Yang “Tensor Programs III: Neural Matrix Laws” In arXiv preprint arXiv:2009.10685, 2020
  30. “Modeling from features: a mean-field framework for over-parameterized deep neural networks” In Conference on learning theory, 2021, pp. 1887–1936 PMLR
  31. “Robust Pruning at Initialization” In International Conference on Learning Representations, 2021
  32. “Regularization in ResNet with Stochastic Depth” In Proceedings of Thirty-fifth Neural Information Processing Systems (NeurIPS), 2021
  33. “Stable ResNet” In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics 130, Proceedings of Machine Learning Research PMLR, 2021, pp. 1324–1332
  34. Mufan Li, Mihai Nica and Dan Roy “The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization” In Advances in Neural Information Processing Systems 34 Curran Associates, Inc., 2021, pp. 7852–7864
  35. “Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping” In arXiv preprint arXiv:2110.01765, 2021
  36. “Precise characterization of the prior predictive distribution of deep ReLU networks” In Advances in Neural Information Processing Systems, 2021 URL: https://openreview.net/forum?id=DTA7Bgrai-Q
  37. “Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks” In ICML 2021, 2021
  38. “Exact marginal prior distributions of finite Bayesian neural networks” In Advances in Neural Information Processing Systems 34 Curran Associates, Inc., 2021, pp. 3364–3375 URL: https://proceedings.neurips.cc/paper_files/paper/2021/file/1baff70e2669e8376347efd3a874a341-Paper.pdf
  39. “Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks”, 2022 arXiv:2205.09653 [stat.ML]
  40. Boris Hanin “Correlation Functions in Random Fully Connected Neural Networks at Finite Width” arXiv, 2022
  41. Soufiane Hayou “On the infinite-depth limit of finite-width neural networks” In Transactions on Machine Learning Research, 2022
  42. Soufiane Hayou, Arnaud Doucet and Judith Rousseau “The Curse of Depth in Kernel Regime” In Proceedings on “I (Still) Can’t Believe It’s Not Better!” at NeurIPS 2021 Workshops 163, Proceedings of Machine Learning Research PMLR, 2022, pp. 41–47
  43. Arthur Jacot “Theory of Deep Learning: Neural Tangent Kernel and Beyond” In PhD Thesis, Ecole Polytechnique Federale de Lausanne, 2022 URL: https://infoscience.epfl.ch/record/295831/files/EPFL_TH9825.pdf
  44. “Freeze and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts” In Proceedings of Mathematical and Scientific Machine Learning 190, Proceedings of Machine Learning Research PMLR, 2022, pp. 257–270
  45. Mufan Bill Li, Mihai Nica and Daniel M. Roy “The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization” In arXiv, 2022
  46. “Connecting Optimization and Generalization via Gradient Flow Path Length” arXiv, 2022
  47. Yizhang Lou, Chris E Mingard and Soufiane Hayou “Feature Learning and Signal Propagation in Deep Neural Networks” In Proceedings of the 39th International Conference on Machine Learning, 2022, pp. 14248–14282
  48. “Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?” In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference 145, Proceedings of Machine Learning Research PMLR, 2022, pp. 868–895
  49. “Mean field analysis of deep neural networks” In Mathematics of Operations Research 47.1 INFORMS, 2022, pp. 120–152
  50. “Gaussian Pre-Activations in Neural Networks: Myth or Reality?” arXiv, 2022
  51. Greg Yang, Michael Santacroce and Edward J Hu “Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features” In International Conference on Learning Representations, 2022
  52. Guodong Zhang, Aleksandar Botev and James Martens “Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers” In International Conference on Learning Representations, 2022
  53. “The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks”, 2023 arXiv:2210.02157 [stat.ML]
  54. Nicola Muca Cirone, Maud Lemercier and Cristopher Salvi “Neural signature kernels as infinite-width-depth-limits of controlled ResNets”, 2023 arXiv:2303.17671 [math.DS]
  55. “Width and Depth Limits Commute in Residual Networks” In International Conference on Machine Learning, 2023 URL: https://api.semanticscholar.org/CorpusID:256459595
  56. “Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization”, 2023 arXiv:2302.09712 [stat.ML]
  57. “The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit”, 2023
