
Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks (2307.02284v2)

Published 5 Jul 2023 in stat.ML, cond-mat.dis-nn, cond-mat.stat-mech, and cs.LG

Abstract: We demonstrate that conventional artificial deep neural networks operating near the phase boundary of the signal propagation dynamics, also known as the edge of chaos, exhibit universal scaling laws of absorbing phase transitions from non-equilibrium statistical mechanics. Our numerical results indicate that multilayer perceptrons and convolutional neural networks belong to the mean-field and directed percolation universality classes, respectively. Finite-size scaling is also successfully applied, suggesting a potential connection to the depth-width trade-off in deep learning. Furthermore, our analysis of the training dynamics under gradient descent reveals that hyperparameter tuning to the phase boundary is necessary but not sufficient for achieving optimal generalization in deep networks. Remarkably, the nonuniversal metric factors associated with the scaling laws are shown to play a significant role in making the above observations concrete. These findings highlight the usefulness of the notion of criticality for analyzing the behavior of artificial deep neural networks and offer new insights toward a unified understanding of the essential relationship between criticality and intelligence.
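
For reference, the scaling laws in question take the standard absorbing-phase-transition form (as in Hinrichsen's review of non-equilibrium critical phenomena). The identification of depth with time and network width with system size below is the natural reading of the finite-size-scaling remark, stated here as an assumption rather than a quote from the paper:

    % Density of active sites decaying from a fully active state at criticality:
    \[
      \rho(t) \sim t^{-\delta}, \qquad \delta = \beta/\nu_\parallel .
    \]
    % Finite-size scaling ansatz, with L the system size (here, by assumption,
    % the network width) and z = \nu_\parallel / \nu_\perp the dynamical exponent:
    \[
      \rho(t, L) \simeq t^{-\delta} \, F\!\left( t / L^{z} \right).
    \]
    % Mean-field class: \delta = 1.
    % (1+1)-dimensional directed percolation: \delta \approx 0.1595, z \approx 1.5807.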

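The edge-of-chaos setup itself is easy to reproduce numerically. Below is a minimal sketch in the spirit of the mean-field signal-propagation literature the paper builds on, not the paper's actual experimental protocol. For a random tanh network with weight variance sigma_w^2/width and zero biases, sigma_w = 1 is the critical point: below it, a small perturbation between two inputs dies out toward the quiescent, absorbing-like state; above it, the perturbation grows chaotically.

    # Minimal edge-of-chaos sketch (assumptions: tanh activations, zero biases);
    # an illustration of the phase boundary, not the paper's measurement protocol.
    import numpy as np

    def propagate(x, depth, width, sigma_w, seed=0):
        """Push an input through a random tanh MLP with weights ~ N(0, sigma_w^2/width)
        and return the final pre-activation vector."""
        rng = np.random.default_rng(seed)  # same seed => same network for both inputs
        h = x
        for _ in range(depth):
            W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
            h = W @ np.tanh(h)
        return h

    width, depth = 1000, 50
    rng = np.random.default_rng(1)
    x = rng.normal(size=width)
    x_pert = x + 1e-3 * rng.normal(size=width)  # tiny input perturbation

    for sigma_w in (0.8, 1.0, 1.5):  # ordered / critical / chaotic
        d = np.linalg.norm(propagate(x, depth, width, sigma_w)
                           - propagate(x_pert, depth, width, sigma_w)) / np.sqrt(width)
        print(f"sigma_w = {sigma_w}: perturbation distance after {depth} layers = {d:.3e}")

Running this, the perturbation distance collapses by orders of magnitude at sigma_w = 0.8 and saturates near order one at sigma_w = 1.5; at the critical point it decays only slowly with depth, which is the regime where power laws like the ones above would be measured.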