- The paper demonstrates that both deep neural networks and biological brains exhibit avalanche dynamics governed by quasi-criticality and power-law distributions.
- It employs crackling noise theory and mean-field approximations to derive scaling relations that diagnose optimal trainability and inform network design.
- Empirical findings reveal that maximal susceptibility, rather than the edge of chaos, better predicts learning performance in various architectures.
Event-Resolved Criticality in Deep Learning and Biological Neural Networks
Introduction
This paper establishes a rigorous connection between non-equilibrium statistical physics and the dynamics of deep neural networks (DNNs) and biological brains. By leveraging crackling noise theory and the mathematics of phase transitions, the authors demonstrate that both systems exhibit avalanche-like cascades of activity, with deep learning performance optimized in a quasi-critical regime rather than at a true critical point. The work provides a unified framework for understanding learning and information propagation in artificial and biological neural systems, identifies universality classes governing their dynamics, and offers practical diagnostics for model initialization and architecture design.
Theoretical Framework: Crackling Noise and Criticality
The paper draws on the concept of neuronal avalanches—spatiotemporal bursts of activity separated by silent periods—originally observed in living brains. These avalanches are characterized by power-law distributions of size and duration, scaling relations, and universal shape collapse, all hallmarks of systems near a critical phase transition. The authors adapt these tools to deep neural networks, showing that the equations governing avalanche statistics in brains are equally applicable to cascades of activity in DNNs.
The analysis is grounded in non-equilibrium statistical physics, specifically crackling noise theory, which provides a set of scaling relations and exponent constraints for avalanche statistics. The critical regime is defined by the balance between absorbing and active phases, with the system poised at the edge of chaos. However, due to strong input drive, both brains and DNNs operate in a quasi-critical regime, where susceptibility peaks along a Widom-like line rather than diverging at a true critical point.
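For reference, the avalanche scaling laws invoked here take the standard crackling-noise form (quoted from the general avalanche literature rather than from the paper's own notation):

```latex
P(S) \sim S^{-\tau_s}, \qquad
P(T) \sim T^{-\tau_d}, \qquad
\langle S \rangle(T) \sim T^{\gamma},
\qquad \text{with} \qquad
\gamma \approx \frac{\tau_d - 1}{\tau_s - 1},
```

where S and T denote avalanche size and duration, τ_s and τ_d are the corresponding distribution exponents, and the last relation is the exponent constraint used as a consistency check later in the summary.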
Mean-Field Theory and Dynamical Phase Transitions
The authors employ a mean-field approximation to analyze Gaussian-initialized feed-forward networks. In the large-width limit, the pre-activations become Gaussian random variables, and the evolution of their variance is governed by recursive equations. The steady-state signal strength and its susceptibility to weight variance fluctuations are derived, revealing a continuous phase transition with exponents matching the mean-field directed percolation universality class.
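In the standard large-width treatment of such networks (the usual mean-field form; the paper's notation may differ), the layer-to-layer variance recursion reads:

```latex
q^{(l)} \;=\; \sigma_w^2 \int \mathcal{D}z\; \phi\!\left(\sqrt{q^{(l-1)}}\, z\right)^{2} \;+\; \sigma_b^2,
\qquad \mathcal{D}z = \frac{e^{-z^{2}/2}}{\sqrt{2\pi}}\, dz,
```

where q^(l) is the pre-activation variance at layer l, φ is the activation function, and σ_w², σ_b² are the weight and bias variances. The fixed point q* of this map gives the steady-state signal strength, and its sensitivity to σ_w² is the susceptibility referred to above.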
Two critical boundaries are identified:
- Edge of Chaos: Defined by diverging cross-input correlation depth, traditionally associated with optimal information propagation.
- Widom-like Line: Characterized by maximal susceptibility to connectivity fluctuations, empirically shown to better predict learning performance.
The two boundaries coincide only at the exact critical point; nonzero bias variance or strong input drive destroys this point, leaving a quasi-critical plateau in its place.
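To make the two boundaries concrete, the sketch below iterates the mean-field variance recursion for a tanh network and evaluates both the edge-of-chaos condition and a finite-difference susceptibility of the fixed point to the weight variance. This is an illustrative reconstruction under the standard mean-field assumptions stated above; the function names, quadrature order, and finite-difference estimator are our own choices, not the paper's code.

```python
# Sketch: locate the edge of chaos and the susceptibility ridge for a tanh
# network under the standard mean-field variance recursion. Illustrative only.
import numpy as np

z, w = np.polynomial.hermite_e.hermegauss(101)   # probabilists' Gauss-Hermite nodes
w = w / w.sum()                                  # weights for E[.] under N(0, 1)

def q_star(sw2, sb2, iters=500):
    """Fixed point of q -> sw2 * E[tanh(sqrt(q) z)^2] + sb2."""
    q = 1.0
    for _ in range(iters):
        q = sw2 * np.sum(w * np.tanh(np.sqrt(q) * z) ** 2) + sb2
    return q

def chi1(sw2, sb2):
    """Edge-of-chaos order parameter: sw2 * E[tanh'(sqrt(q*) z)^2]; chaos sets in when it exceeds 1."""
    q = q_star(sw2, sb2)
    dphi = 1.0 - np.tanh(np.sqrt(q) * z) ** 2
    return sw2 * np.sum(w * dphi ** 2)

def susceptibility(sw2, sb2, eps=1e-4):
    """Finite-difference sensitivity of q* to the weight variance sw2."""
    return (q_star(sw2 + eps, sb2) - q_star(sw2 - eps, sb2)) / (2 * eps)

for sw2 in (0.8, 1.0, 1.2, 1.5):
    print(sw2, round(chi1(sw2, 0.05), 3), round(susceptibility(sw2, 0.05), 3))
```

Scanning a grid of (σ_w², σ_b²) with these two functions yields the kind of phase diagram described in the text: the χ₁ = 1 contour traces the edge of chaos, while the ridge of the finite-difference susceptibility traces the Widom-like line.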
Avalanche Statistics in Deep Networks
The paper introduces an event-resolved approach to characterize avalanches in DNNs. Avalanches are defined by tracking the signal strength across layers, with the input strength serving as the threshold. The duration is the number of layers above threshold, and the size is the cumulative signal strength. The authors collect millions of avalanches across networks of varying width and depth, fitting power-law distributions to size and duration and extracting scaling exponents.
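A minimal sketch of the event-resolved bookkeeping described above, assuming the signal strength is available as one number per layer; the names and the handling of an avalanche still open at the final layer are illustrative choices, not the paper's implementation:

```python
# Sketch: event-resolved avalanche extraction from a per-layer signal trace,
# following the definitions in the text (threshold = input strength, duration
# = consecutive layers above threshold, size = cumulative signal strength).
import numpy as np

def extract_avalanches(signal, threshold):
    """Return (duration, size) pairs for runs of layers where signal > threshold."""
    avalanches = []
    duration, size = 0, 0.0
    for s in signal:
        if s > threshold:
            duration += 1
            size += s
        elif duration > 0:
            avalanches.append((duration, size))
            duration, size = 0, 0.0
    if duration > 0:                       # avalanche still open at the last layer
        avalanches.append((duration, size))
    return avalanches

# Example: signal strength (e.g. mean squared pre-activation) across 12 layers
layer_signal = np.array([0.9, 1.4, 2.1, 1.7, 0.8, 0.5, 1.2, 1.9, 2.3, 1.1, 0.7, 0.6])
print(extract_avalanches(layer_signal, threshold=1.0))
```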
Key findings include:
- Power-law distributions of avalanche size and duration spanning multiple decades.
- Scaling relation γ ≈ (τ_d − 1)/(τ_s − 1) between the duration and size exponents, consistent with crackling noise theory (a numerical check is sketched after this list).
- Universal shape collapse of avalanche profiles, indicating self-similar propagation.
- Exponents for Gaussian networks align with the Barkhausen noise universality class, while ResNet architectures exhibit mean-field directed percolation exponents.
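The scaling-relation check referenced above can be sketched as follows, using a standard continuous power-law maximum-likelihood estimator and a log-log regression for γ; this is a generic fitting recipe, not necessarily the paper's pipeline:

```python
# Sketch: estimate the size and duration exponents with a continuous power-law
# MLE and check the crackling-noise relation gamma ≈ (tau_d - 1) / (tau_s - 1).
import numpy as np

def powerlaw_mle(x, xmin):
    """Continuous power-law exponent via maximum likelihood (Clauset-style)."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

def check_scaling(sizes, durations, smin=1.0, tmin=1.0):
    """Return (tau_s, tau_d, measured gamma, gamma predicted by the relation)."""
    sizes = np.asarray(sizes, dtype=float)
    durations = np.asarray(durations, dtype=float)
    tau_s = powerlaw_mle(sizes, smin)
    tau_d = powerlaw_mle(durations, tmin)
    # gamma from <S>(T) ~ T^gamma: slope of log <S> versus log T
    uniq = np.unique(durations)
    mean_s = np.array([sizes[durations == t].mean() for t in uniq])
    gamma = np.polyfit(np.log(uniq), np.log(mean_s), 1)[0]
    predicted = (tau_d - 1.0) / (tau_s - 1.0)
    return tau_s, tau_d, gamma, predicted

# Usage: feed in the (duration, size) pairs collected by extract_avalanches.
```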
Empirical analysis on MNIST classification tasks reveals that learning performance peaks in regions of maximal susceptibility, not necessarily at the edge of chaos. Networks initialized near the critical value of the weight variance (σ_w² ≈ 1) are trainable, with the width of the trainable region increasing with network size. As bias variance increases, the trainable region narrows and shifts, and learning performance declines, mirroring the flattening of the susceptibility ridge.
The paper demonstrates that maximal susceptibility is a more reliable predictor of learning than proximity to the critical point itself. This resolves the longstanding puzzle of poor trainability at the edge of chaos in the presence of non-negligible bias, and provides a blueprint for engineering improved network performance via initialization and regularization strategies.
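As a concrete illustration of the initialization blueprint, the following sketch draws Gaussian weights and biases so that a fully connected network sits near σ_w² ≈ 1 with small bias variance; the layer sizes and the specific σ_b² value are assumptions for illustration only:

```python
# Sketch: Gaussian initialization placing a fully connected tanh network near
# sigma_w^2 ≈ 1 with small bias variance, i.e. near the quasi-critical region
# described above. Layer sizes and sigma_b2 are illustrative, not the paper's.
import numpy as np

def init_network(layer_sizes, sigma_w2=1.0, sigma_b2=0.01, seed=0):
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(sigma_w2 / fan_in), size=(fan_out, fan_in))
        b = rng.normal(0.0, np.sqrt(sigma_b2), size=fan_out)
        params.append((W, b))
    return params

params = init_network([784, 512, 512, 10])   # e.g. an MNIST-sized network
```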
Universality Classes and Architecture Dependence
Beyond Gaussian networks, the authors analyze avalanche statistics in highly engineered architectures such as ResNets. Despite architectural heterogeneity, ResNets exhibit robust crackling noise scaling, with exponents clustering near the mean-field directed percolation class. The results suggest that universality classes are robust to architectural details, but the strength of power-law behavior and scaling relations may vary, indicating potential avenues for architectural optimization.
Practical Implications and Diagnostics
The event-resolved crackling noise framework offers operational diagnostics for model initialization and training:
- Exponent-relation mismatch and shape collapse quality can be tracked to assess proximity to criticality (a collapse-quality sketch follows this list).
- Susceptibility-based phase diagrams can guide hyperparameter selection and architecture design.
- Regularization techniques (dropout, spectral constraints, batch normalization, injected noise, residual depth) can be tuned to steer models toward the quasi-critical plateau.
- Universality class steering may enable the design of architectures with task-specific performance characteristics.
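As one possible realization of the shape-collapse diagnostic mentioned in the first item above, the sketch below rescales mean avalanche profiles by T^(γ−1), interpolates them onto a common time axis, and scores the collapse by the residual variance across durations; the scoring choice is ours, not the paper's metric.

```python
# Sketch: a simple shape-collapse quality score. Mean avalanche profiles for
# each duration T are rescaled by T^(gamma - 1) and compared on a common time
# axis; a lower residual variance indicates a cleaner collapse.
import numpy as np

def collapse_error(profiles, gamma, n_points=50):
    """profiles: dict mapping duration T -> mean profile array of length T."""
    grid = np.linspace(0.0, 1.0, n_points)
    rescaled = []
    for T, prof in profiles.items():
        t = np.linspace(0.0, 1.0, len(prof))
        rescaled.append(np.interp(grid, t, prof) / T ** (gamma - 1.0))
    rescaled = np.stack(rescaled)
    # residual variance across durations, normalized by the mean amplitude
    return np.mean(np.var(rescaled, axis=0)) / np.mean(rescaled) ** 2
```

Tracking this score alongside the exponent-relation mismatch during training gives an operational handle on how close a model sits to the quasi-critical plateau.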
Limitations and Future Directions
The paper acknowledges limitations due to finite system sizes, subsampling, and focus on pre-activations and specific architectures. Broader tests across architectures, observables, and training regimes are needed to generalize the quasi-criticality framework. The potential for steering universality classes to optimize task performance remains an open question.
Conclusion
This work provides a rigorous, physics-based framework for understanding deep learning and brain dynamics, demonstrating that both systems operate in a quasi-critical regime characterized by avalanche statistics and maximal susceptibility. The identification of universality classes and the development of practical diagnostics offer new tools for model design and analysis. The shared physics between artificial and biological neural networks suggests that future advances in AI may be guided by principles derived from statistical physics and neuroscience, with quasi-criticality serving as a central organizing concept.