
Understanding training dynamics of deep neural networks

Establish a rigorous, general theory of the training dynamics of deep neural networks that characterizes how the optimization process evolves and under what conditions it converges or reaches stationary behavior, so as to clarify the mechanisms governing empirical performance and to guide principled choices of training hyperparameters.


Background

The paper frames its contribution within a physics-inspired perspective on optimization, noting that despite strong empirical success, the mechanisms by which deep neural networks train and reach stationary behavior are not well understood. This motivates a thermodynamic analogy that interprets stochastic gradient noise as thermal fluctuations and identifies macroscopic variables (temperature, pressure, and volume) linked to the learning rate and weight decay.

By focusing on scale-invariant neural networks and deriving stationary distributions from stochastic differential equations, the authors make partial progress toward this broader goal. However, they explicitly acknowledge that a comprehensive understanding of training dynamics remains an open problem, and they use their framework to illuminate aspects of stationary behavior and hyperparameter effects.
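
To make the thermal-fluctuation picture concrete, a minimal sketch is the standard stochastic-differential-equation approximation of SGD with isotropic gradient noise (a textbook simplification, not necessarily the exact formulation the paper derives):

    d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{2T}\, dW_t, \qquad T \propto \eta / B

    p_\infty(\theta) \propto \exp\bigl( -L(\theta) / T \bigr)

Here L is the training loss, \eta the learning rate, B the batch size, and W_t a standard Wiener process. Under these simplifying assumptions the stationary distribution is a Gibbs measure whose temperature scales with the learning-rate-to-batch-size ratio; adding weight decay \lambda contributes an extra \tfrac{\lambda}{2}\|\theta\|^2 term to the effective potential, which is plausibly where the pressure and volume variables of the paper's analogy enter.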

References

Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights.

Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas? (arXiv:2511.07308, Sadrtdinov et al., 10 Nov 2025), Abstract (page 1).