Explicit form of the potential Φ for stationary SGD on the unit sphere

Determine the explicit functional form of the potential Φ over unit weight directions ȳ that defines the stationary Gibbs distribution ρ(ȳ) ∝ exp(−Φ(ȳ)/T0) for stochastic gradient descent (SGD) with weight decay in scale-invariant neural networks, in the case where the gradient-noise covariance Σ(ȳ) is anisotropic and position-dependent. An explicit Φ would allow predictions beyond tests V1 and V3 to be verified directly.

Background

In the isotropic noise model, the stationary distribution over unit directions is Gibbs with energy equal to the loss L(ȳ), which yields analytic expressions and allows direct validation of multiple thermodynamic predictions. For general neural networks, prior work (Chaudhari and Soatto) shows that the stationary density is Gibbs with an implicit potential Φ(w) that equals the loss only under isotropic noise; Kunin et al. derive Φ explicitly for linear regression.
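
To make the isotropic case concrete, the following is a minimal sketch of a Gibbs density ρ ∝ exp(−L/T0) evaluated on a discretized unit circle (the two-dimensional analogue of the unit sphere). The loss `toy_loss`, the helper `gibbs_density_on_circle`, the grid size, and the value T0 = 0.1 are all illustrative assumptions, not taken from the paper. Replacing the loss with the unknown Φ is exactly the step that cannot be carried out without an explicit form.

```python
import numpy as np

def gibbs_density_on_circle(energy_fn, T0, n=2000):
    """Evaluate rho(theta) proportional to exp(-energy/T0) on a uniform grid of unit directions.

    energy_fn, T0 and n are illustrative assumptions for this sketch.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    # Unit directions on the circle, one per row.
    dirs = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    energy = np.array([energy_fn(w) for w in dirs])
    # Subtract the minimum energy for numerical stability; it cancels on normalization.
    weights = np.exp(-(energy - energy.min()) / T0)
    dtheta = 2.0 * np.pi / n
    # Normalize so that the Riemann sum of the density over the circle equals 1.
    density = weights / (weights.sum() * dtheta)
    return theta, density

def toy_loss(w):
    # Hypothetical loss with its minimum at the direction (1, 0).
    target = np.array([1.0, 0.0])
    return 1.0 - float(w @ target)

theta, rho = gibbs_density_on_circle(toy_loss, T0=0.1)
```

In this toy setting the density concentrates around the loss minimum and sharpens as T0 decreases. With anisotropic, position-dependent noise, the same computation would require Φ in place of the loss.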

Extending this to scale-invariant networks, the authors define the stationary distribution on the unit sphere as ρ(ȳ) ∝ exp(−Φ(ȳ)/T0) and note that Σ(ȳ) is anisotropic and depends on position. Because Φ(ȳ) lacks an explicit form in this setting, they can directly validate only a subset of predictions (V1 and V3). Determining Φ would enable full verification and strengthen the thermodynamic analogy in realistic training regimes.

References

However, since Φ is unknown, we can only directly verify V1 and V3.

— Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas? (2511.07308 - Sadrtdinov et al., 10 Nov 2025) in Section 6.1 (Generalizing isotropic noise model)
