
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks (2202.00293v4)

Published 1 Feb 2022 in stat.ML, cond-mat.dis-nn, and cs.LG

Abstract: Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular examine the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
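The "deterministic description of SGD" mentioned above refers to tracking a small set of order parameters (student-student and student-teacher overlaps) instead of the full weight vectors, in the spirit of the Saad & Solla analysis cited in the references. As a rough, hypothetical illustration of that setting (not the authors' code), the sketch below runs one-pass SGD on a teacher-student soft committee machine with Gaussian inputs and prints those overlaps; the dimensions, learning rate, erf activation, and all variable names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): one-pass SGD on a teacher-student
# soft committee machine with Gaussian data, tracking the order parameters
# Q = W W^T / d (student-student overlaps) and M = W W*^T / d (student-teacher
# overlaps) that the deterministic description evolves. All sizes, the erf
# activation, and the learning rate are illustrative assumptions.
import numpy as np
from scipy.special import erf

d, p, k = 500, 4, 2            # input dimension, student width, teacher width (assumed)
eta, steps = 0.5, 100_000      # learning rate and number of online SGD steps (assumed)
rng = np.random.default_rng(0)

W_star = rng.standard_normal((k, d))      # fixed teacher weights
W = 0.01 * rng.standard_normal((p, d))    # student weights, small initialisation

def committee(W, x):
    """Soft committee machine: mean of erf units with a fixed second layer."""
    return erf(W @ x / np.sqrt(2 * d)).mean()

for t in range(steps):
    x = rng.standard_normal(d)            # fresh Gaussian sample at every step
    err = committee(W, x) - committee(W_star, x)
    pre = W @ x / np.sqrt(d)              # student pre-activations
    # gradient of the squared loss 0.5 * err**2 with respect to the student weights
    grad = err * np.sqrt(2 / np.pi) * np.exp(-pre**2 / 2)[:, None] * x[None, :] / (len(W) * np.sqrt(d))
    W -= eta * grad                       # one online SGD step

    if t % 20_000 == 0:
        Q = W @ W.T / d                   # student-student overlaps
        M = W @ W_star.T / d              # student-teacher overlaps
        print(f"step {t}: diag(Q) = {np.round(np.diag(Q), 3)}, ||M|| = {np.linalg.norm(M):.3f}")
```

In this normalization a single SGD step changes the overlaps by O(1/d), so they evolve on the time scale t/d, which is the regime where a deterministic (ODE) description applies; the abstract's phase diagram concerns how this picture depends on the learning rate, the time scale, and the number of hidden units.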

References (30)
  1. S. Mei, A. Montanari, and P.-M. Nguyen, “A mean field view of the landscape of two-layer neural networks,” Proceedings of the National Academy of Sciences, vol. 115, no. 33, pp. E7665–E7671, 2018.
  2. L. Chizat and F. Bach, “On the global convergence of gradient descent for over-parameterized models using optimal transport,” in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31.   Curran Associates, Inc., 2018.
  3. G. Rotskoff and E. Vanden-Eijnden, “Trainability and accuracy of artificial neural networks: An interacting particle system approach,” Communications on Pure and Applied Mathematics, vol. 75, no. 9, pp. 1889–1935, 2022.
  4. J. Sirignano and K. Spiliopoulos, “Mean field analysis of neural networks: A central limit theorem,” Stochastic Processes and their Applications, vol. 130, no. 3, pp. 1820–1852, 2020.
  5. D. Saad and S. A. Solla, “On-line learning in soft committee machines,” Phys. Rev. E, vol. 52, pp. 4225–4243, Oct 1995.
  6. S. Goldt, M. Advani, A. M. Saxe, F. Krzakala, and L. Zdeborová, “Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32.   Curran Associates, Inc., 2019.
  7. W. Kinzel and P. Ruján, “Improving a network generalization ability by selecting examples,” Europhysics Letters (EPL), vol. 13, no. 5, pp. 473–477, Nov 1990.
  8. O. Kinouchi and N. Caticha, “Optimal generalization in perceptrons,” Journal of Physics A: Mathematical and General, vol. 25, no. 23, pp. 6243–6250, Dec 1992.
  9. M. Copelli and N. Caticha, “On-line learning in the committee machine,” Journal of Physics A: Mathematical and General, vol. 28, no. 6, pp. 1615–1625, Mar 1995.
  10. M. Biehl and H. Schwarze, “Learning by on-line gradient descent,” Journal of Physics A: Mathematical and General, vol. 28, no. 3, pp. 643–656, Feb 1995.
  11. P. Riegler and M. Biehl, “On-line backpropagation in two-layered neural networks,” Journal of Physics A: Mathematical and General, vol. 28, no. 20, pp. L507–L513, Oct 1995.
  12. D. Saad and S. Solla, “Dynamics of on-line gradient descent learning for multilayer neural networks,” in Advances in Neural Information Processing Systems, D. Touretzky, M. C. Mozer, and M. Hasselmo, Eds., vol. 8.   MIT Press, 1996.
  13. R. Vicente, O. Kinouchi, and N. Caticha, “Statistical mechanics of online learning of drifting concepts: A variational approach,” Machine learning, vol. 32, no. 2, pp. 179–201, 1998.
  14. S. Mei, T. Misiakiewicz, and A. Montanari, “Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit,” in Proceedings of the Thirty-Second Conference on Learning Theory, ser. Proceedings of Machine Learning Research, A. Beygelzimer and D. Hsu, Eds., vol. 99.   PMLR, 25–28 Jun 2019, pp. 2388–2464.
  15. D. Saad and S. A. Solla, “Exact solution for on-line learning in multilayer neural networks,” Phys. Rev. Lett., vol. 74, pp. 4337–4340, May 1995.
  16. M. Refinetti, S. Goldt, F. Krzakala, and L. Zdeborová, “Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139.   PMLR, 18–24 Jul 2021, pp. 8936–8947.
  17. S. Goldt, B. Loureiro, G. Reeves, F. Krzakala, M. Mézard, and L. Zdeborová, “The Gaussian equivalence of generative models for learning with two-layer neural networks,” in Proceedings of Machine Learning Research, vol. 145.   2nd Annual Conference on Mathematical and Scientific Machine Learning, 2021, pp. 1–46.
  18. H. Hu and Y. M. Lu, “Universality laws for high-dimensional learning with random features,” IEEE Transactions on Information Theory, vol. 69, no. 3, pp. 1932–1964, 2023.
  19. A. Montanari and B. N. Saeed, “Universality of empirical risk minimization,” in Proceedings of Thirty Fifth Conference on Learning Theory, ser. Proceedings of Machine Learning Research, P.-L. Loh and M. Raginsky, Eds., vol. 178.   PMLR, 02–05 Jul 2022, pp. 4310–4312.
  20. C. Wang, Y. C. Eldar, and Y. M. Lu, “Subspace estimation from incomplete observations: A high-dimensional analysis,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 6, pp. 1240–1252, 2018.
  21. Y. Yoshida, R. Karakida, M. Okada, and S.-i. Amari, “Statistical Mechanical Analysis of Online Learning with Weight Normalization in Single Layer Perceptron,” Journal of the Physical Society of Japan, vol. 86, no. 4, p. 044002, Apr 2017.
  22. P. Del Moral and A. Niclas, “A taylor expansion of the square root matrix function,” Journal of Mathematical Analysis and Applications, vol. 465, no. 1, pp. 259–266, 2018.
  23. B. Aubin, A. Maillard, J. Barbier, F. Krzakala, N. Macris, and L. Zdeborová, “The committee machine: computational to statistical gaps in learning a two-layers neural network,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2019, no. 12, p. 124023, Dec 2019.
  24. A. Jacot, F. Gabriel, and C. Hongler, “Neural tangent kernel: Convergence and generalization in neural networks,” in Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31.   Curran Associates, Inc., 2018.
  25. L. Chizat, E. Oyallon, and F. Bach, “On lazy training in differentiable programming,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32.   Curran Associates, Inc., 2019.
  26. F. Bach and L. Chizat, “Gradient descent on infinitely wide neural networks: Global convergence and generalization,” arXiv preprint arXiv:2110.08084, 2021.
  27. Y. S. Tan and R. Vershynin, “Phase retrieval via randomized Kaczmarz: theoretical guarantees,” Information and Inference: A Journal of the IMA, vol. 8, no. 1, pp. 97–123, Apr 2018.
  28. G. B. Arous, R. Gheissari, and A. Jagannath, “Online stochastic gradient descent on non-convex losses from high-dimensional inference,” Journal of Machine Learning Research, vol. 22, no. 106, pp. 1–51, 2021.
  29. ——, “Algorithmic thresholds for tensor PCA,” The Annals of Probability, vol. 48, no. 4, pp. 2052 – 2087, 2020.
  30. C. Wang, H. Hu, and Y. Lu, “A solvable high-dimensional model of GAN,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32.   Curran Associates, Inc., 2019.
