Long-run selection among critical points under SGD
Determine, for stochastic gradient descent (SGD) with a constant step size applied to a smooth non-convex objective function f: R^d -> R, which critical points, or connected components of the critical set of f, the algorithm is more likely to visit in the long run, and quantify the relative likelihoods of those visits (i.e., the asymptotic distribution over components).
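To make the question concrete, the following is a minimal empirical sketch, not the analysis of the cited paper. It runs constant step-size SGD on a hypothetical one-dimensional tilted double well, f(x) = x^4/4 - x^2/2 + tilt*x, and records the fraction of iterates that fall in the basin of each local minimum. The objective, the additive Gaussian gradient noise, the parameter values, and the names `grad_f` and `sgd_occupancy` are all assumptions chosen for illustration.

```python
import numpy as np


def grad_f(x, tilt):
    # Gradient of the assumed objective f(x) = x**4/4 - x**2/2 + tilt*x.
    return x**3 - x + tilt


def sgd_occupancy(eta=0.1, sigma=2.0, tilt=0.1, n_steps=200_000, seed=0):
    """Fraction of SGD iterates in the left and right basins of f.

    Assumed noise model: the stochastic gradient is the true gradient
    plus independent N(0, sigma**2) noise at every step.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps).tolist()

    # Critical points of f are the real roots of x^3 - x + tilt.
    # For small |tilt| there are three: local min, local max, local min.
    crit = np.sort(np.real(np.roots([1.0, 0.0, -1.0, tilt])))
    barrier = float(crit[1])  # the local maximum separates the two basins

    x = 0.0
    left_count = 0
    for z in noise:
        # Constant step-size SGD update with a noisy gradient.
        x -= eta * (grad_f(x, tilt) + sigma * z)
        if x < barrier:
            left_count += 1

    left = left_count / n_steps
    return left, 1.0 - left


left, right = sgd_occupancy()
print(f"left basin: {left:.3f}, right basin: {right:.3f}")
```

With a positive tilt the left minimum is the deeper one, so in this toy setting the iterates would be expected to spend more time in the left basin. The open question asks for a characterization of such occupancy fractions for general smooth non-convex f and general gradient noise, where simulation alone gives no answer.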
References
In particular, the following crucial question remains open: Which critical points of f (or components thereof) are more likely to be observed in the long run – and by how much?
— What is the long-run distribution of stochastic gradient descent? A large deviations analysis
(2406.09241 - Azizian et al., 13 Jun 2024) in Section 1 (Introduction)