Efficient attainment of ρ-SOSPs by SGD with random initialization
Determine whether stochastic gradient descent (SGD) with random initialization can efficiently attain a ρ-second-order stationary point (ρ-SOSP) when optimizing the regularized expected risk of a neural network of the form f(W, b; θ) = E[g_θ(Wx + b)] + λ||W||_F^2, under smoothness and Hessian-Lipschitz assumptions with Gaussian inputs, as considered in the derandomization framework for structure discovery.
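For concreteness, the sketch below illustrates the setting (it is not the paper's implementation): it assumes one smooth choice of g_θ, namely a squared loss against a fixed tanh "teacher" network, Gaussian inputs, and plain minibatch SGD from a random Gaussian initialization. The stopping test only monitors the stochastic gradient norm, i.e. an approximate first-order condition; efficiently certifying the second-order (Hessian) part of a ρ-SOSP is exactly the open question. All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 20, 10                 # input dimension, hidden width (illustrative)
lam = 1e-3                    # regularization strength lambda
lr, batch, steps = 1e-2, 64, 20_000

# Hypothetical smooth instance: g_theta(z) = 0.5 * ||tanh(z) - y(x)||^2,
# where y(x) comes from a fixed random "teacher" (W_star, b_star).
W_star = rng.standard_normal((m, d)) / np.sqrt(d)
b_star = rng.standard_normal(m)

# Random initialization of the trainable parameters (W, b).
W = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.zeros(m)

for t in range(steps):
    x = rng.standard_normal((batch, d))        # Gaussian inputs x ~ N(0, I)
    y = np.tanh(x @ W_star.T + b_star)         # teacher targets
    z = x @ W.T + b                            # pre-activations Wx + b
    r = np.tanh(z) - y                         # residuals
    dz = r * (1.0 - np.tanh(z) ** 2)           # d g_theta / d z

    # Stochastic gradient of E[g_theta(Wx + b)] + lam * ||W||_F^2
    grad_W = dz.T @ x / batch + 2.0 * lam * W
    grad_b = dz.mean(axis=0)

    W -= lr * grad_W
    b -= lr * grad_b

    grad_norm = np.sqrt((grad_W ** 2).sum() + (grad_b ** 2).sum())
    if grad_norm < 1e-4:                       # first-order stationarity only
        print(f"small stochastic gradient at step {t}: {grad_norm:.2e}")
        break
```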
References
It is an open question whether this can be done efficiently, but empirical results on NNs strongly support this behavior.
— A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
(2510.19382 - Tsikouras et al., 22 Oct 2025) in Section 1 (Introduction), footnote after “SGD with random initialization”