Convergence theory for non-convex deep network training
Establish a rigorous theoretical framework describing the convergence behavior of gradient-based optimization methods (including stochastic gradient descent) when minimizing the non-convex empirical risk $E(\theta)$ associated with deep neural networks, specifying conditions under which convergence is guaranteed and characterizing the nature of the limiting points.
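For concreteness, the objective and update rule at issue can be sketched as follows; the per-sample losses $\ell_i$, step sizes $\eta_t$, and uniform index sampling are assumed notation for illustration, not taken from the source:
$$
E(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell_i(\theta),
\qquad
\theta_{t+1} = \theta_t - \eta_t\,\nabla \ell_{i_t}(\theta_t),
\quad i_t \sim \mathrm{Unif}\{1,\dots,n\},
$$
and the open question is under which conditions on $E$, the step sizes $\eta_t$, and the sampling scheme the iterates $\theta_t$ converge, and whether their limiting points are global minimizers, local minimizers, or merely critical points.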
References
However, in the case of deep neural networks, $E(\theta)$ is non-convex, making its theoretical analysis challenging and still largely unresolved.
— The Mathematics of Artificial Intelligence
(Peyré, arXiv:2501.10465, 15 Jan 2025), Section “Supervised Learning”, Empirical Risk Minimization paragraph