Convergence theory for non-convex deep network training
Establish a rigorous theoretical framework describing the convergence behavior of gradient-based optimization methods (including stochastic gradient descent) when minimizing the non-convex empirical risk $E(\theta)$ associated with deep neural networks, specifying conditions under which convergence is guaranteed and characterizing the nature of the limiting points.
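For concreteness, the objective and update rule at issue can be sketched as follows; the per-sample losses $\ell_i$, step sizes $\eta_t$, and uniform index sampling are assumed notation for illustration, not taken from the source:
$$
E(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell_i(\theta),
\qquad
\theta_{t+1} = \theta_t - \eta_t\,\nabla \ell_{i_t}(\theta_t),
\quad i_t \sim \mathrm{Unif}\{1,\dots,n\},
$$
and the open question is under which conditions on $E$, the step sizes $\eta_t$, and the sampling scheme the iterates $\theta_t$ converge, and whether their limiting points are global minimizers, local minimizers, or merely critical points.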
References
However, in the case of deep neural networks, $E(\theta)$ is non-convex, making its theoretical analysis challenging and still largely unresolved.
— The Mathematics of Artificial Intelligence
(Peyré, arXiv:2501.10465, 15 Jan 2025), Section “Supervised Learning”, Empirical Risk Minimization paragraph