On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems (2006.11144v1)

Published 19 Jun 2020 in math.OC, cs.LG, math.PR, and stat.ML

Abstract: This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $\Theta(1/n^{p})$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
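
As a loose illustration of the step-size guideline described in the abstract, the following Python sketch runs SGD on a toy non-convex objective with a $\Theta(1/n^{p})$ schedule followed by a faster-vanishing cool-down phase. The objective, noise model, function names (`grad`, `step_size`), and all constants are hypothetical choices made for demonstration; they are not taken from the paper's analysis or its ResNet/CIFAR experiments.

```python
# Illustrative sketch only: toy SGD with a Theta(1/n^p) step-size schedule
# and a late "cool-down" phase. All modeling choices below are assumptions
# for demonstration, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    # Gradient of f(x, y) = (x^2 - 1)^2 + y^2, a simple non-convex objective
    # with minimizers at (+-1, 0) and a strict saddle point at (0, 0).
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def step_size(n, p=0.6, cooldown_at=5_000, cooldown_p=1.0):
    # Theta(1/n^p) schedule; after `cooldown_at` iterations, switch to a
    # faster-decaying 1/n^cooldown_p schedule (the "cool-down" heuristic).
    exponent = p if n < cooldown_at else cooldown_p
    return 1.0 / (n + 1) ** exponent

x = np.array([0.01, 0.5])  # start near the strict saddle at the origin
for n in range(10_000):
    noise = 0.1 * rng.standard_normal(2)  # zero-mean stochastic gradient error
    x = x - step_size(n) * (grad(x) + noise)

print(x)  # the iterates settle near one of the minimizers (+1, 0) or (-1, 0)
```

In this toy run the iterates escape the neighborhood of the saddle and drift toward a minimizer, consistent with the almost-sure avoidance and convergence behavior the abstract describes; the cool-down phase simply shrinks the residual noise in the final iterations.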

Authors (4)
  1. Panayotis Mertikopoulos (90 papers)
  2. Nadav Hallak (4 papers)
  3. Ali Kavis (15 papers)
  4. Volkan Cevher (216 papers)
Citations (80)
