New Convergence Aspects of Stochastic Gradient Algorithms (1811.12403v2)

Published 10 Nov 2018 in math.OC and cs.LG

Abstract: The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated in cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of the convergence of SGD is performed under the assumption that the stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD in the diminishing learning rate regime. We then move on to the asynchronous parallel setting and prove convergence of the Hogwild! algorithm in the same regime with diminishing learning rates. It is well known that SGD converges if the sequence of learning rates $\{\eta_t\}$ satisfies $\sum_{t=0}^{\infty} \eta_t \rightarrow \infty$ and $\sum_{t=0}^{\infty} \eta_t^2 < \infty$. We show the convergence of SGD for a strongly convex objective function without the bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^{\infty} \eta_t \rightarrow \infty$. In other words, we extend the current state-of-the-art class of learning rates for which convergence of SGD is guaranteed.
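To make the learning-rate conditions concrete, here is a minimal sketch of SGD with a diminishing schedule $\eta_t = \eta_0 / (1 + t)$, which satisfies $\sum_{t} \eta_t \rightarrow \infty$ and $\sum_{t} \eta_t^2 < \infty$. The synthetic least-squares objective, the constant $\eta_0$, and the iteration count are illustrative assumptions, not the paper's exact setup or analysis.

```python
import numpy as np

# Illustrative assumption: a synthetic least-squares problem, not the paper's setup.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def stochastic_grad(w, i):
    # Gradient of the i-th component function 0.5 * (x_i^T w - y_i)^2.
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
eta0 = 0.1  # illustrative base step size
for t in range(20000):
    eta_t = eta0 / (1 + t)      # diminishing learning rate
    i = rng.integers(n)         # sample one data point uniformly at random
    w -= eta_t * stochastic_grad(w, i)

print("distance to w_true:", np.linalg.norm(w - w_true))
```

The paper's contribution concerns which schedules $\{\eta_t\}$ admit a convergence guarantee without the bounded-gradient assumption; the schedule above is only one standard member of that class.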

Authors (6)
  1. Lam M. Nguyen (58 papers)
  2. Phuong Ha Nguyen (20 papers)
  3. Katya Scheinberg (40 papers)
  4. Martin Takáč (145 papers)
  5. Marten van Dijk (36 papers)
  6. Peter Richtárik (241 papers)
Citations (62)
