A Sharp Convergence Rate for the Asynchronous Stochastic Gradient Descent (2001.09126v1)

Published 24 Jan 2020 in math.NA, cs.NA, and math.OC

Abstract: We give a sharp convergence rate for asynchronous stochastic gradient descent (ASGD) algorithms when the loss function is a perturbed quadratic function. Our analysis is based on the stochastic modified equations introduced in [An et al. Stochastic modified equations for the asynchronous stochastic gradient descent, arXiv:1805.08244]. We prove that when the number of local workers is larger than the expected staleness, ASGD is more efficient than stochastic gradient descent (SGD). Our theoretical result also suggests that longer delays result in a slower convergence rate. In addition, the learning rate cannot be smaller than a threshold that is inversely proportional to the expected staleness.
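
The central object in the abstract is a gradient step computed at a stale iterate. The Python sketch below is a toy illustration under stated assumptions, not the authors' method or model. It replaces the paper's perturbed quadratic with a one-dimensional quadratic f(x) = a * x^2 / 2 plus additive Gaussian gradient noise. It draws each delay uniformly from {0, ..., 2 * mean_delay}, so the expected staleness is mean_delay, and it does not model individual workers. The function name `simulate` and all of its parameters are hypothetical.

```python
import random


def simulate(eta, steps, mean_delay, a=1.0, noise=0.1, x0=1.0, seed=0):
    """Toy delayed-gradient iteration on f(x) = a * x**2 / 2 (hypothetical helper).

    Each update uses a noisy gradient evaluated at a stale iterate:
        x_{k+1} = x_k - eta * (a * x_{k - tau_k} + xi_k),
    where xi_k ~ N(0, noise**2) and tau_k is drawn uniformly from
    {0, ..., 2 * mean_delay}, so E[tau_k] = mean_delay once k >= 2 * mean_delay
    (an assumed staleness model). mean_delay = 0 recovers plain SGD.
    """
    rng = random.Random(seed)
    history = [x0]
    for k in range(steps):
        # Staleness of this update; it cannot read an iterate older than x_0.
        tau = min(rng.randint(0, 2 * mean_delay), k)
        stale_x = history[k - tau]
        # Noisy gradient of the quadratic, evaluated at the stale iterate.
        grad = a * stale_x + rng.gauss(0.0, noise)
        history.append(history[k] - eta * grad)
    return history


# Same step size and noise level, with and without staleness.
sgd_path = simulate(eta=0.02, steps=500, mean_delay=0)
asgd_path = simulate(eta=0.02, steps=500, mean_delay=4)
print(f"SGD  final |x|: {abs(sgd_path[-1]):.4f}")
print(f"ASGD final |x|: {abs(asgd_path[-1]):.4f}")
```

Sweeping `mean_delay` and `eta` in this toy is one informal way to see how staleness interacts with the step size. It does not reproduce the paper's stochastic-modified-equation analysis or its rate, and its output depends entirely on the assumed noise and staleness models.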

Authors (2)
  1. Yuhua Zhu (26 papers)
  2. Lexing Ying (159 papers)
