Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD (1803.01113v3)

Published 3 Mar 2018 in stat.ML and cs.LG

Abstract: Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect convergence. In this work we present a novel theoretical characterization of the speed-up offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The novelty in our work is that our runtime analysis considers random straggler delays, which helps us design and compare distributed SGD algorithms that strike a balance between stragglers and staleness. We also present a new convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions, and a novel learning rate schedule to compensate for gradient staleness.
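To make the error-runtime trade-off in the abstract concrete, below is a minimal simulation sketch of asynchronous SGD with random straggler delays. The exponential compute-time model, the toy least-squares objective, and the staleness-damped step size eta0 / (1 + staleness) are illustrative assumptions for this sketch, not the paper's exact runtime model or learning rate schedule.

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize 0.5 * ||A w - b||^2
d, n = 10, 200
A = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
b = A @ w_star

def stochastic_grad(w):
    """Mini-batch gradient on a random subset of rows."""
    idx = rng.choice(n, size=32, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / len(idx)

P = 8        # number of learners
eta0 = 0.05  # base learning rate
T = 500      # total updates applied at the parameter server

w = np.zeros(d)
now = 0.0
version = 0   # how many updates the server has applied so far
counter = 0   # tie-breaker for the event queue
events = []   # (finish_time, counter, learner_id, stale_params, start_version)

# Each learner starts computing a gradient; compute time is an assumed
# exponential straggler delay.
for p in range(P):
    heapq.heappush(events, (now + rng.exponential(1.0), counter, p, w.copy(), version))
    counter += 1

for _ in range(T):
    finish, _, p, w_stale, v_start = heapq.heappop(events)
    now = finish
    staleness = version - v_start        # updates applied since this gradient was started
    eta = eta0 / (1.0 + staleness)       # assumed staleness-damped step size (illustrative)
    w -= eta * stochastic_grad(w_stale)  # apply the (possibly stale) gradient
    version += 1
    # The learner immediately starts a new gradient from the fresh parameters.
    heapq.heappush(events, (now + rng.exponential(1.0), counter, p, w.copy(), version))
    counter += 1

print(f"wall-clock time: {now:.1f}, error: {np.linalg.norm(w - w_star):.4f}")
```

Because the server never waits for stragglers, wall-clock time per update is governed by the fastest available learner rather than the slowest, which is the runtime side of the trade-off the paper analyzes; the staleness-dependent step size is one way to limit the error side.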

Authors (5)
  1. Sanghamitra Dutta (34 papers)
  2. Gauri Joshi (73 papers)
  3. Soumyadip Ghosh (17 papers)
  4. Parijat Dube (19 papers)
  5. Priya Nagpurkar (2 papers)
Citations (186)