A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent (1906.02702v11)

Published 6 Jun 2019 in math.OC, cs.DC, cs.LG, and cs.MA

Abstract: This paper is concerned with minimizing the average of $n$ cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves the optimal network independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate, which we show behaves as $K_T=\mathcal{O}\left(\frac{n}{(1-\rho_w)^2}\right)$, where $1-\rho_w$ denotes the spectral gap of the mixing matrix. Moreover, we construct a "hard" optimization problem for which we show the transient time needed for DSGD to approach the asymptotic convergence rate is lower bounded by $\Omega \left(\frac{n}{(1-\rho_w)^2} \right)$, implying the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
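
To make the algorithmic setup concrete, below is a minimal Python sketch of a generic DSGD iteration on scalar strongly convex quadratic costs: each agent takes a noisy local gradient step and then mixes its iterate with neighbors through a doubly stochastic matrix W. The ring topology, quadratic costs, step-size schedule, and noise level are illustrative assumptions, not the paper's exact experimental configuration.

```python
import numpy as np

# Minimal sketch of a generic DSGD update on scalar quadratic costs.
# The ring topology, step sizes, and noise level below are illustrative
# assumptions, not the configuration used in the paper's experiments.

rng = np.random.default_rng(0)
n, T = 10, 5000                      # number of agents, iterations
a = rng.uniform(1.0, 2.0, size=n)    # local curvatures (strong convexity)
b = rng.normal(0.0, 1.0, size=n)     # local minimizers
x_star = np.sum(a * b) / np.sum(a)   # minimizer of the average cost

# Doubly stochastic mixing matrix W for a ring graph (lazy Metropolis
# weights); its spectral gap 1 - rho_w shrinks as the ring grows.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                      # one scalar iterate per agent
for k in range(1, T + 1):
    alpha = 1.0 / (a.min() * k)                          # O(1/k) step size
    noisy_grad = a * (x - b) + rng.normal(0.0, 0.1, n)   # gradient + noise
    x = W @ (x - alpha * noisy_grad)                     # local step, then mixing

print("mean squared distance to optimum:", np.mean((x - x_star) ** 2))
```

In this setting, the paper's transient-time bound says that on the order of $n/(1-\rho_w)^2$ iterations are needed before DSGD's error matches the network-independent rate of centralized SGD, so a sparser network (smaller spectral gap) lengthens the transient phase.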

Authors (3)
  1. Shi Pu (109 papers)
  2. Alex Olshevsky (69 papers)
  3. Ioannis Ch. Paschalidis (66 papers)
Citations (18)
