
Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD (1910.09466v3)

Published 21 Oct 2019 in cs.LG and stat.ML

Abstract: Large-scale machine learning increasingly relies on distributed optimization, whereby several machines contribute to the training of a statistical model. In this work we study the performance of asynchronous, distributed SGD when applying sparsification, a technique used to reduce communication overheads. In particular, for the first time in an asynchronous, non-convex setting, we theoretically prove that, in the presence of staleness, sparsification does not harm SGD performance: the ergodic convergence rate matches the known result for standard SGD, namely $\mathcal{O} \left( 1/\sqrt{T} \right)$. We also carry out an empirical study to complement our theory, confirming that the effects of sparsification on the convergence rate are negligible compared to 'vanilla' SGD, even in the challenging scenario of an asynchronous, distributed system.
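
The sketch below is a minimal, self-contained illustration of the two ingredients the abstract combines: top-k gradient sparsification and stale (delayed) updates in an asynchronous SGD loop. It is written under our own assumptions on a toy quadratic objective, not the paper's exact algorithm or experimental setup; the names `topk_sparsify`, `x_opt`, and the simulated `staleness` are illustrative only.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of the gradient, zero the rest.
    Generic top-k sparsification sketch, not the paper's exact operator."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |g_i|
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

# Toy asynchronous loop: each update uses a gradient computed on a possibly
# stale copy of the parameters (staleness is simulated with a random delay).
rng = np.random.default_rng(0)
dim, k, lr, T = 1_000, 50, 0.1, 200
x_opt = rng.normal(size=dim)          # minimiser of f(x) = 0.5 * ||x - x_opt||^2
x = np.zeros(dim)
history = [x.copy()]                  # past iterates, to draw stale parameters from

for t in range(T):
    staleness = rng.integers(0, 4)                           # simulated delay tau_t
    x_stale = history[max(0, len(history) - 1 - staleness)]
    grad = (x_stale - x_opt) + 0.01 * rng.normal(size=dim)   # noisy stochastic gradient
    x = x - lr * topk_sparsify(grad, k)                      # sparsified, stale update
    history.append(x.copy())

print("final distance to optimum:", np.linalg.norm(x - x_opt))
```

Running this toy loop shows the iterates still converging toward `x_opt` despite both the delay and the aggressive sparsification (here only 5% of coordinates are transmitted per step), which is the qualitative behaviour the paper's theory and experiments establish in the general non-convex case.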

Authors (4)
  1. Rosa Candela (3 papers)
  2. Giulio Franzese (18 papers)
  3. Maurizio Filippone (58 papers)
  4. Pietro Michiardi (58 papers)
Citations (1)
