Variance Reduction for Distributed Stochastic Gradient Descent (1512.01708v2)

Published 5 Dec 2015 in cs.LG, cs.DC, math.OC, and stat.ML

Abstract: Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical distributed settings. In this paper, we propose a variance reduction method, called VR-lite, that does not require full gradient computations or extra storage. We explore distributed synchronous and asynchronous variants that are scalable and remain stable with low communication frequency. We empirically compare both the sequential and distributed algorithms to state-of-the-art stochastic optimization methods, and find that our proposed algorithms perform favorably to other stochastic methods.
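The abstract contrasts VR-lite with existing VR methods that need either extra per-sample storage or a full-dataset gradient pass at the end of each epoch. To make that bottleneck concrete, below is a minimal sketch of classical SVRG (Johnson & Zhang, 2013), a standard full-gradient VR method; this is not the paper's VR-lite algorithm, and the least-squares objective, step size, and epoch length are illustrative assumptions.

```python
# Minimal SVRG sketch for least-squares regression.
# Note the full-gradient snapshot at the start of each epoch: this is the
# O(n) per-epoch cost that, per the abstract, VR-lite avoids.
import numpy as np

def svrg_least_squares(A, b, step=0.01, epochs=20, inner_steps=None):
    n, d = A.shape
    if inner_steps is None:
        inner_steps = n                      # roughly one pass over the data per epoch
    x = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        x_snap = x.copy()
        # Full gradient of (1/2n)||Ax - b||^2 at the snapshot point.
        full_grad = A.T @ (A @ x_snap - b) / n
        for _ in range(inner_steps):
            i = rng.integers(n)
            a_i = A[i]
            # Variance-reduced stochastic gradient: unbiased, and its variance
            # shrinks as both x and x_snap approach the optimum.
            g = (a_i @ x - b[i]) * a_i - (a_i @ x_snap - b[i]) * a_i + full_grad
            x -= step * g
    return x

# Illustrative usage on synthetic data (assumed problem sizes).
rng = np.random.default_rng(1)
A = rng.normal(size=(500, 10))
x_true = rng.normal(size=10)
b = A @ x_true
x_hat = svrg_least_squares(A, b)
```

Distributed and storage-free variants such as VR-lite aim to keep the variance-reduction benefit while dropping the snapshot full-gradient computation shown above.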

Authors (3)
  1. Soham De (39 papers)
  2. Gavin Taylor (20 papers)
  3. Tom Goldstein (226 papers)
Citations (8)
