Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes (2010.14763v2)

Published 27 Oct 2020 in cs.LG, math.OC, and stat.ML

Abstract: Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (round interaction with the aggregator). We improve state-of-the-art literature and show $O(\sqrt{K}$) communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Marten van Dijk (36 papers)
  2. Nhuong V. Nguyen (4 papers)
  3. Toan N. Nguyen (5 papers)
  4. Lam M. Nguyen (58 papers)
  5. Quoc Tran-Dinh (65 papers)
  6. Phuong Ha Nguyen (20 papers)
Citations (10)

Summary

We haven't generated a summary for this paper yet.