Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity (1507.07595v2)

Published 27 Jul 2015 in math.OC, cs.LG, and stat.ML

Abstract: We study distributed optimization algorithms for minimizing the average of convex functions. Applications include empirical risk minimization problems in statistical machine learning, where the datasets are large and must be stored on different machines. We design a distributed stochastic variance reduced gradient algorithm that, under certain conditions on the condition number, simultaneously achieves the optimal parallel runtime, amount of communication, and rounds of communication among all distributed first-order methods, up to constant factors. Our method and its accelerated extension also outperform existing distributed algorithms in terms of rounds of communication, as long as the condition number is not too large compared to the size of the data on each machine. We also prove a lower bound on the number of rounds of communication for a broad class of distributed first-order methods, including the algorithms proposed in this paper. We show that our accelerated distributed stochastic variance reduced gradient algorithm achieves this lower bound, so it uses the fewest rounds of communication among all distributed first-order algorithms.

Authors (4)

Jason D. Lee (151 papers)
Qihang Lin (58 papers)
Tengyu Ma (117 papers)
Tianbao Yang (162 papers)

Citations (16)
