Parallel Restarted SPIDER -- Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity (1912.06036v2)

Published 12 Dec 2019 in math.OC, cs.DC, cs.LG, cs.MA, and stat.ML

Abstract: In this paper, we propose a distributed algorithm for stochastic smooth, non-convex optimization. We assume a worker-server architecture where $N$ nodes, each holding $n$ (potentially infinite) samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of the local cost functions available at the individual nodes. The proposed approach is a non-trivial extension of the popular parallel-restarted SGD algorithm, incorporating the optimal variance-reduction-based SPIDER gradient estimator into it. We prove convergence of our algorithm to a first-order stationary solution. The proposed approach achieves the best known communication complexity $O(\epsilon^{-1})$ along with the optimal computation complexity. For finite-sum problems (finite $n$), we achieve the optimal computation (IFO) complexity $O(\sqrt{Nn}\epsilon^{-1})$. For online problems ($n$ unknown or infinite), we achieve the optimal IFO complexity $O(\epsilon^{-3/2})$. In both cases, we maintain the linear speedup achieved by existing methods. This is a massive improvement over the $O(\epsilon^{-2})$ IFO complexity of the existing approaches. Additionally, our algorithm is general enough to allow non-identical distributions of data across workers, as in the recently proposed federated learning paradigm.
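
For readers who want a concrete picture of the mechanism the abstract describes (workers running variance-reduced SPIDER steps locally between periodic server averaging steps), here is a minimal single-process Python sketch. It is not the paper's algorithm: the toy objective inside `stoch_grad`, the epoch length `q`, the step size `eta`, and the batch sizes are all illustrative assumptions.

```python
# Single-machine sketch of "parallel-restarted" SPIDER-style updates.
# Each simulated worker refreshes its gradient estimator with a large batch,
# runs q variance-reduced local steps, and the "server" averages the iterates.
import numpy as np

def stoch_grad(x, batch):
    # Toy stochastic gradient: each sample b contributes 0.5 * ||x - b||^2,
    # so the minibatch gradient is x - mean(batch). Placeholder only.
    return x - batch.mean(axis=0)

def parallel_restarted_spider(x0, data, n_workers=4, rounds=10, q=5,
                              eta=0.1, big_batch=64, small_batch=8):
    rng = np.random.default_rng(0)
    x = x0.copy()
    for _ in range(rounds):                      # one communication round with the server
        local_iterates = []
        for w in range(n_workers):               # would run in parallel on real workers
            xw = x.copy()
            # Refresh the SPIDER estimator with a large batch at epoch start.
            idx = rng.choice(len(data[w]), size=big_batch)
            v = stoch_grad(xw, data[w][idx])
            x_prev = xw.copy()
            for _ in range(q):                   # q local steps, no communication
                xw = xw - eta * v
                idx = rng.choice(len(data[w]), size=small_batch)
                # Recursive variance-reduced update:
                # v_t = grad_B(x_t) - grad_B(x_{t-1}) + v_{t-1}
                v = stoch_grad(xw, data[w][idx]) - stoch_grad(x_prev, data[w][idx]) + v
                x_prev = xw.copy()
            local_iterates.append(xw)
        x = np.mean(local_iterates, axis=0)      # "restart": server averages local iterates
    return x

if __name__ == "__main__":
    d = 5
    # Non-identical data across workers, as allowed by the paper's setting.
    data = [np.random.default_rng(w).normal(loc=float(w), size=(200, d)) for w in range(4)]
    print(parallel_restarted_spider(np.zeros(d), data))
```

The structural points the sketch tries to reflect are the large refresh batch at the start of each local epoch, the recursive SPIDER correction computed on small batches, and communication with the server only at the averaging step; the specific constants and the toy objective are not taken from the paper.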

Citations (17)
