On the Optimal Time Complexities in Decentralized Stochastic Asynchronous Optimization
Abstract: We consider the decentralized stochastic asynchronous optimization setup, where many workers asynchronously calculate stochastic gradients and asynchronously communicate with each other using edges in a multigraph. For both homogeneous and heterogeneous setups, we prove new time complexity lower bounds under the assumption that computation and communication speeds are bounded. We develop a new nearly optimal method, Fragile SGD, and a new optimal method, Amelie SGD, that converge under arbitrary heterogeneous computation and communication speeds and match our lower bounds (up to a logarithmic factor in the homogeneous setting). Our time complexities are new, nearly optimal, and provably improve upon all previous asynchronous/synchronous stochastic methods in the decentralized setup.
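To make the setup concrete, below is a minimal, hypothetical discrete-event sketch of generic asynchronous decentralized SGD: workers with heterogeneous compute delays calculate stochastic gradients locally and gossip their models to neighbors over graph edges with a communication delay. This is only an illustration of the problem setting from the abstract, not the paper's Fragile SGD or Amelie SGD; the ring topology, delay values (`compute_delay`, `comm_delay`), pairwise-averaging rule, and quadratic objective are all assumptions made for the example.

```python
# Hypothetical sketch of asynchronous decentralized SGD on a ring graph,
# simulated with a discrete-event queue. NOT the paper's Fragile/Amelie SGD.
import heapq
import numpy as np

rng = np.random.default_rng(0)
n, dim, T = 4, 10, 2000                 # workers, dimension, number of events
lr, sigma = 0.05, 0.1                   # step size, gradient noise level
compute_delay = [1.0, 1.5, 2.0, 4.0]    # heterogeneous compute speeds (assumed)
comm_delay = 0.5                        # uniform communication delay (assumed)
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # ring graph

x = [rng.standard_normal(dim) for _ in range(n)]  # each worker's local model

def stoch_grad(v):
    """Stochastic gradient of f(v) = 0.5 * ||v||^2 with additive noise."""
    return v + sigma * rng.standard_normal(dim)

# Event queue entries: (time, unique_seq, kind, worker, payload).
events = [(compute_delay[i], i, "grad", i, None) for i in range(n)]
heapq.heapify(events)
seq = n
for _ in range(T):
    t, _, kind, i, payload = heapq.heappop(events)
    if kind == "grad":                  # worker i finished computing a gradient
        x[i] = x[i] - lr * stoch_grad(x[i])
        for j in neighbors[i]:          # asynchronously gossip the updated model
            heapq.heappush(events, (t + comm_delay, seq, "recv", j, x[i].copy()))
            seq += 1
        heapq.heappush(events, (t + compute_delay[i], seq, "grad", i, None))
        seq += 1
    else:                               # worker i received a neighbor's model
        x[i] = 0.5 * (x[i] + payload)   # simple pairwise averaging (assumed)

print("final mean squared norm:", np.mean([v @ v for v in x]))
```

Note how time, not iteration count, is the natural axis here: slow workers and slow edges stall progress, which is exactly what the paper's time complexity lower bounds and matching methods quantify.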