Scale-Robust Timely Asynchronous Decentralized Learning (2404.19749v1)
Abstract: We consider an asynchronous decentralized learning system: a network of connected devices that learn a machine learning model without any centralized parameter server. Each user in the network has its own local training data, which contributes to learning across all nodes. The learning method consists of two processes that evolve simultaneously without any required synchronization between them. The first process is the model update, where users update their local models via a fixed number of stochastic gradient descent steps. The second process is model mixing, where users communicate with each other via randomized gossiping to exchange their models and average them to reach consensus. In this work, we investigate a staleness criterion for such a system, which is a sufficient condition for the convergence of individual user models. We show that, in the network-scaling regime, i.e., when the number of user devices $n$ is very large, convergence of the user models in finite time is guaranteed if the gossip capacity of individual users scales as $\Omega(\log n)$. Furthermore, we show that bounded staleness can be guaranteed by any distributed opportunistic scheme only with $\Omega(n)$ scaling.
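To make the two-process structure concrete, below is a minimal simulation sketch of the setup the abstract describes: each user runs a fixed number of local SGD steps (model update) while pairwise gossip averaging (model mixing) runs alongside it. This is an illustration only, not the paper's implementation or analysis: the fully connected gossip graph, the toy linear-regression objective, the random event interleaving used to emulate asynchrony, and all names and parameter values (`local_update`, `gossip_mix`, `H`, learning rate) are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): each of n users holds local data for a shared
# linear-regression model; w_star is the ground-truth parameter vector.
n, d, H = 10, 5, 3          # users, model dimension, local SGD steps per round
w_star = rng.normal(size=d)
X = [rng.normal(size=(50, d)) for _ in range(n)]
y = [Xi @ w_star + 0.1 * rng.normal(size=50) for Xi in X]
models = [rng.normal(size=d) for _ in range(n)]

def local_update(i, lr=0.05):
    """Process 1 (model update): H stochastic gradient steps on user i's data."""
    for _ in range(H):
        j = rng.integers(len(y[i]))                  # sample one local data point
        grad = (X[i][j] @ models[i] - y[i][j]) * X[i][j]
        models[i] -= lr * grad

def gossip_mix(i):
    """Process 2 (model mixing): average with a uniformly random neighbor
    (fully connected graph assumed here for simplicity)."""
    k = rng.choice([u for u in range(n) if u != i])
    avg = 0.5 * (models[i] + models[k])
    models[i] = avg.copy()
    models[k] = avg.copy()

# The two processes run without synchronization; a random interleaving of
# events at randomly chosen users emulates the absence of a global clock.
for _ in range(5000):
    i = rng.integers(n)
    if rng.random() < 0.5:
        local_update(i)
    else:
        gossip_mix(i)

print("max distance to w_star:",
      max(np.linalg.norm(w - w_star) for w in models))
```

In an actual deployment the two processes run on independent clocks at each device; the random interleaving above only mimics that lack of synchronization in a single-threaded simulation.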