Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays (2405.10123v2)
Abstract: Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Two major challenges hindering the performance of FL algorithms are long training times caused by straggling clients, and a decline in model accuracy under non-iid local data distributions ("client drift"). In this work, we propose and analyze Asynchronous Exact Averaging (AREA), a new stochastic (sub)gradient algorithm that utilizes asynchronous communication to speed up convergence and enhance scalability, and employs client memory to correct the client drift caused by variations in client update frequencies. Moreover, AREA is, to the best of our knowledge, the first method that is guaranteed to converge under arbitrarily long delays, without the use of delay-adaptive stepsizes, and (i) for strongly convex, smooth functions, asymptotically converges to an error neighborhood whose size depends only on the variance of the stochastic gradients used with respect to the number of iterations, and (ii) for convex, non-smooth functions, matches the convergence rate of the centralized stochastic subgradient method up to a constant factor, which depends on the average of the individual client update frequencies instead of their minimum (or maximum). Our numerical results validate our theoretical analysis and indicate that AREA outperforms state-of-the-art methods when local data are highly non-iid, especially as the number of clients grows.
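To make the mechanism concrete, below is a minimal, hypothetical Python sketch of an asynchronous server loop with per-client memory, in the spirit of the description above: the server keeps one slot per client and re-averages whenever a client reports, so clients that report more often do not dominate the global model. The variable names, toy quadratic objectives, and the exact update rule are illustrative assumptions, not the paper's AREA pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, steps, lr = 5, 10, 2_000, 0.05

# Heterogeneous quadratic objectives f_i(x) = 0.5 * ||x - b_i||^2 (toy stand-in
# for non-iid local data; each client has a different minimizer b_i).
targets = rng.normal(size=(num_clients, dim))

server_x = np.zeros(dim)                  # global model held by the server
memory = np.zeros((num_clients, dim))     # server-side copy of each client's last reported state
client_x = np.zeros((num_clients, dim))   # local models held by the clients

# Clients with larger "speed" report more often; stragglers report rarely.
speeds = np.linspace(0.2, 1.0, num_clients)

for t in range(steps):
    # An arbitrary client finishes local work and contacts the server.
    i = rng.choice(num_clients, p=speeds / speeds.sum())

    # Local stochastic gradient step on the client's own objective.
    grad = (client_x[i] - targets[i]) + 0.1 * rng.normal(size=dim)
    client_x[i] = client_x[i] - lr * grad

    # The server overwrites only client i's slot and incrementally re-averages,
    # so the global model stays an exact average over all clients regardless of
    # how unevenly they report (the drift-correction idea described above).
    server_x += (client_x[i] - memory[i]) / num_clients
    memory[i] = client_x[i].copy()

    # The reporting client pulls the fresh global model before its next step.
    client_x[i] = server_x.copy()

print("distance to average minimizer:", np.linalg.norm(server_x - targets.mean(axis=0)))
```

Running this sketch, the global model approaches the minimizer of the average objective even though fast clients report far more often than slow ones, illustrating why the per-client memory slots matter under heterogeneous update frequencies.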
Authors: Charikleia Iakovidou, Kibaek Kim