LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression (2403.04348v2)
Abstract: In Distributed optimization and Learning, and even more so in the modern framework of federated learning, communication, which is slow and costly, is critical. We introduce LoCoDL, a communication-efficient algorithm that leverages the two popular and effective techniques of Local training, which reduces the communication frequency, and Compression, in which short bitstreams are sent instead of full-dimensional vectors of floats. LoCoDL works with a large class of unbiased compressors that includes widely used sparsification and quantization methods. LoCoDL provably benefits from local training and compression and enjoys a doubly accelerated communication complexity, with respect to both the condition number of the functions and the model dimension, in the general heterogeneous regime with strongly convex functions. This is confirmed in practice, with LoCoDL outperforming existing algorithms.
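To make the compressor class concrete, here is a minimal sketch (not the authors' code, and not the LoCoDL algorithm itself) of one widely used unbiased compressor, rand-k sparsification: k coordinates of a d-dimensional vector are kept at random and rescaled by d/k so that the compressed vector equals the original in expectation. The function name rand_k and the empirical unbiasedness check are illustrative choices, not part of the paper.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = x.size
    mask = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)  # coordinates that survive compression
    mask[idx] = 1.0
    return (d / k) * mask * x  # rescaling makes E[C(x)] = x (unbiasedness)

# Quick empirical check: averaging many compressed copies should recover x,
# consistent with the unbiasedness property assumed for the compressor class.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
est = np.mean([rand_k(x, k=3, rng=rng) for _ in range(20000)], axis=0)
print(np.max(np.abs(est - x)))  # small deviation, as expected
```

Only k nonzero floats (plus their indices) need to be communicated per vector, which is the source of the communication savings; the standard variance parameter of this compressor, d/k - 1, is the kind of quantity that typically enters the communication-complexity bounds discussed in the abstract.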