FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models (2403.09904v1)
Abstract: Federated Learning (FL) has garnered increasing attention because it allows heterogeneous clients to process their private data locally and interact only with a central server, thereby respecting privacy. A critical bottleneck in FL is the communication cost. A pivotal strategy to mitigate this burden is \emph{Local Training}, which runs multiple local stochastic gradient descent steps between communication rounds. Our work is inspired by the innovative \emph{Scaffnew} algorithm, which has considerably advanced the reduction of communication complexity in FL. We introduce FedComLoc (Federated Compressed and Local Training), which integrates practical and effective compression into \emph{Scaffnew} to further enhance communication efficiency. Extensive experiments with the popular TopK compressor and with quantization demonstrate that it substantially reduces communication overhead in heterogeneous settings.
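The two compressors named in the abstract can be illustrated with a minimal sketch. The snippet below is our own NumPy approximation, not the authors' implementation: a TopK sparsifier that keeps the k largest-magnitude coordinates, and an unbiased stochastic (QSGD-style) uniform quantizer. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def topk_compress(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of a 1-D vector x and zero out the rest."""
    if k >= x.size:
        return x.copy()
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]   # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def stochastic_quantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Unbiased uniform quantization of x onto 2**num_bits - 1 magnitude levels per sign."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return x.copy()
    levels = 2 ** num_bits - 1
    scaled = np.abs(x) / norm * levels          # map magnitudes into [0, levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[output] = x.
    rounded = lower + (np.random.rand(*x.shape) < scaled - lower)
    return np.sign(x) * rounded / levels * norm

# Example: compress a client's update before communicating it to the server.
update = np.random.default_rng(0).standard_normal(10_000)
sparse_update = topk_compress(update, k=1_000)            # ~10% of coordinates kept
low_bit_update = stochastic_quantize(update, num_bits=4)  # low-bit representation
```

In a FedComLoc-style loop, such an operator would be applied to the model (or model update) a client transmits in each communication round; the choice between TopK sparsification and quantization is the experimental axis the abstract describes.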
- QSGD: Communication-efficient SGD via gradient quantization and encoding. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
- Revisiting sparsity hunting in federated learning: Why does sparsity consensus matter? Transactions on Machine Learning Research, 2023.
- LSQ+: Improving low-bit quantization through learnable offsets and better initialization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 696–697, 2020.
- Robust quantization: One model to rule them all. In Advances in Neural Information Processing Systems, 33:5308–5317, 2020.
- RandProx: Primal-dual optimization algorithms with randomized proximal updates. In Proc. of Int. Conf. Learning Representations (ICLR), 2023.
- Provably doubly accelerated federated learning: The first theoretically successful combination of local training and compressed communication. preprint arXiv:2210.13277, 2022.
- TAMUNA: Doubly accelerated federated learning with local training, compression, and partial participation. preprint arXiv:2302.09832, 2023.
- Sparse networks from scratch: Faster training without losing performance. preprint arXiv:1907.04840, 2019.
- Rigging the lottery: Making all tickets winners. In International Conference on Machine Learning, pp. 2943–2952. PMLR, 2020.
- The state of sparsity in deep neural networks. preprint arXiv:1902.09574, 2019.
- Local SGD: Unified theory and new efficient methods. In Proc. of Conf. Neural Information Processing Systems (NeurIPS), 2020.
- Can 5th Generation Local Training Methods Support Client Sampling? Yes! In Proc. of Int. Conf. Artificial Intelligence and Statistics (AISTATS), April 2023.
- Quantization robust federated learning for efficient inference on heterogeneous devices. preprint arXiv:2206.10844, 2022.
- On the convergence of local descent methods in federated learning. preprint arXiv:1910.14425, 2019.
- Federated learning with compression: Unified analysis and sharp guarantees. In International Conference on Artificial Intelligence and Statistics, pp. 2350–2358. PMLR, 2021.
- Improving low-precision network quantization via bin regularization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5261–5270, 2021.
- FedTiny: Pruned federated learning towards specialized tiny models. preprint arXiv:2212.01977, 2022.
- Model pruning enables efficient federated learning on edge devices. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2019.
- SCAFFOLD: Stochastic controlled averaging for on-device federated learning. In Proc. of Int. Conf. Machine Learning (ICML), 2020.
- First analysis of local GD on heterogeneous data. preprint arXiv:1909.04715, presented at NeurIPS Workshop on Federated Learning for Data Privacy and Confidentiality, 2019.
- Tighter theory for local SGD on identical and heterogeneous data. In Proc. of 23rd Int. Conf. Artificial Intelligence and Statistics (AISTATS), 2020.
- Krizhevsky, A. Learning multiple layers of features from tiny images. Technical Report, Computer Science Department, University of Toronto, 2009.
- Accurate neural network pruning requires rethinking sparse optimization. preprint arXiv:2308.02060, 2023.
- LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
- On the convergence of FedAvg on non-IID data. In Proc. of Int. Conf. Learning Representations (ICLR), 2020.
- From local SGD to local fixed point methods for federated learning. In Proc. of 37th Int. Conf. Machine Learning (ICML), 2020.
- Variance reduced ProxSkip: Algorithm, theory and application to federated learning. In Proc. of Conf. Neural Information Processing Systems (NeurIPS), 2022.
- GradSkip: Communication-accelerated local gradient methods with better computational complexity. preprint arXiv:2210.16402, 2022.
- Federated learning of deep networks using model averaging. preprint arXiv:1602.05629, 2016.
- Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
- ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally! In Proc. of 39th Int. Conf. Machine Learning (ICML), 2022.
- Linear convergence in federated learning: Tackling client heterogeneity and sparse gradients. In Proc. of Conf. Neural Information Processing Systems (NeurIPS), 2021.
- SparkNet: Training deep networks in Spark. In Proc. of Int. Conf. Learning Representations (ICLR), 2016.
- Parallel training of DNNs with natural gradient and parameter averaging. preprint arXiv:1410.7455, 2014.
- Understanding machine learning: from theory to algorithms. Cambridge University Press, 2014.
- NIPQ: Noise proxy-based integrated pseudo-quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3852–3861, 2023.
- A field guide to federated optimization. preprint arXiv:2107.06917, 2021.
- Explicit personalization and local training: Double communication acceleration in federated learning. preprint arXiv:2305.13170, 2023.
- FedP3: Federated personalized and privacy-friendly network pruning under model heterogeneity. In International Conference on Learning Representations (ICLR), 2024.
- FedLab: A flexible federated learning framework. Journal of Machine Learning Research, 24(100):1–7, 2023. URL http://jmlr.org/papers/v24/22-0440.html.
- FedCR: Personalized federated learning based on across-client common representation with conditional mutual information regularization. In International Conference on Machine Learning, pp. 41314–41330. PMLR, 2023.