Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization (2405.15861v3)
Abstract: Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL significantly challenge its efficiency. Specifically, in each communication round the communication cost scales linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, this intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. This paper proposes a novel dimension-free communication algorithm, DeComFL, which leverages zeroth-order optimization techniques and reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients and the server in each round, regardless of the dimension $d$ of the model parameters. Theoretically, for non-convex functions, we prove that our algorithm achieves state-of-the-art rates, which exhibit a linear speedup in the number of clients and local steps under standard assumptions. Under an additional low effective rank assumption, we further show that the convergence rate is independent of the model dimension $d$ as well. Empirical evaluations, encompassing both classic deep learning training and LLM fine-tuning, demonstrate significant reductions in communication overhead. Notably, DeComFL achieves this by transmitting only around 1 MB of data in total between the server and a client to fine-tune a model with billions of parameters.
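The mechanism behind the $\mathscr{O}(1)$ communication is that the random perturbation used by a zeroth-order (two-point) gradient estimator can be regenerated anywhere from a shared seed, so only the scalar finite-difference coefficient ever needs to cross the network. Below is a minimal single-node NumPy sketch of that idea; the function name `zo_scalar_update` and the hyperparameters are illustrative assumptions, and the sketch omits the multi-client averaging, local steps, and server-side bookkeeping that the actual DeComFL algorithm performs.

```python
import numpy as np

def zo_scalar_update(params, loss_fn, seed, mu=1e-3, lr=1e-5):
    """One zeroth-order step whose only communication payload is (seed, grad_scalar)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)  # perturbation direction, reproducible from the seed
    # Two-point finite-difference estimate of the directional derivative along z.
    grad_scalar = (loss_fn(params + mu * z) - loss_fn(params - mu * z)) / (2.0 * mu)
    # Anyone holding the same seed can rebuild z and apply the full d-dimensional
    # update from this single scalar, so nothing of size O(d) has to be transmitted.
    return grad_scalar, params - lr * grad_scalar * z

# Toy usage: a quadratic loss in d = 10_000 dimensions. Only (seed, grad_scalar)
# would need to be communicated, i.e. O(1) scalars instead of O(d) parameters.
# (The step size is kept small because the variance of the estimator grows with d.)
if __name__ == "__main__":
    d = 10_000
    x = np.ones(d)
    loss = lambda w: 0.5 * float(np.dot(w, w))
    g, x_new = zo_scalar_update(x, loss, seed=42)
    print(f"communicated scalar: {g:.4f}, loss before: {loss(x):.4f}, after: {loss(x_new):.4f}")
```

In the federated setting described in the abstract, each sampled client would report only such scalars (together with seed indices) to the server, and the server would return aggregated scalars, so the per-round payload stays constant regardless of $d$.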
Authors:
- Zhe Li
- Bicheng Ying
- Zidong Liu
- Haibo Yang
- Chaosheng Dong