
Sparse Training for Federated Learning with Regularized Error Correction (2312.13795v2)

Published 21 Dec 2023 in cs.LG, cs.AI, and cs.DC

Abstract: Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models. However, since communication and computation resources are limited, training DNN models in FL systems faces challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes have gained increasing attention as a way to scale down the dimensionality of each client (i.e., node) transmission. In particular, sparsification with error correction is a promising technique, where only the important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have been shown to achieve significant sparsification of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect and pushing sparsity to an exceptional level. The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy. Additionally, an open-source software package has been developed for the benefit of researchers and developers in related fields.
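
The sparsification-with-error-correction scheme the abstract builds on (each client sends only its most important update coordinates to the PS and accumulates the untransmitted remainder locally) can be illustrated with a short sketch. The NumPy snippet below is a minimal illustration of that generic error-feedback mechanism, not of FLARE itself; the function name, the magnitude-based top-k importance criterion, and the flat-vector representation of the update are assumptions made for illustration only.

```python
import numpy as np

def topk_sparsify_with_error_feedback(update, residual, k):
    """Generic error-feedback sparsification sketch (illustrative, not FLARE).

    update   : flat np.ndarray, the client's local model update for this round
    residual : flat np.ndarray, error accumulated locally from previous rounds
    k        : number of coordinates actually transmitted to the PS
    """
    corrected = update + residual                       # add back the untransmitted mass
    idx = np.argpartition(np.abs(corrected), -k)[-k:]   # top-k by magnitude (assumed criterion)

    sparse_msg = np.zeros_like(corrected)
    sparse_msg[idx] = corrected[idx]                     # only these k values go to the PS

    new_residual = corrected - sparse_msg                # everything else stays accumulated locally
    return sparse_msg, new_residual

# Toy usage: one client over a few rounds, transmitting 1% of coordinates per round.
rng = np.random.default_rng(0)
dim, k = 1000, 10
residual = np.zeros(dim)
for rnd in range(3):
    local_update = rng.normal(size=dim)                  # stand-in for a real gradient / model delta
    msg, residual = topk_sparsify_with_error_feedback(local_update, residual, k)
    print(rnd, np.count_nonzero(msg), round(float(np.linalg.norm(residual)), 3))
```

As the abstract points out, at very aggressive sparsity the locally accumulated residual becomes stale before it is ever transmitted; FLARE addresses this staleness by regularizing the accumulated embeddings during training, for which the authors' open-source package at https://github.com/RanGreidi/FLARE is the authoritative reference.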

Authors (2)
  1. Ran Greidi (1 paper)
  2. Kobi Cohen (52 papers)
Citations (1)