Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach

Published 28 Feb 2024 in cs.LG, cs.DC, and eess.SP | arXiv:2402.18018v1

Abstract: Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data dispersed over various data sources. Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability. In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users. A CFL system is composed of multiple networked edge servers, with each server connected to an individual set of users. Decentralized collaboration among servers is leveraged to harness all users' data for model training. Due to the potentially massive number of users involved, it is crucial to reduce the communication overhead of the CFL system. We propose a stochastic gradient method for distributed learning in the CFL framework. The proposed method incorporates a conditionally-triggered user selection (CTUS) mechanism as the central component to effectively reduce communication overhead. Relying on a delicately designed triggering condition, the CTUS mechanism allows each server to select only a small number of users to upload their gradients, without significantly jeopardizing the convergence performance of the algorithm. Our theoretical analysis reveals that the proposed algorithm enjoys a linear convergence rate. Simulation results show that it achieves substantial improvement over state-of-the-art algorithms in terms of communication efficiency.
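
The abstract describes the mechanism only at a high level, so the following is a minimal, hedged sketch of the general event-triggered SAGA idea it gestures at, not the paper's algorithm: each server keeps a SAGA-style table of its users' last uploaded gradients, requests a fresh upload from a user only when a triggering condition fires, and otherwise reuses the stored gradient. The loss function, the triggering rule (a fixed `trigger_threshold` on the gradient change), and all names below are assumptions; the paper's actual CTUS condition and the decentralized server-to-server collaboration step are not reproduced here.

```python
import numpy as np


def local_gradient(w, X, y):
    """Gradient of a least-squares loss on one user's local data (illustrative)."""
    return X.T @ (X @ w - y) / len(y)


def event_triggered_saga_round(w, users, gradient_table, trigger_threshold, lr):
    """One server-side round of an event-triggered, SAGA-style update (sketch).

    users          -- list of (X, y) local datasets held by this server's users
    gradient_table -- most recently uploaded gradient of each user (SAGA memory)

    Only users whose fresh local gradient differs enough from their stored one
    upload in this round; for all others the server reuses the stored gradient,
    which is where the uplink communication savings come from.
    """
    uploads = 0
    for i, (X, y) in enumerate(users):
        g_new = local_gradient(w, X, y)
        # Hypothetical triggering condition: upload only when the gradient has
        # changed by more than a fixed threshold since the last upload.
        if np.linalg.norm(g_new - gradient_table[i]) > trigger_threshold:
            gradient_table[i] = g_new  # user i uploads; server refreshes its slot
            uploads += 1
    # Aggregate the (possibly stale) stored gradients and take a descent step.
    avg_grad = gradient_table.mean(axis=0)
    return w - lr * avg_grad, uploads


# Toy usage: 20 users, a 10-dimensional model, synthetic least-squares data.
rng = np.random.default_rng(0)
dim = 10
users = [(rng.normal(size=(50, dim)), rng.normal(size=50)) for _ in range(20)]
w = np.zeros(dim)
gradient_table = np.zeros((len(users), dim))
for _ in range(100):
    w, n_uploads = event_triggered_saga_round(
        w, users, gradient_table, trigger_threshold=0.05, lr=0.1
    )
```

In this toy setup, `n_uploads` typically drops well below the total number of users after the first few rounds, which is the qualitative behavior the CTUS mechanism targets: most users stay silent while the server reuses their stored gradients.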
