Achieving Linear Speedup in Asynchronous Federated Learning with Heterogeneous Clients (2402.11198v1)

Published 17 Feb 2024 in cs.LG and cs.DC

Abstract: Federated learning (FL) is an emerging distributed training paradigm that aims to learn a common global model without exchanging or transferring the data that are stored locally at different clients. The Federated Averaging (FedAvg)-based algorithms have gained substantial popularity in FL to reduce the communication overhead, where each client conducts multiple localized iterations before communicating with a central server. In this paper, we focus on FL where the clients have diverse computation and/or communication capabilities. Under this circumstance, FedAvg can be less efficient since it requires all clients that participate in the global aggregation in a round to initiate iterations from the latest global model, and thus the synchronization among fast clients and straggler clients can severely slow down the overall training process. To address this issue, we propose an efficient asynchronous federated learning (AFL) framework called Delayed Federated Averaging (DeFedAvg). In DeFedAvg, the clients are allowed to perform local training with different stale global models at their own paces. Theoretical analyses demonstrate that DeFedAvg achieves asymptotic convergence rates that are on par with the results of FedAvg for solving nonconvex problems. More importantly, DeFedAvg is the first AFL algorithm that provably achieves the desirable linear speedup property, which indicates its high scalability. Additionally, we carry out extensive numerical experiments using real datasets to validate the efficiency and scalability of our approach when training deep neural networks.
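
The abstract describes the core mechanism only in prose: clients run local iterations starting from possibly stale copies of the global model, and the server aggregates their delayed updates rather than waiting for stragglers. The sketch below is a minimal illustration of that general asynchronous, FedAvg-style pattern, not the authors' exact DeFedAvg procedure; the function and parameter names (`local_sgd`, `async_federated_training`, `speed`), the toy quadratic loss, and the aggregation rule are all illustrative assumptions.

```python
import numpy as np

def local_sgd(model, data, lr=0.01, steps=5):
    """Run a few local SGD steps on a toy quadratic loss (x.w - y)^2."""
    w = model.copy()
    for _ in range(steps):
        x, y = data[np.random.randint(len(data))]
        grad = 2 * x * (x @ w - y)          # gradient of the squared error
        w -= lr * grad
    return w

def async_federated_training(global_model, clients, rounds=100):
    # Each client keeps the version of the global model it last downloaded,
    # so the update it eventually reports may be computed from a stale model.
    stale_models = {cid: global_model.copy() for cid in clients}
    for _ in range(rounds):
        # Only a heterogeneous subset of clients finishes local work in time.
        ready = [cid for cid in clients if np.random.rand() < clients[cid]["speed"]]
        if not ready:
            continue
        # Collect deltas measured against each client's own (stale) base model.
        updates = [local_sgd(stale_models[cid], clients[cid]["data"]) - stale_models[cid]
                   for cid in ready]
        # Apply the averaged, possibly delayed, updates to the global model.
        global_model = global_model + np.mean(updates, axis=0)
        # Reporting clients download the newest model for their next local run;
        # slower clients keep working from their older copies at their own pace.
        for cid in ready:
            stale_models[cid] = global_model.copy()
    return global_model

# Example usage with synthetic clients of unequal speed (all values assumed):
dim = 10
clients = {
    cid: {"speed": np.random.uniform(0.2, 1.0),
          "data": [(np.random.randn(dim), 0.0) for _ in range(20)]}
    for cid in range(8)
}
w = async_federated_training(np.zeros(dim), clients)
```

Under this reading, fast clients are never blocked by slow ones: the server applies whatever updates have arrived, and each update is simply computed from the global model version its client last received, which is the "delayed" averaging the abstract refers to.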

Authors (4)
  1. Xiaolu Wang (14 papers)
  2. Zijian Li (71 papers)
  3. Shi Jin (487 papers)
  4. Jun Zhang (1008 papers)
Citations (1)