Communication-Efficient Federated Optimization over Semi-Decentralized Networks

Published 30 Nov 2023 in cs.LG, cs.DC, and math.OC | arXiv:2311.18787v4

Abstract: In large-scale federated and decentralized learning, communication efficiency is one of the most challenging bottlenecks. While gossip communication -- where agents exchange information with their connected neighbors -- is more cost-effective than communicating with a remote server, it often requires many more communication rounds, especially for large and sparse networks. To tackle this trade-off, we examine communication efficiency under a semi-decentralized communication protocol, in which agents can perform both agent-to-agent and agent-to-server communication in a probabilistic manner. We design a tailored communication-efficient algorithm over semi-decentralized networks, referred to as PISCO, which inherits robustness to data heterogeneity from gradient tracking and allows multiple local updates to save communication. We establish the convergence rate of PISCO for nonconvex problems and show that PISCO enjoys a linear speedup in terms of the number of agents and local updates. Our numerical results highlight the superior communication efficiency of PISCO and its resilience to data heterogeneity and various network topologies.
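The abstract does not spell out PISCO's update rule, but its three ingredients -- gradient tracking, multiple local updates, and probabilistic switching between agent-to-agent (gossip) and agent-to-server (global averaging) communication -- can be combined in a minimal sketch. Everything below (the function name `pisco_sketch`, the step size, the probability `p_server`, and the loop structure) is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

def pisco_sketch(grad, x0, W, num_rounds=50, local_steps=4,
                 p_server=0.2, lr=0.05, seed=0):
    # Hypothetical sketch of a PISCO-style loop (not the paper's exact method):
    # each agent runs several local steps along a gradient-tracking direction,
    # then the network does either one gossip round (with mixing matrix W) or,
    # with probability p_server, an exact agent-to-server averaging round.
    rng = np.random.default_rng(seed)
    n, _ = x0.shape
    x = x0.astype(float)                                 # per-agent iterates, shape (n, d)
    prev = np.stack([grad(i, x[i]) for i in range(n)])   # last evaluated local gradients
    g = prev.copy()                                      # gradient trackers
    for _ in range(num_rounds):
        for _ in range(local_steps):                     # local updates, tracker held fixed
            x = x - lr * g
        new = np.stack([grad(i, x[i]) for i in range(n)])
        g = g + new - prev                               # gradient-tracking correction
        prev = new
        if rng.random() < p_server:                      # agent-to-server: global average
            x = np.tile(x.mean(axis=0), (n, 1))
            g = np.tile(g.mean(axis=0), (n, 1))
        else:                                            # agent-to-agent: one gossip step
            x = W @ x
            g = W @ g
    return x, g
```

Here `W` is assumed doubly stochastic, so both gossip and server averaging preserve the network-wide average of the trackers; that invariant (the mean of `g` equals the mean of the most recently evaluated local gradients) is what makes the scheme robust to data heterogeneity.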

Authors (2)