FedDec: Peer-to-peer Aided Federated Learning (2306.06715v1)
Abstract: Federated learning (FL) has enabled training machine learning models on the data of multiple agents without compromising privacy. However, FL is known to be vulnerable to data heterogeneity, partial device participation, and infrequent communication with the server, which are nonetheless three distinctive characteristics of this framework. While much of the recent literature has tackled these weaknesses using different tools, only a few works have explored the possibility of exploiting inter-agent communication to improve FL's performance. In this work, we present FedDec, an algorithm that interleaves peer-to-peer communication and parameter averaging (similar to decentralized learning in networks) between the local gradient updates of FL. We analyze the convergence of FedDec under the assumptions of non-iid data distribution, partial device participation, and smooth and strongly convex costs, and show that inter-agent communication alleviates the negative impact of infrequent communication rounds with the server by reducing the dependence on the number of local updates $H$ from $O(H^2)$ to $O(H)$. Furthermore, our analysis reveals that the improved term in the bound is multiplied by a constant that depends on the spectrum of the inter-agent communication graph and that vanishes quickly as the network becomes more connected. We confirm the predictions of our theory in numerical simulations, where we show that FedDec converges faster than FedAvg, and that the gains grow as either $H$ or the connectivity of the network increases.
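To make the update structure described above concrete, the following is a minimal NumPy sketch of one possible interpretation of a FedDec-style round on a synthetic least-squares problem: each agent takes a local gradient step, mixes its parameters with its graph neighbors through a doubly stochastic matrix, and every $H$ local steps the server averages a sampled subset of agents. The ring topology, step size, sampling scheme, and all variable names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a FedDec-style round (assumptions: ring graph,
# synthetic non-iid least-squares data, uniform partial participation).
import numpy as np

rng = np.random.default_rng(0)
N, d, H = 8, 5, 4            # agents, model dimension, local updates per round
lr = 0.05                    # learning rate (assumed)

# Non-iid synthetic data: each agent i has its own (A_i, b_i).
A = [rng.normal(size=(20, d)) for _ in range(N)]
b = [A[i] @ rng.normal(size=d) + 0.1 * rng.normal(size=20) for i in range(N)]

def grad(i, x):
    """Gradient of agent i's local least-squares cost at x."""
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly stochastic mixing matrix for a ring graph (assumed topology).
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

x = np.tile(rng.normal(size=d), (N, 1))   # all agents start from the server model

for rnd in range(50):                     # communication rounds with the server
    for _ in range(H):                    # H local updates between server rounds
        # Local gradient step at every agent.
        x = x - lr * np.vstack([grad(i, x[i]) for i in range(N)])
        # Peer-to-peer parameter averaging over the graph (the interleaved mixing step).
        x = W @ x
    # Partial participation: the server averages a random subset of agents
    # and broadcasts the result back to everyone.
    S = rng.choice(N, size=N // 2, replace=False)
    x = np.tile(x[S].mean(axis=0), (N, 1))
```

Under this reading, setting `W` to the identity recovers plain FedAvg-style local updates, which is the baseline the abstract compares against.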