Decentralized Learning Made Practical with Client Sampling (2302.13837v2)
Abstract: Decentralized learning (DL) leverages edge devices for collaborative model training while avoiding coordination by a central server. Because training data never leaves the device, DL has become an attractive, privacy-friendly alternative to centralized learning schemes. In a round of DL, all nodes train the model and exchange their models with some other nodes. Performing DL in large-scale heterogeneous networks incurs high communication costs, and slow nodes prolong each round, inflating the total training time. Furthermore, current DL algorithms assume that all nodes are available for training and aggregation at all times, diminishing the practicality of DL. This paper presents Plexus, an efficient, scalable, and practical DL system. Plexus (1) avoids network-wide participation by introducing a decentralized peer sampler that selects a small subset of available nodes to train the model each round, and (2) aggregates the trained models produced by these nodes every round. Plexus is designed to handle nodes joining and leaving the network (churn). We extensively evaluate Plexus, incorporating realistic traces of the compute speed, pairwise latency, network capacity, and availability of edge devices into our experiments. Experiments on four common learning tasks show that, compared to baseline DL algorithms, Plexus reduces time-to-accuracy by 1.2-8.3x, communication volume by 2.4-15.3x, and the training resources needed for convergence by 6.4-370x.
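To make the round structure concrete, below is a minimal Python sketch of one such round under simplifying assumptions: every node holds the same view of which peers are currently available, the per-round sample is derived deterministically from the round number so all nodes can agree on it without a coordinator, and aggregation is plain parameter averaging. The function names (`sample_participants`, `run_round`, `local_train`) and the hash-seeded sampling scheme are illustrative assumptions, not the paper's actual peer sampler or API.

```python
import hashlib
import random


def sample_participants(round_nr, available_nodes, sample_size):
    """Deterministically pick a small subset of available nodes for this round.

    Seeding the RNG with the round number lets every node derive the same
    sample locally, without a central coordinator (hypothetical scheme).
    """
    seed = int(hashlib.sha256(f"round-{round_nr}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    population = sorted(available_nodes)
    return rng.sample(population, min(sample_size, len(population)))


def aggregate(models):
    """Average the model parameters produced by the sampled nodes."""
    keys = models[0].keys()
    return {k: sum(m[k] for m in models) / len(models) for k in keys}


def run_round(round_nr, membership, sample_size, local_train):
    """One round: sample a subset, train locally on each sampled node, aggregate."""
    available = {node for node, alive in membership.items() if alive}  # tolerate churn
    participants = sample_participants(round_nr, available, sample_size)
    trained = [local_train(node) for node in participants]
    return aggregate(trained)


# Toy usage: 100 nodes, a few offline, 10 sampled per round,
# dummy "training" that just returns a parameter dictionary.
if __name__ == "__main__":
    membership = {f"node-{i}": (i % 7 != 0) for i in range(100)}
    model = run_round(
        round_nr=1,
        membership=membership,
        sample_size=10,
        local_train=lambda node: {"w": random.random(), "b": random.random()},
    )
    print(model)
```

The point of the sketch is the control flow the abstract describes: only the sampled subset trains in a given round, and only their models are aggregated, so communication and compute per round stay bounded even as the network grows.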