Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates (2403.18375v1)
Abstract: Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard the incomplete intra-model updates of stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which degrade the trained model's performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF), which leverages the backpropagation-based optimization procedure of NNs to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, so that each layer of the global model is updated independently by a possibly different set of contributing users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, and revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is corroborated by empirical observations demonstrating the performance gains of SALF compared to alternative mechanisms that mitigate the device heterogeneity gap in FL.
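To make the layer-wise aggregation idea concrete, below is a minimal Python/NumPy sketch of a server-side update in the spirit of SALF, not the authors' implementation. All names (`aggregate_layerwise`, `user_grads`, `lr`) are illustrative assumptions; the key point is that, because backpropagation computes gradients from the output layer backwards, a straggler can still report gradients for the last few layers, and each layer is averaged only over the users that reached it.

```python
import numpy as np


def aggregate_layerwise(global_weights, user_grads, lr=0.1):
    """Update each layer of the global model independently (hypothetical sketch).

    global_weights : list of np.ndarray, one entry per layer.
    user_grads     : list over users; each entry is a per-layer list of
                     gradients, with None where a straggler did not reach
                     that layer before the deadline.
    """
    new_weights = []
    for l, w in enumerate(global_weights):
        # Gradients from the layer-specific set of contributing users.
        contribs = [g[l] for g in user_grads if g[l] is not None]
        if contribs:
            # Average over the users that completed layer l and take a step.
            new_weights.append(w - lr * np.mean(contribs, axis=0))
        else:
            # No user reached this layer in this round; keep it unchanged.
            new_weights.append(w)
    return new_weights


# Toy usage: a 3-layer model with 3 users, where user 2 is a straggler
# that only back-propagated through the last layer before the deadline.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((4, 4)) for _ in range(3)]
    grads = [
        [rng.standard_normal((4, 4)) for _ in range(3)],  # full update
        [rng.standard_normal((4, 4)) for _ in range(3)],  # full update
        [None, None, rng.standard_normal((4, 4))],        # straggler
    ]
    weights = aggregate_layerwise(weights, grads)
    print([w.shape for w in weights])
```

Under these assumptions, deeper (later) layers are aggregated over larger user sets than earlier layers, which is what the convergence analysis in the abstract reasons about.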