Dependable Distributed Training of Compressed Machine Learning Models
Abstract: The existing work on the distributed training of ML models has consistently overlooked the distribution of the achieved learning quality, focusing instead on its average value. This leads to poor dependability of the resulting ML models, whose performance may be much worse than expected. We fill this gap by proposing DepL, a framework for dependable learning orchestration, able to make high-quality, efficient decisions on (i) the data to leverage for learning, (ii) the models to use and when to switch among them, and (iii) the clusters of nodes, and the resources thereof, to exploit. For concreteness, we consider as possible available models a full DNN and its compressed versions. Unlike previous studies, DepL guarantees that a target learning quality is reached with a target probability, while keeping the training cost at a minimum. We prove that DepL has a constant competitive ratio and polynomial complexity, and show that it outperforms the state of the art by over 27% and closely matches the optimum.
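The distinction the abstract draws between average learning quality and quality reached with a target probability can be illustrated with a minimal sketch. This is not DepL itself: the function names, the accuracy values, and the thresholds below are all hypothetical, chosen only to show how two training configurations with similar averages can differ in dependability.

```python
import statistics


def summarize_quality(samples, target_quality):
    """Return (mean quality, empirical probability of meeting the target)."""
    mean_quality = statistics.fmean(samples)
    hit_rate = sum(q >= target_quality for q in samples) / len(samples)
    return mean_quality, hit_rate


def is_dependable(samples, target_quality, target_probability):
    """True if the target quality is met in at least the target fraction of runs."""
    _, hit_rate = summarize_quality(samples, target_quality)
    return hit_rate >= target_probability


# Hypothetical accuracies from repeated training runs of two configurations.
steady = [0.81, 0.82, 0.80, 0.83, 0.81, 0.82, 0.80, 0.81, 0.82, 0.81]
erratic = [0.95, 0.96, 0.60, 0.94, 0.62, 0.95, 0.61, 0.96, 0.95, 0.63]

TARGET_QUALITY = 0.80      # assumed target accuracy
TARGET_PROBABILITY = 0.90  # assumed fraction of runs that must meet it

for name, runs in (("steady", steady), ("erratic", erratic)):
    mean_quality, hit_rate = summarize_quality(runs, TARGET_QUALITY)
    verdict = is_dependable(runs, TARGET_QUALITY, TARGET_PROBABILITY)
    print(f"{name}: mean={mean_quality:.3f} hit_rate={hit_rate:.2f} dependable={verdict}")
```

In this toy data the erratic configuration has the slightly higher mean, yet it misses the target in a large share of runs. A criterion based on the average alone would not reveal this.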