A Survey of Distributed Learning in Cloud, Mobile, and Edge Settings (2405.15079v1)

Published 23 May 2024 in cs.LG

Abstract: In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and training stages. To address this challenge, distributed learning has emerged as a crucial approach, employing parallelization across various devices and environments. This survey explores the landscape of distributed learning, encompassing cloud and edge settings. We delve into the core concepts of data and model parallelism, examining how models are partitioned across different dimensions and layers to optimize resource utilization and performance. We analyze various partitioning schemes for different layer types, including fully connected, convolutional, and recurrent layers, highlighting the trade-offs between computational efficiency, communication overhead, and memory constraints. This survey provides valuable insights for future research and development in this rapidly evolving field by comparing and contrasting distributed learning approaches across diverse contexts.
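
As a rough illustration of the data- versus model-parallelism distinction the abstract refers to, the following NumPy sketch partitions a single fully connected layer two ways: by splitting the batch across devices (data parallelism) and by splitting the weight matrix's output columns across devices (model parallelism). This example is not taken from the survey; the device count, layer sizes, and variable names are illustrative assumptions.

```python
# Minimal sketch (not from the survey): data vs. model parallelism
# for one fully connected layer, with "devices" simulated as list entries.
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out, n_devices = 8, 16, 12, 4

x = rng.standard_normal((batch, d_in))   # input activations
W = rng.standard_normal((d_in, d_out))   # layer weights
reference = x @ W                        # single-device forward pass

# Data parallelism: each device holds a full copy of W
# but processes only a slice of the batch.
data_shards = np.array_split(x, n_devices, axis=0)
data_parallel = np.concatenate([shard @ W for shard in data_shards], axis=0)

# Model parallelism: each device sees the whole batch
# but holds only a slice of W's output columns.
weight_shards = np.array_split(W, n_devices, axis=1)
model_parallel = np.concatenate([x @ shard for shard in weight_shards], axis=1)

# Both partitionings reproduce the single-device result.
assert np.allclose(reference, data_parallel)
assert np.allclose(reference, model_parallel)
```

In practice the two schemes differ mainly in what must be communicated: data parallelism replicates weights and synchronizes gradients across devices during training, while model parallelism shards the weights and exchanges activations instead, which is the computation/communication/memory trade-off the survey examines per layer type.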

Authors (2)
  1. Madison Threadgill (2 papers)
  2. Andreas Gerstlauer (9 papers)