Toward efficient resource utilization at edge nodes in federated learning (2309.10367v2)

Published 19 Sep 2023 in cs.LG and cs.AI

Abstract: Federated learning (FL) enables edge nodes to collaboratively contribute to constructing a global model without sharing their data. This is accomplished by devices computing local, private model updates that are then aggregated by a server. However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications. Edge nodes tend to have limited hardware resources (RAM, CPU), and network bandwidth and reliability at the edge are a concern for scaling federated fleet applications. In this paper, we propose and evaluate an FL strategy inspired by transfer learning that reduces resource utilization on devices, as well as the load on the server and network, in each global training round. For each local model update, we randomly select layers to train and freeze the remaining part of the model. In doing so, we reduce both server load and communication cost per round by excluding all untrained layer weights from the transfer to the server. The goal of this study is to empirically explore the potential trade-off between resource utilization on devices and global model convergence under the proposed strategy. We implement the approach using the federated learning framework FEDn and carry out experiments on several datasets (CIFAR-10, CASA, and IMDB), covering different tasks and deep-learning model architectures. Our results show that training the model partially accelerates the training process, utilizes on-device resources efficiently, and reduces data transmission by around 75% and 53% when training 25% and 50% of the model layers, respectively, without harming the resulting global model accuracy.
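
As a rough illustration of the strategy described above (train a random subset of layers in each local round and upload only those layers' weights), the following is a minimal PyTorch sketch. It is not the authors' FEDn implementation; the grouping of parameters by top-level module name, the train_fraction argument, and the SGD settings are illustrative assumptions.

```python
# Hypothetical sketch of one local round with random layer freezing.
# Only the weights of the trained (unfrozen) layers are returned,
# mimicking the reduced upload to the aggregation server.
import random
import torch
import torch.nn as nn

def local_update(model: nn.Module, loader, train_fraction=0.25, epochs=1, lr=0.01):
    # Group parameters by top-level layer name and pick a random subset to train.
    layer_names = list(dict.fromkeys(n.split(".")[0] for n, _ in model.named_parameters()))
    k = max(1, int(train_fraction * len(layer_names)))
    trainable = set(random.sample(layer_names, k))

    # Freeze everything outside the selected layers.
    for name, param in model.named_parameters():
        param.requires_grad = name.split(".")[0] in trainable

    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=lr
    )
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

    # Send back only the updated layers; frozen weights are excluded from the payload.
    return {
        name: tensor.detach().clone()
        for name, tensor in model.state_dict().items()
        if name.split(".")[0] in trainable
    }
```

With train_fraction=0.25, roughly three quarters of the model's weights stay out of the upload, which is the mechanism behind the reported reduction in data transmission.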

Authors (4)
  1. Sadi Alawadi (8 papers)
  2. Addi Ait-Mlouk (4 papers)
  3. Salman Toor (23 papers)
  4. Andreas Hellander (26 papers)
Citations (5)