Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm (2404.00836v1)

Published 1 Apr 2024 in cs.IT, cs.DC, cs.LG, and math.IT

Abstract: In some applications, edge learning is experiencing a shift in focus from conventional learning from scratch to a new two-stage paradigm that unifies pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on local pre-stored general data, and then task-specific fine-tuning is performed at edge devices based on the pre-trained model via federated edge learning. For the two-stage learning model, we first analyze the convergence behavior (in terms of the average squared gradient norm bound), which characterizes the impact of various system parameters, such as the number of learning rounds and the batch sizes in the two stages, on the convergence rate. Based on these analytical results, we then propose a joint communication and computation resource management design to minimize the average squared gradient norm bound, subject to constraints on the transmit power, overall system energy consumption, and training delay. The decision variables include the number of learning rounds, batch sizes, clock frequencies, and transmit power control for both the pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of the proposed design. It is shown that the proposed joint resource management over the pre-training and fine-tuning stages well balances the system performance trade-off among training accuracy, delay, and energy consumption. The proposed design also effectively exploits the inherent trade-off between pre-training and fine-tuning, which arises from the difference in data distribution between pre-stored general data and real-time task-specific data, thus efficiently optimizing overall system performance.
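
To make the two-stage workflow concrete, below is a minimal, hypothetical Python sketch of centralized pre-training at the edge server followed by FedAvg-style federated fine-tuning at edge devices. The model, loss, synthetic data, and names such as N_PRE, N_FT, BATCH_PRE, and BATCH_FT are illustrative assumptions only; these round counts and batch sizes merely stand in for the kinds of decision variables the paper's resource management jointly optimizes (alongside clock frequencies and transmit power), and the sketch does not reproduce the paper's actual algorithm, convergence analysis, or optimization solution.

```python
import numpy as np

# Toy sketch of the two-stage edge learning workflow described in the abstract:
#   Stage 1: centralized pre-training at the edge server on pre-stored general data.
#   Stage 2: federated fine-tuning (FedAvg-style) at edge devices on task-specific data.
# All data, models, and hyperparameters here are hypothetical placeholders.

rng = np.random.default_rng(0)
DIM = 10

def sgd_step(w, X, y, lr):
    """One mini-batch SGD step on a toy least-squares loss."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def sample_batch(X, y, batch_size):
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return X[idx], y[idx]

# --- Stage 1: centralized pre-training on general data at the edge server ---
X_gen, y_gen = rng.normal(size=(2000, DIM)), rng.normal(size=2000)
w = np.zeros(DIM)
N_PRE, BATCH_PRE, LR = 50, 64, 0.05          # pre-training rounds / batch size (decision variables)
for _ in range(N_PRE):
    Xb, yb = sample_batch(X_gen, y_gen, BATCH_PRE)
    w = sgd_step(w, Xb, yb, LR)

# --- Stage 2: federated fine-tuning on task-specific data at edge devices ---
K, N_FT, BATCH_FT = 5, 30, 32                # number of devices / fine-tuning rounds / batch size
device_data = [(rng.normal(size=(400, DIM)), rng.normal(size=400)) for _ in range(K)]
for _ in range(N_FT):
    local_models = []
    for Xk, yk in device_data:               # each device takes one local step from the global model
        Xb, yb = sample_batch(Xk, yk, BATCH_FT)
        local_models.append(sgd_step(w, Xb, yb, LR))
    w = np.mean(local_models, axis=0)        # server aggregates local models (FedAvg-style averaging)

print("fine-tuned model norm:", np.linalg.norm(w))
```

In the paper's setting, the quality reached by such a two-stage procedure is captured by a convergence bound, and the round counts and batch sizes above are traded off against energy and delay budgets rather than fixed by hand; the sketch only shows where those parameters enter the training pipeline.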

Authors (6)
  1. Zhonghao Lyu (15 papers)
  2. Yuchen Li (84 papers)
  3. Guangxu Zhu (88 papers)
  4. Jie Xu (467 papers)
  5. H. Vincent Poor (884 papers)
  6. Shuguang Cui (275 papers)
Citations (6)