Edge AI as a Service with Coordinated Deep Neural Networks (2401.00631v2)
Abstract: As AI applications continue to expand in next-generation networks, there is a growing need for deep neural network (DNN) models. Although DNN models deployed at the edge are promising for providing AI as a service with low latency, their cooperation is yet to be explored. In this paper, we consider that DNN service providers share their computing resources as well as their models' parameters, allowing other DNNs to offload their computations without mirroring. We propose a novel algorithm called coordinated DNNs on edge (\textbf{CoDE}) that facilitates coordination among DNN services by establishing new inference paths. CoDE aims to find the optimal path, i.e., the path with the highest possible reward, by creating multi-task DNNs from individual models. The reward reflects the inference throughput and model accuracy. With CoDE, DNN models can form new inference paths using their own or other models' parameters. We then evaluate the performance of CoDE through numerical experiments. The results demonstrate a $40\%$ increase in inference throughput while degrading the average accuracy by only $2.3\%$. Experiments also show that CoDE enhances the inference throughput and achieves higher precision than an existing state-of-the-art method.
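To make the path-selection idea concrete, below is a minimal sketch of picking the inference path with the highest reward. The paper does not give the exact reward formula in the abstract, so the weighting used here (throughput bonus minus an accuracy-degradation penalty), as well as all names (`InferencePath`, `reward`, `select_best_path`) and the example numbers, are hypothetical illustrations only.

```python
# Illustrative sketch only: the reward weighting (alpha, beta) and all names
# below are assumptions, not the paper's actual formulation.
from dataclasses import dataclass
from typing import List


@dataclass
class InferencePath:
    """A candidate inference path built from blocks of one or more DNN services."""
    name: str
    throughput: float  # inferences per second on the shared edge resources
    accuracy: float    # estimated task accuracy of the (possibly multi-task) path


def reward(path: InferencePath, base_accuracy: float,
           alpha: float = 1.0, beta: float = 10.0) -> float:
    """Hypothetical reward: favor throughput, penalize accuracy degradation."""
    accuracy_drop = max(0.0, base_accuracy - path.accuracy)
    return alpha * path.throughput - beta * accuracy_drop


def select_best_path(candidates: List[InferencePath],
                     base_accuracy: float) -> InferencePath:
    """Return the candidate path with the highest reward."""
    return max(candidates, key=lambda p: reward(p, base_accuracy))


if __name__ == "__main__":
    # The first path uses only the model's own parameters; the others offload
    # part of the computation onto another service's backbone (made-up figures).
    candidates = [
        InferencePath("own-parameters-only", throughput=100.0, accuracy=0.92),
        InferencePath("offload-to-service-B", throughput=140.0, accuracy=0.90),
        InferencePath("offload-to-service-C", throughput=135.0, accuracy=0.85),
    ]
    best = select_best_path(candidates, base_accuracy=0.92)
    print(f"selected path: {best.name}")
```

Under these made-up numbers, the second path wins: it trades a small accuracy drop for a large throughput gain, mirroring the abstract's reported trade-off ($40\%$ more throughput for $2.3\%$ lower average accuracy).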