
DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference (2306.01811v3)

Published 2 Jun 2023 in cs.LG, cs.DC, and cs.OS

Abstract: Due to limited resources on edge and different characteristics of deep neural network (DNN) models, it is a big challenge to optimize DNN inference performance in terms of energy consumption and end-to-end latency on edge devices. In addition to the dynamic voltage frequency scaling (DVFS) technique, the edge-cloud architecture provides a collaborative approach for efficient DNN inference. However, current edge-cloud collaborative inference methods have not optimized various compute resources on edge devices. Thus, we propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework, which co-optimizes DVFS and offloading parameters via deep reinforcement learning (DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU and memory frequencies of edge devices, and 2) the feature maps to be offloaded to cloud servers. In addition, it leverages a thinking-while-moving concurrent mechanism to accelerate the DRL learning process, and a spatial-channel attention mechanism to extract DNN feature maps of secondary importance for workload offloading. This approach improves inference performance for different DNN models under various edge-cloud network conditions. Extensive evaluations using two datasets and six widely-deployed DNN models on three heterogeneous edge devices show that DVFO significantly reduces the energy consumption by 33% on average, compared to state-of-the-art schemes. Moreover, DVFO achieves up to 28.6%-59.1% end-to-end latency reduction, while maintaining accuracy within 1% loss on average.

Authors (5)
  1. Ziyang Zhang (69 papers)
  2. Yang Zhao (382 papers)
  3. Huan Li (102 papers)
  4. Changyao Lin (3 papers)
  5. Jie Liu (492 papers)
Citations (5)
