Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices (2401.08965v1)

Published 17 Jan 2024 in cs.CV

Abstract: Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to several key advantages in latency, privacy and always-on availability. However, due to limited computing resources, efficient DNN deployment on mobile and embedded platforms is challenging. Although many hardware accelerators and static model compression methods were proposed by previous works, at system runtime multiple applications are typically executed concurrently and compete for hardware resources. This raises two main challenges: Runtime Hardware Availability and Runtime Application Variability. Previous works have addressed these challenges through either dynamic neural networks that contain sub-networks with different performance trade-offs, or runtime hardware resource management. In this thesis, we propose a combined method: a system for DNN performance trade-off management that exploits the runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time. We co-designed novel Dynamic Super-Networks to maximise runtime system-level performance and energy efficiency on heterogeneous hardware platforms. Compared with state-of-the-art models, our experimental results using ImageNet on the GPU of a Jetson Xavier NX show our model is 2.4x faster at similar ImageNet Top-1 accuracy, or 5.1% higher in accuracy at similar latency. We also designed a hierarchical runtime resource manager that tunes both dynamic neural networks and DVFS at runtime. Compared with the Linux DVFS governor schedutil, our runtime approach achieves up to a 19% energy reduction and a 9% latency reduction in a single-model deployment scenario, and an 89% energy reduction and a 23% latency reduction in a two-concurrent-model deployment scenario.
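The abstract describes a hierarchical runtime manager that jointly selects a dynamic-DNN sub-network and a DVFS operating point so that a changing latency target is met at good accuracy and low energy. The Python sketch below illustrates that selection idea only; the sub-network names, frequency levels and the latency/energy/accuracy numbers in the profiling table are hypothetical placeholders, not figures from the thesis.

```python
"""Minimal sketch (not the authors' implementation) of a runtime manager that
picks a (sub-network, DVFS frequency) operating point to meet a latency target."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    subnet: str        # which sub-network of the dynamic super-network to run
    freq_mhz: int      # DVFS frequency level for the accelerator
    latency_ms: float  # profiled inference latency at this operating point
    energy_mj: float   # profiled energy per inference at this operating point
    top1_acc: float    # profiled Top-1 accuracy of the sub-network


# Hypothetical offline profiling table: (sub-network, frequency) -> cost/quality.
PROFILE = [
    Config("subnet-small",  600,  9.0,  45.0, 74.1),
    Config("subnet-small", 1100,  6.0,  60.0, 74.1),
    Config("subnet-medium", 600, 18.0,  80.0, 77.5),
    Config("subnet-medium", 1100, 11.0, 105.0, 77.5),
    Config("subnet-large",  600, 34.0, 150.0, 79.8),
    Config("subnet-large", 1100, 21.0, 190.0, 79.8),
]


def choose_config(latency_target_ms: float) -> Config:
    """Return the most accurate operating point meeting the latency target,
    breaking ties by lower estimated energy; fall back to the fastest point."""
    feasible = [c for c in PROFILE if c.latency_ms <= latency_target_ms]
    if not feasible:
        return min(PROFILE, key=lambda c: c.latency_ms)
    return max(feasible, key=lambda c: (c.top1_acc, -c.energy_mj))


if __name__ == "__main__":
    for target in (8.0, 15.0, 30.0):
        c = choose_config(target)
        print(f"target {target:5.1f} ms -> {c.subnet} @ {c.freq_mhz} MHz "
              f"({c.latency_ms} ms, {c.energy_mj} mJ, {c.top1_acc}% top-1)")
```

In the thesis this decision is made at runtime and also accounts for concurrent models competing for the same hardware; the table lookup above stands in for that richer controller.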

References (40)
  1. “Smart at what cost? Characterising Mobile Deep Neural Networks in the wild” In ACM Internet Measurement Conference, 2021
  2. “BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer” In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. “Automated Customization of On-Device Inference for Quality-of-Experience Enhancement” In IEEE Transactions on Computers (TC) IEEE, 2022
  4. “AdaMD: Adaptive Mapping and DVFS for Energy-Efficient Heterogeneous Multicores” In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) IEEE, 2019
  5. “Once-for-All: Train One Network and Specialize it for Efficient Deployment” In International Conference on Learning Representations (ICLR), 2020
  6. “Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications” In ACM Transactions on Design Automation of Electronic Systems (TODAES) 27.3 ACM New York, NY, 2022, pp. 1–50
  7. Han Cai, Ligeng Zhu and Song Han “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware” In International Conference on Learning Representations (ICLR), 2019
  8. “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks” In IEEE Journal of Solid-State Circuits 52.1 IEEE, 2016, pp. 127–138
  9. “Machine Learning on Mobile: An On-device Inference App for Skin Cancer Detection” In International Conference on Fog and Mobile Edge Computing (FMEC), 2019
  10. “EIE: Efficient Inference Engine on Compressed Deep Neural Network” In International Symposium on Computer Architecture (ISCA), 2016
  11. “AMC: AutoML for Model Compression and Acceleration on Mobile Devices” In European Conference on Computer Vision (ECCV), 2018
  12. Henry Hoffmann, Axel Jantsch and Nikil D Dutt “Embodied Self-Aware Computing Systems” In Proceedings of the IEEE 108.7 IEEE, 2020, pp. 1027–1046
  13. “Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms” In Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021
  14. “Ultra-low Power DNN Accelerators for IoT: Resource Characterization of the MAX78000” In SenSys Conference 4th Workshop on AIChallengeIoT, 2022
  15. “Dynamic Transformer for Efficient Machine Translation on Embedded Devices” In ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD), 2021
  16. “Inter-cluster Thread-to-core Mapping and DVFS on Heterogeneous Multi-cores” In IEEE Transactions on Multi-Scale Computing Systems 4.3 IEEE, 2017, pp. 369–382
  17. “Low-Voltage Energy Efficient Neural Inference by Leveraging Fault Detection Techniques” In Nordic Circuits and Systems Conference (NorCAS), 2021
  18. “A High-Level Approach for Energy Efficiency Improvement of FPGAs by Voltage Trimming” In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) IEEE, 2021
  19. “Collaborative Adaptation for Energy-Efficient Heterogeneous Mobile SoCs” In IEEE Transactions on Computers (TC) 69.2 IEEE, 2019, pp. 185–197
  20. “Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs” In ACM Transactions on Embedded Computing Systems (TECS) 16.5s ACM New York, NY, USA, 2017, pp. 1–22
  21. “Hardware for Machine Learning: Challenges and Opportunities” In Custom Integrated Circuits Conference (CICC), 2017
  22. “Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off” In International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2016
  23. Surat Teerapittayanon, Bradley McDanel and Hsiang-Tsung Kung “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks” In International Conference on Pattern Recognition (ICPR), 2016
  24. “NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution” In IEEE Transactions on Mobile Computing IEEE, 2023
  25. “SkipNet: Learning Dynamic Routing in Convolutional Networks” In European Conference on Computer Vision (ECCV), 2018
  26. “Machine Learning at Facebook: Understanding Inference at the Edge” In International Symposium on High Performance Computer Architecture (HPCA), 2019
  27. “A Co-Scheduling Framework for DNN Models on Mobile and Edge Devices with Heterogeneous Hardware” In IEEE Transactions on Mobile Computing IEEE, 2021
  28. “ReForm: Static and Dynamic Resource-Aware DNN Reconfiguration Framework for Mobile Device” In Design Automation Conference (DAC), 2019
  29. Lei Xun, Johnathan Hare and Geoff V Merrett “Dynamic DNNs Meet Runtime Resource Management for Efficient Heterogeneous Computing” In Workshop on Novel Architecture and Novel Design Automation (NANDA), 2023
  30. “Runtime DNN Performance Scaling through Resource Management on Heterogeneous Embedded Platforms” In tinyML EMEA Technical Forum, 2021
  31. “Dynamic DNNs Meet Runtime Resource Management on Mobile and Embedded Platforms” In UK Mobile, Wearable and Ubiquitous Systems Research Symposium (MobiUK), 2022
  32. “Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms” In ACM/IEEE 1st Workshop on Machine Learning for CAD (MLCAD), 2019
  33. “Optimising Resource Management for Embedded Machine Learning” In Design, Automation and Test in Europe Conference (DATE), 2020
  34. “MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations” In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) IEEE, 2021
  35. “NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications” In European Conference on Computer Vision (ECCV), 2018
  36. “Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU” In International Conference On Computer-Aided Design (ICCAD), 2021
  37. “AutoSlim: Towards One-Shot Architecture Search for Channel Numbers” In arXiv preprint arXiv:1903.11728, 2019
  38. Jiahui Yu and Thomas S Huang “Universally Slimmable Networks and Improved Training Techniques” In International Conference on Computer Vision (ICCV), 2019
  39. “Slimmable Neural Networks” In International Conference on Learning Representations (ICLR), 2019
  40. “A Survey of Deep Learning on Mobile Devices: Applications, Optimizations, Challenges, and Research Opportunities” In Proceedings of the IEEE 110.3 IEEE, 2022, pp. 334–354
Authors (3)
  1. Lei Xun (7 papers)
  2. Jonathon Hare (32 papers)
  3. Geoff V. Merrett (11 papers)
