Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices (2401.08965v1)
Abstract: Deep neural network (DNN) inference is increasingly executed on mobile and embedded platforms, driven by key advantages in latency, privacy and always-on availability. However, limited computing resources make efficient DNN deployment on these platforms challenging. Although previous work has proposed many hardware accelerators and static model compression methods, at system runtime multiple applications typically execute concurrently and compete for hardware resources. This raises two main challenges: runtime hardware availability and runtime application variability. Previous work has addressed these challenges either through dynamic neural networks that contain sub-networks with different performance trade-offs, or through runtime hardware resource management. In this thesis, we propose a combined approach: a system for DNN performance trade-off management that exploits the runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time. We co-designed novel Dynamic Super-Networks to maximise runtime system-level performance and energy efficiency on heterogeneous hardware platforms. Compared with the state of the art, our experimental results using ImageNet on the GPU of a Jetson Xavier NX show that our model is 2.4x faster at similar ImageNet Top-1 accuracy, or 5.1% more accurate at similar latency. We also designed a hierarchical runtime resource manager that tunes both dynamic neural networks and DVFS at runtime. Compared with the Linux DVFS governor schedutil, our runtime approach achieves up to a 19% energy reduction and a 9% latency reduction in a single-model deployment scenario, and an 89% energy reduction and a 23% latency reduction in a two-concurrent-model deployment scenario.
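The core idea above can be illustrated with a minimal sketch: a runtime manager that, given a latency target, selects the most accurate (sub-network, DVFS level) operating point from a profiled trade-off table. All sub-network names, frequencies, and numbers below are hypothetical placeholders; in the thesis these would come from on-device profiling of the Dynamic Super-Network.

```python
# Sketch of runtime trade-off management: choose the most accurate
# (sub-network, DVFS setting) pair whose profiled latency meets the
# current application latency target. Values are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OperatingPoint:
    sub_network: str      # sub-network of the dynamic super-network
    gpu_freq_mhz: int     # DVFS frequency setting
    latency_ms: float     # profiled inference latency on the platform
    top1_accuracy: float  # ImageNet Top-1 accuracy of the sub-network

# Hypothetical profiling table; a real system builds this per device.
POINTS = [
    OperatingPoint("small",   510, 12.0, 70.1),
    OperatingPoint("small",  1100,  6.5, 70.1),
    OperatingPoint("medium",  510, 25.0, 75.4),
    OperatingPoint("medium", 1100, 13.5, 75.4),
    OperatingPoint("large",  1100, 28.0, 79.0),
]

def select_point(latency_target_ms: float) -> Optional[OperatingPoint]:
    """Return the most accurate feasible point, breaking accuracy ties
    in favour of the lower (more energy-efficient) GPU frequency."""
    feasible = [p for p in POINTS if p.latency_ms <= latency_target_ms]
    if not feasible:
        return None  # caller must relax the target or shed load
    return max(feasible, key=lambda p: (p.top1_accuracy, -p.gpu_freq_mhz))

print(select_point(15.0))  # medium sub-network at 1100 MHz
print(select_point(7.0))   # small sub-network at 1100 MHz
```

A hierarchical manager, as described in the abstract, would additionally arbitrate such tables across concurrently running models before committing a single platform-wide DVFS setting.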
- Lei Xun
- Jonathon Hare
- Geoff V. Merrett