MCUNet: Tiny Deep Learning on IoT Devices (2007.10319v2)

Published 20 Jul 2020 in cs.CV

Abstract: Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture in the optimized search space. TinyNAS can automatically handle diverse constraints (i.e., device, latency, energy, memory) under low search costs. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x, and accelerating the inference by 1.7-3.3x compared to TF-Lite Micro and CMSIS-NN. MCUNet is the first to achieve >70% ImageNet top1 accuracy on an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x less Flash compared to quantized MobileNetV2 and ResNet-18. On visual & audio wake words tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4-3.4x faster than MobileNetV2 and ProxylessNAS-based solutions with 3.7-4.1x smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. Code and models can be found here: https://tinyml.mit.edu.

MCUNet: Tiny Deep Learning on IoT Devices

The paper "MCUNet: Tiny Deep Learning on IoT Devices" introduces a framework for deploying deep learning models on microcontrollers, which are commonplace in IoT devices. These devices are severely resource-constrained, with two to three orders of magnitude less memory than mobile phones or cloud systems, so traditional deep learning models cannot be deployed on them directly. MCUNet addresses this challenge by co-designing an efficient neural architecture (found by TinyNAS) with a lightweight inference engine (TinyEngine), enabling substantial improvements in running deep learning models on these constrained devices.

Key Contributions and Methodology

The principal contribution of this paper is the formulation of a co-design framework that integrates both the design of neural networks and inference scheduling to fit microcontrollers' tight memory resources. The paper benchmarks its framework by achieving over 70% top-1 accuracy on the ImageNet dataset using off-the-shelf commercial microcontrollers—a milestone in the deployment of deep learning models on such devices.
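
To make the memory constraint concrete, a rough way to check whether a candidate network fits an MCU is to estimate the peak activation memory each layer needs (input plus output buffers, int8) and compare it against the SRAM budget. The helper below is a minimal illustrative sketch under simple assumptions (a sequential CNN, "same" padding, and a 320 KB budget); it is not the paper's actual memory analyzer.

```python
# Minimal sketch: estimate peak int8 activation memory of a sequential CNN
# and check it against an MCU SRAM budget. The layer spec and the 320 KB
# budget are illustrative assumptions, not the paper's tooling.

def conv_out_hw(h, w, stride):
    """Output spatial size for a stride-s convolution with 'same' padding."""
    return (h + stride - 1) // stride, (w + stride - 1) // stride

def peak_activation_bytes(layers, in_shape):
    """layers: list of (out_channels, stride); in_shape: (C, H, W), int8 activations."""
    c, h, w = in_shape
    peak = 0
    for out_c, stride in layers:
        oh, ow = conv_out_hw(h, w, stride)
        # A layer needs its input and output buffers live at the same time.
        live = c * h * w + out_c * oh * ow
        peak = max(peak, live)
        c, h, w = out_c, oh, ow
    return peak  # bytes, since int8 activations take 1 byte each

if __name__ == "__main__":
    SRAM_BUDGET = 320 * 1024                        # illustrative MCU SRAM budget
    layers = [(16, 2), (32, 2), (64, 2), (128, 2)]  # (out_channels, stride)
    peak = peak_activation_bytes(layers, in_shape=(3, 144, 144))
    print(f"peak activation memory: {peak / 1024:.1f} KB "
          f"({'fits' if peak <= SRAM_BUDGET else 'exceeds'} {SRAM_BUDGET // 1024} KB SRAM)")
```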

  1. TinyNAS: Neural Architecture Search. TinyNAS employs a two-stage neural architecture search (NAS) method. First, it optimizes the search space of network configurations to fit the predefined resource constraints; these constraints are essential for accommodating the limited on-chip memory and compute of microcontrollers. The search space covers varying input resolutions and width multipliers, tailored to different SRAM and Flash budgets. This automatic optimization reduces the effort of manual tuning across numerous deployment scenarios and improves the accuracy attainable within the constraints (see the first sketch after this list).
  2. TinyEngine: Inference Efficiency. TinyEngine is a memory-efficient inference library that minimizes runtime memory overhead, allowing larger models to be executed on microcontrollers. It schedules memory based on the overall network topology rather than optimizing layer by layer, reducing memory usage by 4.8 times and accelerating inference by 1.7 to 3.3 times compared to TF-Lite Micro and CMSIS-NN. Additionally, TinyEngine implements in-place depth-wise convolution to further pare down peak memory requirements (see the second sketch after this list).
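
The first stage of the two-stage search can be illustrated as follows: enumerate candidate (resolution, width-multiplier) search spaces, sample architectures from each, keep only those that fit the memory budget, and score each space by the FLOPs of its feasible samples as a proxy for attainable accuracy. This is a heavily simplified sketch in the spirit of TinyNAS; the cost and memory models, sample counts, and budget below are assumptions, not the paper's implementation.

```python
# Minimal sketch of search-space optimization in the spirit of TinyNAS:
# pick the (resolution, width multiplier) space whose memory-feasible
# candidates have the largest FLOPs. All models and numbers are toy assumptions.
import random

RESOLUTIONS = [96, 112, 128, 144, 160, 176]
WIDTH_MULTS = [0.3, 0.4, 0.5, 0.6, 0.7]
SRAM_LIMIT = 320 * 1024   # bytes, illustrative budget

def sample_candidate(res, width):
    """Randomly sample a toy candidate; return (flops, peak_sram_bytes)."""
    depth = random.randint(10, 20)                              # number of blocks
    flops = depth * width * res * res * 50                      # toy compute model
    peak_sram = int(3 * res * res + 16 * width * res * res)     # toy memory model
    return flops, peak_sram

def score_space(res, width, n_samples=200):
    """Mean FLOPs of the sampled candidates that fit in SRAM (0 if none fit)."""
    feasible = [f for f, m in (sample_candidate(res, width) for _ in range(n_samples))
                if m <= SRAM_LIMIT]
    return sum(feasible) / len(feasible) if feasible else 0.0

best = max(((r, w) for r in RESOLUTIONS for w in WIDTH_MULTS),
           key=lambda rw: score_space(*rw))
print("selected search space (resolution, width multiplier):", best)
# Stage two (not shown) would then run architecture search inside this space.
```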

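The second sketch illustrates the in-place depth-wise convolution idea: because each output channel depends only on its own input channel, the result for a channel can be written back into the input buffer, so peak memory is roughly the input tensor plus one channel-sized scratch buffer rather than two full feature maps. The shapes and the 3x3 zero-padded kernel are illustrative assumptions; TinyEngine's actual kernels are hand-optimized C, not Python.

```python
# Minimal sketch of in-place depth-wise convolution: each channel's output
# overwrites its own input, so only one channel of scratch memory is needed
# on top of the activation buffer. Purely illustrative, not TinyEngine code.
import numpy as np

def depthwise_conv3x3_inplace(x, kernels):
    """x: (C, H, W) activation buffer, modified in place; kernels: (C, 3, 3)."""
    c, h, w = x.shape
    scratch = np.empty((h, w), dtype=x.dtype)       # one channel of temporary storage
    padded = np.empty((h + 2, w + 2), dtype=x.dtype)
    for ch in range(c):
        padded[:] = 0
        padded[1:-1, 1:-1] = x[ch]                  # zero padding, 'same' output size
        for i in range(h):
            for j in range(w):
                scratch[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernels[ch])
        x[ch] = scratch                             # overwrite the input channel
    return x

act = np.random.randn(8, 16, 16).astype(np.float32)
k = np.random.randn(8, 3, 3).astype(np.float32)
depthwise_conv3x3_inplace(act, k)                   # act now holds the conv output
```
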
Empirical Results

MCUNet demonstrated strong performance across several tasks and datasets. In particular, it achieved state-of-the-art results on visual and audio wake words tasks, running 2.4-3.4 times faster than MobileNetV2- and ProxylessNAS-based solutions while consuming significantly less peak memory. It also maintained high accuracy on the large-scale ImageNet dataset, far surpassing MobileNet variants scaled down to fit the same hardware constraints.

Theoretical and Practical Implications

The implications of this research span both theoretical and practical domains. Theoretically, the co-design framework advances the method by which deep learning models are tailored to resource-scarce environments. Practically, MCUNet's success signals a shift towards more ubiquitous use of machine learning on edge devices, which could transform sectors such as healthcare, agriculture, and smart home technology by offering continuous, local AI processing without the need for constant cloud connectivity.

Future Directions

The prospects for future developments include exploring even finer-grained quantization techniques and model architectures suited to emerging low-power device capabilities. Another direction may involve further advances in memory-efficient neural networks that maintain high accuracy under even tighter constraints.
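
As a concrete reference point for the quantization direction, the snippet below sketches standard symmetric per-channel int8 weight quantization, the kind of scheme MCU deployments typically build on; finer-grained variants would shrink the group over which each scale is shared. The scale derivation is common practice and not taken from the paper.

```python
# Minimal sketch of symmetric per-channel int8 weight quantization.
# Purely illustrative; not the quantization pipeline used in the paper.
import numpy as np

def quantize_per_channel_int8(weights):
    """weights: (out_channels, ...) float array -> (int8 weights, per-channel scales)."""
    flat = weights.reshape(weights.shape[0], -1)
    scales = np.abs(flat).max(axis=1) / 127.0               # one scale per output channel
    scales = np.where(scales == 0, 1.0, scales)              # avoid division by zero
    q = np.clip(np.round(flat / scales[:, None]), -127, 127).astype(np.int8)
    return q.reshape(weights.shape), scales

w = np.random.randn(16, 3, 3, 3).astype(np.float32)
w_q, s = quantize_per_channel_int8(w)
w_deq = w_q.astype(np.float32) * s[:, None, None, None]     # dequantize to check error
print("max abs quantization error:", np.abs(w - w_deq).max())
```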

The emergence of MCUNet as a credible solution for deploying sophisticated deep learning models on minimal hardware augurs a future where AI is truly pervasive, enhancing IoT devices' capabilities efficiently and effectively. The research suggests a promising future for AI on edge devices, potentially leading to a vast range of novel applications previously constrained by hardware limitations.

Authors (6)
  1. Ji Lin (47 papers)
  2. Wei-Ming Chen (25 papers)
  3. Yujun Lin (23 papers)
  4. John Cohn (4 papers)
  5. Chuang Gan (195 papers)
  6. Song Han (155 papers)
Citations (436)