MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers (2010.11267v6)

Published 21 Oct 2020 in cs.LG

Abstract: Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection. Models and training scripts can be found at github.com/ARM-software/ML-zoo.

Insights on "MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers"

The paper "MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers" presents an innovative approach to executing machine learning tasks on resource-constrained microcontrollers (MCUs). The authors address the core challenge of running deep neural networks, which typically require substantial computational and memory resources, on such limited hardware. This paper delineates the architecture of MicroNets, optimized specifically to operate efficiently on MCUs, leveraging neural architecture search (NAS) techniques.

The paper's novelty lies in its use of differentiable neural architecture search (DNAS) to design neural network models that satisfy stringent MCU constraints on memory (SRAM), flash storage, and latency. A distinguishing aspect of the work is the empirical observation that, within the NAS search spaces considered, model latency varies on average linearly with operation (op) count, making op count a reliable proxy for latency. This insight streamlines the NAS process, since the search can optimize op count directly rather than measuring latency on hardware for every candidate model.
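To make this concrete, the sketch below fits a linear latency model of the kind the paper describes: profile a set of models sampled from the search space, then regress measured latency on op count. The (op count, latency) pairs here are hypothetical placeholders, not measurements from the paper.

```python
# Minimal sketch (NumPy assumed): fit latency ~= a * ops + b from profiled models.
# All numbers below are made up for illustration.
import numpy as np

op_counts = np.array([2.1e6, 4.8e6, 7.5e6, 11.3e6, 15.0e6])   # ops per inference
latency_ms = np.array([18.0, 41.0, 63.0, 96.0, 128.0])        # measured on-device

# Least-squares fit of the linear model.
A = np.vstack([op_counts, np.ones_like(op_counts)]).T
(a, b), *_ = np.linalg.lstsq(A, latency_ms, rcond=None)

# A tight fit (high R^2) across the search space is what justifies using
# op count as a latency proxy inside the NAS.
pred = a * op_counts + b
r2 = 1.0 - np.sum((latency_ms - pred) ** 2) / np.sum((latency_ms - latency_ms.mean()) ** 2)
print(f"latency ~= {a:.2e} ms/op * ops + {b:.2f} ms  (R^2 = {r2:.3f})")
```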

Key Contributions and Experimental Validation

The authors highlight several notable contributions, validated through comprehensive experimentation across three TinyMLPerf benchmark tasks: visual wake words (VWW), audio keyword spotting (KWS), and anomaly detection (AD).

  1. Latency and Energy Modeling: The paper describes a methodical approach to characterizing neural network inference performance on the target MCUs. By profiling models sampled from the relevant network backbones, the authors show that operation count reliably predicts both latency and energy consumption, simplifying the task of optimizing models for MCU deployment.
  2. Optimized Neural Architectures for MCUs: Through DNAS, MicroNet architectures are designed to fit within the constraints of commodity MCUs while maintaining state-of-the-art accuracy across the benchmark tasks, which makes them suitable for real-world TinyML applications; a minimal sketch of such a resource-constrained search objective appears after this list.
  
  3. Sub-byte Quantization: The research explores sub-byte quantization (4-bit) techniques, motivated by the need to increase model capacity and accuracy within the fixed memory footprint of MCUs. This approach anticipates future hardware advancements that may provide native support for smaller datatypes, thus further enhancing model efficiency.
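The resource-constrained search in item 2 can be illustrated with a small DNAS-style objective: relax each layer's choice of candidate operations with a softmax, compute the expected op count and SRAM usage under that relaxation, and add them to the task loss as penalties. This is only a sketch under assumed per-candidate op and memory figures (and a PyTorch dependency), not the authors' implementation.

```python
# Minimal sketch of a resource-aware differentiable NAS objective.
# Candidate op counts, SRAM figures, budgets, and weights are hypothetical.
import torch
import torch.nn.functional as F

# Per-layer candidate operations with (op_count, sram_bytes) estimates.
CANDIDATES = [
    [(1.2e6, 24e3), (2.1e6, 36e3), (3.4e6, 52e3)],  # layer 0 choices
    [(0.8e6, 16e3), (1.5e6, 28e3), (2.6e6, 44e3)],  # layer 1 choices
]

# One relaxed architecture-parameter vector per layer.
alphas = [torch.zeros(len(c), requires_grad=True) for c in CANDIDATES]

def expected_resources(alphas):
    """Expected op count and SRAM usage under the softmax-relaxed architecture."""
    ops, sram = 0.0, 0.0
    for a, cands in zip(alphas, CANDIDATES):
        probs = F.softmax(a, dim=0)
        ops = ops + sum(p * c[0] for p, c in zip(probs, cands))
        sram = sram + sum(p * c[1] for p, c in zip(probs, cands))
    return ops, sram

def search_loss(task_loss, alphas, op_weight=1e-7, sram_budget=64e3, sram_weight=1e-4):
    """Task loss + op-count proxy for latency + hinge penalty for exceeding SRAM."""
    ops, sram = expected_resources(alphas)
    return task_loss + op_weight * ops + sram_weight * torch.relu(sram - sram_budget)

# Example step: pretend the supernet produced some task loss for a batch.
task_loss = torch.tensor(0.9, requires_grad=True)
loss = search_loss(task_loss, alphas)
loss.backward()  # gradients flow into the architecture parameters
print([a.grad for a in alphas])
```

Because the op-count and memory terms are differentiable in the architecture parameters, the search can trade accuracy against resource usage directly, which is the role the op-count proxy plays in the paper's DNAS formulation.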

Practical Implications and Future Directions

The successful deployment of MicroNet models demonstrates the feasibility of leveraging TinyML on MCUs for various IoT applications. The implications are manifold, potentially transforming areas that demand immediate, low-power, on-device data processing, such as environmental monitoring, predictive maintenance, and simple visual or audio recognition tasks.

Future developments in this domain could focus on extending these methodologies to even more constrained hardware and complex models. Moreover, as hardware evolves, incorporating advanced memory technologies or improved processing units within MCUs could open new avenues for deploying TinyML applications with greater complexity. Heightened interest may also foster standardization and community-driven enhancements in open-source documentation and tools, further advancing this niche research field.

By publicly releasing the models and associated scripts, the authors contribute significantly to the collaborative progress in TinyML research, enabling comparisons and further enhancements by other researchers and practitioners. This work establishes a foundation that can be built upon as the field moves forward with ever-increasing demands for edge computing capabilities.

Authors (9)
  1. Colby Banbury (19 papers)
  2. Chuteng Zhou (13 papers)
  3. Igor Fedorov (24 papers)
  4. Ramon Matas Navarro (2 papers)
  5. Urmish Thakker (26 papers)
  6. Dibakar Gope (17 papers)
  7. Vijay Janapa Reddi (78 papers)
  8. Matthew Mattina (35 papers)
  9. Paul N. Whatmough (18 papers)

GitHub: github.com/ARM-software/ML-zoo