- The paper presents the MLPerf Tiny Benchmark, offering standardized evaluation metrics including accuracy, latency, energy consumption, and efficiency for TinyML systems.
- It details a modular design that accommodates hardware and software heterogeneity through closed and open submission divisions.
- Initial findings reveal trends such as 8-bit integer quantization and optimized hardware-software co-design, underlining the benchmark's impact on TinyML assessment.
MLPerf Tiny Benchmark: A Comprehensive Suite for TinyML Evaluation
The advancement of tiny machine learning (TinyML) systems has prompted the introduction of specialized benchmarks to evaluate their unique characteristics. The MLPerf Tiny Benchmark is a significant initiative aimed at providing a standardized methodology to assess the performance of ultra-low-power machine learning systems, addressing a critical gap in the field. This essay summarizes the paper introducing the MLPerf Tiny Benchmark, its design principles, and initial findings from its implementation.
The MLPerf Tiny Benchmark was developed through the collaboration of over 50 organizations from industry and academia. It evaluates four core metrics for on-device machine learning inference: accuracy, latency, energy consumption, and efficiency. These metrics are pivotal for TinyML applications, which must deliver usable accuracy and responsiveness under stringent resource and power constraints.
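To make the latency and energy metrics concrete, the sketch below shows one common way such numbers are derived: median latency over repeated timed runs, and energy per inference obtained by dividing an externally measured energy total by the number of inferences. This is a minimal illustration only, not the benchmark's official measurement harness; `run_inference` is a placeholder for a single inference on the system under test, and in practice energy is captured by a dedicated power monitor attached to the device.

```python
import time
import numpy as np

def measure_latency(run_inference, warmup=10, runs=100):
    """Median single-inference latency in seconds (host-side illustration only)."""
    for _ in range(warmup):          # warm caches / JIT before timing
        run_inference()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        timings.append(time.perf_counter() - start)
    return float(np.median(timings))

# Energy per inference = measured energy over the window / number of inferences,
# e.g. 0.42 J measured over 100 inferences -> 4.2 mJ per inference.
```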
Benchmark Challenges and Solutions
TinyML systems pose several unique challenges that necessitate specialized benchmarks. These challenges include:
- Low Power Consumption: The extreme power limits of TinyML systems make it difficult to measure energy consistently across diverse devices. MLPerf Tiny addresses this by defining a common energy measurement procedure so that energy profiles can be compared across devices.
- Limited Memory: TinyML devices generally have memory budgets orders of magnitude smaller than those of traditional ML systems. MLPerf Tiny accounts for these limits by using lightweight reference models sized to fit within the restricted flash and RAM of TinyML devices; a minimal sketch of such a memory-budget check follows this list.
- Hardware Heterogeneity: With a wide variety of devices from general-purpose microcontrollers to specialized neural processors, the benchmark suite was carefully engineered to allow flexibility and adaptability to different hardware configurations.
- Software Heterogeneity: Diversity exists not only in hardware platforms but also in the ML software stacks used for inference. The benchmark is modular and flexible, giving submitters the freedom to demonstrate their optimizations without being locked into a specific software stack.
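As referenced above, a rough feasibility check often performed before targeting a device is comparing the size of the compiled model against the device's flash and RAM budgets. The sketch below is purely illustrative: the file name and the budget figures are hypothetical, and the activation memory (tensor arena) requirement is ultimately reported by the on-device runtime, not by the file size.

```python
import os

# Hypothetical budgets for a small microcontroller (illustrative values, not from the paper).
FLASH_BUDGET_KB = 256   # holds code plus model weights
SRAM_BUDGET_KB = 64     # holds activations / scratch memory ("tensor arena")

model_path = "model_int8.tflite"  # hypothetical quantized model file
model_kb = os.path.getsize(model_path) / 1024

print(f"Model weights: {model_kb:.1f} KB of a {FLASH_BUDGET_KB} KB flash budget")
if model_kb > FLASH_BUDGET_KB:
    print("Model does not fit in flash; a smaller or more aggressively quantized model is needed.")

# Note: the SRAM needed for activations is not visible from the file size alone;
# on-device runtimes such as TFLite Micro report it when tensors are allocated.
```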
Benchmark Design and Its Components
MLPerf Tiny comprises a collection of benchmarks focused on tasks relevant to TinyML, specifically keyword spotting, image classification, visual wake words, and anomaly detection. Each task is designed with tailored datasets and models, ensuring they are representative of TinyML applications while adhering to memory and computational constraints.
For example, the Image Classification benchmark utilizes the CIFAR-10 dataset with a ResNet model adapted to run efficiently on constrained hardware. Similarly, the Keyword Spotting benchmark leverages the Speech Commands dataset to emulate real-world voice interaction scenarios prevalent in TinyML applications.
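As a concrete illustration of how one of these reference models can be exercised on a host before deployment, the sketch below runs a quantized image-classification model through the TensorFlow Lite interpreter on a CIFAR-10-shaped input. The model file name is illustrative; the actual reference models and file names are those distributed with the MLPerf Tiny benchmark suite.

```python
import numpy as np
import tensorflow as tf

# Load a quantized image-classification model (file name is illustrative).
interpreter = tf.lite.Interpreter(model_path="ic_resnet_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# CIFAR-10 images are 32x32x3; a fully quantized model expects int8 input.
image = np.zeros(inp["shape"], dtype=inp["dtype"])  # placeholder input

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])
print("Predicted class:", int(np.argmax(scores)))
```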
Modular Design and Submission Flexibility
A notable feature of MLPerf Tiny is its modular design, which offers both "closed" and "open" divisions for submissions. The closed division requires adherence to the reference models and datasets, allowing standardized comparisons between different TinyML systems. In contrast, the open division gives submitters the flexibility to modify any part of the pipeline, showcasing improvements in specific areas such as model architecture or hardware performance.
Initial Submissions and Insights
The inaugural round of submissions spanned a breadth of implementations, ranging from integer-quantized models running on microcontrollers to demonstrations of dedicated hardware accelerators. These submissions offer initial insights into trends within the TinyML sector, such as the prevalent use of 8-bit integer quantization and growing interest in optimized hardware-software co-design.
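For readers unfamiliar with the quantization trend mentioned above, the sketch below shows one common way to produce a fully int8-quantized model: TensorFlow Lite post-training quantization with a representative calibration dataset. It is a generic illustration under assumed names (`saved_model_dir`, a 32x32x3 input shape), not the specific recipe used by any submitter.

```python
import tensorflow as tf

def representative_data():
    # Yields a few calibration samples shaped like the model's input
    # (assumed here to be 32x32x3 images; adjust for the actual model).
    for _ in range(100):
        yield [tf.random.uniform([1, 32, 32, 3], dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # fully integer I/O for MCU deployment
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Fully integer models like this avoid floating-point arithmetic entirely, which is what makes them attractive on microcontrollers without an FPU and on integer-only accelerators.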
Impact and Future Directions
The MLPerf Tiny Benchmark establishes a baseline for TinyML systems, providing critical infrastructure for assessment and comparison as this field evolves. Future submissions will likely reflect ongoing changes in TinyML techniques, including potential shifts towards more data-centric approaches. The benchmark's adaptability ensures it will remain relevant, offering continual support for the burgeoning community of TinyML researchers and engineers.
In conclusion, the MLPerf Tiny Benchmark represents a crucial step towards systematizing performance evaluation in TinyML, thereby fostering innovation and supporting sustainable growth in this dynamic field.