Benchmarking Ultra-Low-Power $\mu$NPUs (2503.22567v2)

Published 28 Mar 2025 in cs.LG and cs.AR

Abstract: Efficient on-device neural network (NN) inference has various advantages over cloud-based processing, including predictable latency, enhanced privacy, greater reliability, and reduced operating costs for vendors. This has sparked the recent rapid development of microcontroller-scale NN accelerators, often referred to as neural processing units ($\mu$NPUs), designed specifically for ultra-low-power applications. In this paper we present the first comparative evaluation of a number of commercially available $\mu$NPUs, as well as the first independent benchmarks for several of these platforms. We develop and open-source a model compilation framework to enable consistent benchmarking of quantized models across diverse $\mu$NPU hardware. Our benchmark targets end-to-end performance and includes model inference latency, power consumption, and memory overhead, alongside other factors. The resulting analysis uncovers both expected performance trends and surprising disparities between hardware specifications and actual performance, including $\mu$NPUs exhibiting unexpected scaling behaviors with increasing model complexity. Our framework provides a foundation for further evaluation of $\mu$NPU platforms alongside valuable insights for both hardware designers and software developers in this rapidly evolving space.
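
The abstract describes benchmarking end-to-end inference latency of quantized models. As a rough illustration of the kind of measurement involved, the sketch below times repeated invocations of a quantized TFLite model on a host machine using the standard TFLite interpreter. This is a minimal, hypothetical stand-in, not the paper's open-source framework: the model path and run counts are assumptions, and the actual evaluation runs on $\mu$NPU hardware and additionally measures power consumption and memory overhead, which this host-side script does not capture.

```python
# Minimal host-side latency sketch for a quantized TFLite model.
# Illustrative only: the paper's framework benchmarks on-device uNPUs and
# also records power and memory overhead, which are not measured here.
import time
import numpy as np
import tensorflow as tf  # assumes TensorFlow with the bundled TFLite interpreter

MODEL_PATH = "model_int8.tflite"  # hypothetical quantized model file

interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input matching the model's expected shape; assumes an int8-quantized input.
dummy = np.random.randint(-128, 127, size=tuple(inp["shape"]), dtype=np.int8)

# Warm-up runs so one-time setup costs do not skew the timings.
for _ in range(5):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()

# Timed runs: record per-invocation latency in milliseconds.
latencies = []
for _ in range(100):
    interpreter.set_tensor(inp["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append((time.perf_counter() - start) * 1e3)

print(f"median latency: {np.median(latencies):.2f} ms")
print(f"p95 latency:    {np.percentile(latencies, 95):.2f} ms")
```

Reporting a median and a high percentile rather than a single mean is a common way to expose the latency variability that end-to-end benchmarks of this kind aim to characterize.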
