Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology (2212.00688v1)

Published 1 Dec 2022 in cs.AR

Abstract: Tiny Machine Learning (TinyML) applications impose uJ/Inference constraints, with a maximum power consumption of tens of mW. It is extremely challenging to meet these requirements at a reasonable accuracy level. This work addresses the challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based System-on-Chip (SoC). Besides supporting Ternary Convolutional Neural Networks, we introduce extensions to the accelerator design that enable the processing of time-dilated Temporal Convolutional Neural Networks (TCNs). The design achieves 5.5 uJ/Inference, 12.2 mW, 8000 Inferences/sec at 0.5 V for a Dynamic Vision Sensor (DVS) based TCN, and an accuracy of 94.5 % and 2.72 uJ/Inference, 12.2 mW, 3200 Inferences/sec at 0.5 V for a non-trivial 9-layer, 96 channels-per-layer convolutional network with CIFAR-10 accuracy of 86 %. The peak energy efficiency is 1036 TOp/s/W, outperforming the state-of-the-art silicon-proven TinyML quantized accelerators by 1.67x while achieving competitive accuracy.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Moritz Scherer (12 papers)
  2. Alfio Di Mauro (15 papers)
  3. Tim Fischer (17 papers)
  4. Georg Rutishauser (11 papers)
  5. Luca Benini (362 papers)
Citations (2)

Summary

We haven't generated a summary for this paper yet.