
YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration (1606.05487v4)

Published 17 Jun 2016 in cs.AR, cs.CV, and cs.NE

Abstract: Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm² and with a power dissipation of 895 µW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency, achieving 61.2 TOp/s/W at 0.6 V and 1135 GOp/s/MGE at 1.2 V, respectively.

Authors (4)
  1. Renzo Andri (18 papers)
  2. Lukas Cavigelli (49 papers)
  3. Davide Rossi (69 papers)
  4. Luca Benini (362 papers)
Citations (194)

Summary

Ultra-Low Power Binary-Weight CNN Acceleration: A Study of YodaNN

The paper "YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration" by Renzo Andri et al. presents a detailed exploration of a novel hardware accelerator optimized for binary-weight Convolutional Neural Networks (CNNs). The authors address the energy limitations in deploying computationally intensive CNN models on mobile and Internet of Things (IoT) devices. The breakthrough in this research hinges on the utilization of binary weights, which fundamentally transforms the arithmetic complexity by obviating the need for expensive multiplications and significantly reducing input/output bandwidth requirements and storage.
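To make the storage and bandwidth argument concrete: a binary weight needs a single bit, whereas a conventional float32 weight needs 32 bits, so weight memory and I/O traffic shrink by up to 32x. The following sketch (an illustration of the general idea, not the paper's on-chip format) packs +1/-1 weights into bits with NumPy:

```python
import numpy as np

# Illustrative packing of binary (+1/-1) weights into single bits.
# A float32 weight occupies 32 bits; a binary weight needs only 1 bit,
# so weight storage and I/O bandwidth can shrink by up to 32x.
def pack_binary_weights(weights):
    """Map +1 -> bit 1 and -1 -> bit 0, then pack 8 weights per byte."""
    bits = (np.asarray(weights) > 0).astype(np.uint8)
    return np.packbits(bits)

def unpack_binary_weights(packed, count):
    """Recover +1/-1 weights from the packed bit representation."""
    bits = np.unpackbits(packed)[:count]
    return bits.astype(np.int8) * 2 - 1

w = np.array([+1, -1, -1, +1, +1, +1, -1, +1], dtype=np.float32)
packed = pack_binary_weights(w)                    # 8 weights in 1 byte
restored = unpack_binary_weights(packed, len(w))   # round-trips exactly
```

Here the 8-element float32 kernel occupies 32 bytes, while its packed form fits in a single byte, mirroring the I/O and storage savings the authors exploit.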

Design and Technical Specifications

YodaNN is presented as the first hardware accelerator optimized specifically for binary-weight CNNs, in which weights are restricted to the binary values +1 and -1. This restriction eliminates costly multiplication operations, replacing them with simple two's-complement operations and multiplexers. With these optimizations, YodaNN sustains a throughput of 1510 GOp/s at 1.2 V on a core area of merely 1.33 MGE, and dissipates only 895 µW in UMC 65 nm technology when the supply voltage is scaled down to 0.6 V.
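The multiplier-free arithmetic can be sketched in a few lines: with a binary weight, each product a * w collapses to a sign selection, which in hardware is a two's-complement negation feeding a multiplexer driven by the weight bit. This is a behavioral illustration of the idea, not the YodaNN RTL:

```python
# Behavioral sketch of a binary-weight multiply-accumulate (MAC):
# weight bit 1 encodes +1, weight bit 0 encodes -1, so a * w reduces
# to adding either the activation or its two's complement -- a mux,
# not a multiplier.
def binary_mac(activations, weight_bits):
    """Accumulate activations, adding when the weight bit is 1 (+1)
    and subtracting when it is 0 (-1)."""
    acc = 0
    for a, bit in zip(activations, weight_bits):
        acc += a if bit else -a   # mux between a and its complement
    return acc

# Three activations against weights (+1, -1, +1): 5 - 3 + 2 = 4
assert binary_mac([5, 3, 2], [1, 0, 1]) == 4
```

Each MAC thus costs one conditional negation and one addition, which is what removes the expensive multiplier array from the arithmetic core.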

Significant attention is dedicated to hardware optimizations that extend voltage scalability, such as the employment of latch-based standard cell memory (SCM) architecture. This design choice, while more area-intensive than SRAM, allows for better voltage scaling and substantial enhancements in energy efficiency.

Numerical Results and Comparisons

The authors report numerous quantitative improvements over baseline architectures and state-of-the-art solutions. YodaNN achieves an energy efficiency of 61.2 TOp/s/W at 0.6 V, surpassing alternative architectures by factors of up to 32x. The use of binary weights, combined with these architectural enhancements, reduces memory and area requirements by 3.5x to 31x compared to conventional approaches.
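A quick back-of-the-envelope check ties the headline numbers together: since energy efficiency is throughput divided by power, the reported 61.2 TOp/s/W and 895 µW at 0.6 V imply a low-voltage operating-point throughput of roughly 55 GOp/s (an arithmetic illustration, not a figure taken from the paper's measurement tables):

```python
# Consistency check on the reported 0.6 V operating point:
# energy efficiency (Op/s/W) * power (W) = throughput (Op/s).
power_w = 895e-6        # 895 uW at 0.6 V
energy_eff = 61.2e12    # 61.2 TOp/s/W at 0.6 V

implied_throughput = energy_eff * power_w       # in Op/s
print(f"{implied_throughput / 1e9:.1f} GOp/s")  # ~54.8 GOp/s at 0.6 V
```

The peak 1510 GOp/s figure, by contrast, is reached at the 1.2 V operating point, where the chip trades energy efficiency for raw throughput.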

Additionally, YodaNN's support for a spectrum of kernel sizes (1x1 to 7x7) boosts its adaptability for various network architectures without significant degradation in classification accuracy or performance.

Implications and Future Work

The implementation of YodaNN contributes substantial advancements towards on-device near-sensor analytics, making complex CNN computations feasible in energy-constrained environments like IoT edge devices. This work demonstrates the potential of binary-weight CNNs to maintain competitive accuracy while drastically minimizing energy costs, a pivotal consideration for widespread mobile deployment.

Looking forward, further research could focus on optimizing architectural support for network types with inherently high sparsity or irregular connection patterns. Exploring enhancements that leverage deep learning algorithmic changes, extending data-reuse strategies, and modular scaling in multi-core setups are promising directions. Additionally, integrating YodaNN-like architectures into fully functional SoCs with increased parallelism could help tackle large-scale real-time data processing tasks.

In summary, the innovative approach of YodaNN offers substantial progress in the field of energy-efficient CNN accelerators, notably by simplifying computation with binary weights, heralding a significant step toward practical, low-power deep learning solutions in ubiquitous computing environments.