Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices (1807.07928v2)

Published 10 Jul 2018 in cs.DC and cs.CV

Abstract: A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.

Overview of Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

The paper "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices" by Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze presents a sophisticated deep neural network (DNN) accelerator architecture designed to address the unique challenges presented by compact and sparse DNN models. The goal is to enhance the portability and efficiency of DNN applications across mobile devices, which are limited by power and computational resources. This work introduces Eyeriss v2, building on the limitations observed in previous accelerator designs, including the original Eyeriss architecture.

Key Contributions

Eyeriss v2 stands out for an adaptable architecture that caters to diverse DNN workloads, sustaining high efficiency and throughput across widely varying layer shapes and degrees of sparsity. Major contributions of the work include:

  1. Hierarchical Mesh Network (HM-NoC): Eyeriss v2 introduces a flexible on-chip network that can be configured for high bandwidth or high data reuse depending on the needs of the workload. Its hierarchical design accommodates a broad range of data reuse and bandwidth scenarios, yielding a 5.6× improvement in throughput and a 1.8× gain in energy efficiency over the predecessor design on compact DNNs such as MobileNet (see the routing sketch after this list).
  2. Sparse Processing Capabilities: Each processing element (PE) in Eyeriss v2 can process both weights and activations directly in the compressed domain, handling the irregular access patterns typical of compact and sparse DNNs. Exploiting sparsity in this way contributes an additional 1.2× increase in throughput and a 1.3× gain in energy efficiency (see the sparse-MAC sketch after this list).
  3. SIMD Processing: With SIMD support, each PE processes two multiply-and-accumulate (MAC) operations per cycle, raising throughput while maintaining energy efficiency across varying DNN layer sizes and operational intensities.
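
To make the HM-NoC idea concrete, here is a minimal Python sketch of routing under the broadcast/unicast/multicast modes the paper describes; the class and function names, the group size, and the selection policy are illustrative assumptions, not the authors' implementation:

```python
from enum import Enum

class NocMode(Enum):
    BROADCAST = 1   # one source feeds every PE cluster (maximizes data reuse)
    UNICAST   = 2   # source i feeds cluster i only (maximizes bandwidth)
    MULTICAST = 3   # groups of clusters share one source (middle ground)

def route(items, clusters, mode, group=2):
    """Map data items from the top of the hierarchy onto PE clusters."""
    if mode is NocMode.BROADCAST:
        return {c: items[0] for c in clusters}
    if mode is NocMode.UNICAST:
        return {c: items[i] for i, c in enumerate(clusters)}
    return {c: items[i // group] for i, c in enumerate(clusters)}

# A conventional conv layer reuses each weight widely, so weights can be
# broadcast; a depthwise layer with little reuse would use UNICAST instead.
print(route(["w0"], ["pe0", "pe1", "pe2", "pe3"], NocMode.BROADCAST))
```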

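The compressed-domain processing in item 2 means the PEs iterate only over non-zero operands. The sketch below illustrates this with a compressed-sparse-column (CSC) style weight encoding, which the paper uses for weights; the function signature and data layout are simplified assumptions, and the two-wide inner step loosely mirrors the two-MACs-per-cycle SIMD support from item 3:

```python
def sparse_mac(act_vals, act_idx, w_data, w_idx, w_colptr, num_outputs):
    """Multiply sparse activations by a CSC-encoded sparse weight matrix,
    skipping all zero operands (compressed-domain processing).

    act_vals/act_idx : non-zero activations and their input indices
    w_data/w_idx     : non-zero weights and their output (row) indices
    w_colptr         : column pointers; column j holds weights for input j
    """
    psum = [0.0] * num_outputs
    for a, j in zip(act_vals, act_idx):          # only non-zero activations
        start, end = w_colptr[j], w_colptr[j + 1]
        for k in range(start, end, 2):           # two MACs per "cycle"
            for kk in range(k, min(k + 2, end)):
                psum[w_idx[kk]] += a * w_data[kk]
    return psum

# 4 outputs; the activation vector is non-zero at inputs 0 and 2 only
out = sparse_mac(
    act_vals=[1.0, 2.0], act_idx=[0, 2],
    w_data=[0.5, 0.25, 0.125], w_idx=[0, 3, 1],
    w_colptr=[0, 2, 2, 3],   # col 0 -> two non-zeros, col 1 -> none, col 2 -> one
    num_outputs=4,
)
print(out)  # [0.5, 0.25, 0.0, 0.25]
```
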
Numerical Results and Implications

The paper reports robust performance for Eyeriss v2 on sparse and compact DNN models. Implemented in a 65nm CMOS process, the design achieves, for sparse MobileNet at a batch size of 1, a throughput of 1470.6 inferences per second and an energy efficiency of 2560.3 inferences per joule, 12.6× faster and 2.5× more energy efficient than the original Eyeriss running MobileNet.
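
As a quick sanity check on these ratios (simple arithmetic on the reported figures, not numbers quoted from the paper), the implied baseline for the original Eyeriss follows directly:

```python
# Implied original-Eyeriss baseline on MobileNet, derived from the
# reported Eyeriss v2 figures and speedup ratios (arithmetic only)
v2_throughput = 1470.6                  # inferences/sec
v2_efficiency = 2560.3                  # inferences/J
v1_throughput = v2_throughput / 12.6    # ~116.7 inferences/sec
v1_efficiency = v2_efficiency / 2.5     # ~1024.1 inferences/J
```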

These findings underscore Eyeriss v2's potential not only to enhance the processing capabilities of current DNNs on resource-constrained devices but also to open pathways for future advances in energy-efficient deep learning. The architecture's adaptability also implies broader applicability across deep learning tasks beyond those dominated by dense layers.

Theoretical and Practical Implications

The paper provides essential insights into the co-evolution of DNN architectures and specialized accelerators. The strategies deployed, such as the flexible NoC design and compressed-data handling within the PEs, represent critical considerations for the next generation of neural processing units aiming to sustain efficiency amid increasingly complex network structures.

The structured approach to addressing bottlenecks in existing systems demonstrates a balance between architectural flexibility and hardware constraints, and may steer future research toward even more integrative, efficiency-focused designs.

Future Prospects

Research on DNN accelerators, as exemplified by Eyeriss v2, will likely continue to explore architectural heterogeneity and adaptivity to further improve energy-to-performance ratios. Anticipated developments include broader integration of adaptive bit-width processing, more flexible PE parallelism, and deeper workload characterization to tune accelerator configurations dynamically.

Overall, Eyeriss v2 is a solid step forward in realizing flexible and efficient DNN processing on mobile platforms, setting a strong precedent for continual innovation in the field of AI accelerators.

Authors (4)
  1. Yu-Hsin Chen
  2. Tien-Ju Yang
  3. Joel Emer
  4. Vivienne Sze