Overview of Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
The paper "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices" by Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze presents a sophisticated deep neural network (DNN) accelerator architecture designed to address the unique challenges presented by compact and sparse DNN models. The goal is to enhance the portability and efficiency of DNN applications across mobile devices, which are limited by power and computational resources. This work introduces Eyeriss v2, building on the limitations observed in previous accelerator designs, including the original Eyeriss architecture.
Key Contributions
Eyeriss v2 stands out for an adaptable architecture that caters to diverse DNN workloads, sustaining high efficiency and throughput across layers with widely varying shapes, sizes, and degrees of sparsity. Major contributions of the work include:
- Hierarchical Mesh Network (HM-NoC): Eyeriss v2 introduces a novel NoC design that can be configured to deliver either high bandwidth or high data reuse, depending on the requirements of each layer. Its hierarchical structure lets it cover a broad range of data-reuse and bandwidth scenarios, yielding a 5.6× improvement in throughput and a 1.8× gain in energy efficiency over its predecessor on compact DNNs such as MobileNet (a minimal mode-switching sketch follows this list).
- Sparse Processing Capabilities: Each processing element (PE) in Eyeriss v2 can operate on compressed sparse data directly in the compressed domain, accommodating the irregular access patterns typical of compact and sparse DNNs. Exploiting data sparsity in this way contributes an additional 1.2× increase in throughput and 1.3× in energy efficiency (see the sparse-PE sketch after this list).
- SIMD Processing: With SIMD support, each PE processes two multiply-and-accumulate (MAC) operations per cycle, raising throughput while preserving energy efficiency across DNN layers of varying size and operational intensity.
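The mode-switching idea behind the hierarchical mesh can be illustrated with a small software model. The sketch below is a simplified illustration, not the paper's hardware design: the `HMRouter` class, its mode names, and the traffic model are assumptions chosen to show how a single router fabric can trade data reuse against bandwidth.

```python
# Minimal sketch of a hierarchical-mesh-style router that switches among
# delivery modes. Mode names follow the paper's high-level description;
# the class, its fields, and the traffic model are illustrative assumptions.
from enum import Enum

class Mode(Enum):
    BROADCAST = "broadcast"   # one source feeds all destinations (max reuse)
    MULTICAST = "multicast"   # each source feeds a group of destinations
    UNICAST = "unicast"       # one-to-one links (max bandwidth, no reuse)

class HMRouter:
    def __init__(self, num_dst: int):
        self.num_dst = num_dst

    def deliver(self, packets, mode: Mode):
        """Map source packets onto destination ports for one 'cycle'."""
        if mode is Mode.BROADCAST:
            # Every destination receives the same packet: 1 fetch, num_dst uses.
            return [packets[0]] * self.num_dst
        if mode is Mode.MULTICAST:
            # Destinations are partitioned into len(packets) reuse groups.
            group = self.num_dst // len(packets)
            return [packets[i // group] for i in range(self.num_dst)]
        # UNICAST: a distinct packet per destination, full bandwidth, no reuse.
        return packets[: self.num_dst]

router = HMRouter(num_dst=4)
print(router.deliver(["w0", "w1", "w2", "w3"], Mode.UNICAST))    # ['w0','w1','w2','w3']
print(router.deliver(["w0"], Mode.BROADCAST))                    # ['w0','w0','w0','w0']
print(router.deliver(["w0", "w1"], Mode.MULTICAST))              # ['w0','w0','w1','w1']
```

The design point the paper makes is visible here: layers with high reuse (e.g., large convolutions) prefer broadcast-like delivery, while layers with little reuse (e.g., depthwise layers) need unicast-like bandwidth, and one configurable fabric can serve both.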
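The sparse-PE behavior described in the two bullets above can likewise be sketched in software. The paper stores weights in a compressed sparse column (CSC) format and gives each PE a SIMD width of two; everything else below (the function names, the dense-to-CSC packer, the "cycle" loop) is an illustrative assumption rather than the on-chip implementation.

```python
# Minimal sketch, assuming a CSC-style compressed weight layout and a SIMD
# width of 2. Not the exact on-chip data format or pipeline.
import numpy as np

def to_csc(dense: np.ndarray):
    """Pack a dense weight matrix into (values, row_idx, col_ptr)."""
    values, row_idx, col_ptr = [], [], [0]
    for j in range(dense.shape[1]):
        for i in range(dense.shape[0]):
            if dense[i, j] != 0:
                values.append(dense[i, j])
                row_idx.append(i)
        col_ptr.append(len(values))
    return np.array(values), np.array(row_idx), np.array(col_ptr)

def sparse_simd_matvec(values, row_idx, col_ptr, x, num_rows, simd=2):
    """y = W @ x computed over nonzeros only, `simd` MACs per step."""
    y = np.zeros(num_rows)
    for j, xj in enumerate(x):
        if xj == 0:                        # skip zero activations entirely
            continue
        start, end = col_ptr[j], col_ptr[j + 1]
        for k in range(start, end, simd):  # two MACs per "cycle"
            for kk in range(k, min(k + simd, end)):
                y[row_idx[kk]] += values[kk] * xj
    return y

W = np.array([[0, 2, 0], [1, 0, 0], [0, 3, 4]], dtype=float)
x = np.array([1.0, 0.0, 2.0])
vals, rows, ptrs = to_csc(W)
print(sparse_simd_matvec(vals, rows, ptrs, x, num_rows=3))  # [0. 1. 8.] == W @ x
```

Note how zero weights are never fetched and zero activations are skipped outright; this skipping is where the throughput and energy gains on sparse workloads come from.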
Numerical Results and Implications
The paper reports strong performance for Eyeriss v2 on sparse and compact DNN models. Implemented in a 65nm CMOS process, the design achieves a throughput of 1470.6 inferences per second and an energy efficiency of 2560.3 inferences per joule on sparse MobileNet, making it 12.6× faster and 2.5× more energy-efficient than the original Eyeriss running MobileNet.
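For context, the reported ratios imply a baseline for the original Eyeriss; the short back-of-the-envelope calculation below derives it by simple division. The derived baseline figures are implied, not quoted from the paper.

```python
# Derive the implied Eyeriss v1 baseline from the reported v2 numbers.
v2_throughput = 1470.6   # inferences / second on sparse MobileNet (reported)
v2_efficiency = 2560.3   # inferences / joule (reported)
speedup = 12.6           # v2 vs. original Eyeriss on MobileNet (reported)
eff_gain = 2.5           # energy-efficiency ratio (reported)

v1_throughput = v2_throughput / speedup   # ~116.7 inf/s (implied, not reported)
v1_efficiency = v2_efficiency / eff_gain  # ~1024.1 inf/J (implied, not reported)

print(f"implied Eyeriss v1 baseline: {v1_throughput:.1f} inf/s, "
      f"{v1_efficiency:.1f} inf/J")
```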
These findings underscore Eyeriss v2's potential not only to enhance the processing of current DNNs on resource-constrained devices but also to open pathways for future energy-efficient deep learning systems. The architecture's improved adaptability also implies broader applicability across deep learning tasks beyond those dominated by dense layers.
Theoretical and Practical Implications
The paper provides essential insights into the co-evolution of DNN architectures and specialized accelerators. The strategies deployed, such as the enhanced NoC design and compressed-data handling within the PEs, represent critical considerations for the next generation of neural processing units, which must sustain efficiency across increasingly diverse network architectures.
The structured approach to addressing the bottlenecks of existing systems demonstrates a balance between architectural flexibility and hardware constraints, potentially steering future research toward even more integrative and efficiency-focused designs.
Future Prospects
Research on DNN accelerators, as exemplified by Eyeriss v2, will likely continue to explore architectural heterogeneity and adaptivity to further improve energy-to-performance ratios. Anticipated developments include broader adoption of adaptive bit-width processing, more flexible PE parallelism, and deeper workload characterization to tune accelerator configurations dynamically.
Overall, Eyeriss v2 is a solid step forward in realizing flexible and efficient DNN processing on mobile platforms, setting a strong precedent for continual innovation in the field of AI accelerators.