Summary of "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks"
The paper "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks" by Angshuman Parashar et al. introduces the Sparse CNN (SCNN) accelerator architecture, aiming to enhance the performance and energy efficiency of Convolutional Neural Networks (CNNs) by leveraging the inherent sparsity in weights and activations.
Main Contributions
SCNN addresses critical challenges faced by traditional dense CNN accelerators, particularly for deployment on resource-constrained platforms like mobile devices. The paper's central contributions can be summarized as follows:
- Sparse Dataflow Optimization: The paper proposes a sparse dataflow, PT-IS-CP-sparse, that reorganizes the computation to maximize reuse of weights and activations. This dataflow avoids multiplications involving zero operands while also improving the efficiency of data transfer and storage.
- Compressed Data Representations: SCNN employs a compressed encoding for both weights and activations to minimize data-transfer volume and storage requirements. Unlike architectures that decompress data immediately after fetching it from DRAM, SCNN keeps the data compressed through most of the computational pipeline (see the encoding sketch after this list).
- Efficient Computational Units: By pairing a multiplier array with a scatter crossbar that routes products into banked accumulator buffers, SCNN accumulates multiplication products effectively while sustaining a high degree of parallelism. This design markedly improves throughput and overall performance.
- Performance and Energy Efficiency Improvements: The paper demonstrates that SCNN achieves a 2.7× performance improvement and a 2.3× energy reduction relative to an equivalently provisioned dense CNN accelerator.
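To make the compressed representation concrete, here is a minimal sketch of a zero-run-length encoding in the spirit of the scheme the paper describes: each nonzero value is stored together with the count of zeros preceding it. The function names and the omission of fixed-width index fields are simplifications for illustration, not the paper's exact bit-level format.

```python
# Illustrative zero-run-length encoding in the spirit of SCNN's
# compressed format: each nonzero value is stored alongside the
# number of zeros preceding it. Field widths (e.g., fixed-width
# index fields) are omitted for clarity; this is a sketch, not the
# paper's exact bit-level layout.

def compress(dense):
    """Return (values, zero_runs): nonzeros and preceding-zero counts."""
    values, zero_runs = [], []
    run = 0
    for x in dense:
        if x == 0:
            run += 1
        else:
            values.append(x)
            zero_runs.append(run)
            run = 0
    return values, zero_runs

def decompress(values, zero_runs, length):
    """Reconstruct the dense vector from the compressed form."""
    dense = []
    for v, run in zip(values, zero_runs):
        dense.extend([0] * run)  # re-insert the skipped zeros
        dense.append(v)
    dense.extend([0] * (length - len(dense)))  # trailing zeros
    return dense

dense = [0, 0, 3, 0, 5, 0, 0, 0, 7]
values, runs = compress(dense)  # values=[3, 5, 7], runs=[2, 1, 3]
assert decompress(values, runs, len(dense)) == dense
```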
Detailed Analysis
Innovation in Dataflow and Architecture
SCNN’s primary innovation lies in its dataflow strategy, PT-IS-CP-sparse (planar-tiled, input-stationary, Cartesian-product, sparse). This dataflow keeps weights and activations in a compressed format for as long as possible, cutting redundant computation and memory traffic. By exploiting both weight sparsity (introduced by network pruning) and activation sparsity (produced by the ReLU operator), SCNN delivers only nonzero operands to the multiplier array, improving computational efficiency.
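At the heart of this dataflow is a Cartesian product: every nonzero weight in a filter slab is multiplied by every nonzero activation in an input tile, and each product's output coordinate is computed on the fly. The sketch below illustrates the idea in scalar Python for a single input channel, assuming unit stride and 'valid' padding; the planar tiling and parallel multiplier lanes of the real design are abstracted away.

```python
# Sketch of the Cartesian-product idea behind PT-IS-CP-sparse for one
# input channel: multiply every nonzero weight by every nonzero
# activation and scatter each product to the output coordinate it
# contributes to. Assumes unit stride and 'valid' padding; the real
# hardware performs these multiplies in a parallel array and tiles
# the input plane across processing elements.

def sparse_conv_channel(weights_nz, acts_nz, out_h, out_w, out):
    """weights_nz: list of (k, r, s, w) nonzero weights (k = output channel).
    acts_nz: list of (x, y, a) nonzero input activations.
    out: dict mapping (k, p, q) -> partial sum, updated in place."""
    for (k, r, s, w) in weights_nz:
        for (x, y, a) in acts_nz:
            p, q = x - r, y - s  # output coordinate this product feeds
            if 0 <= p < out_h and 0 <= q < out_w:
                out[(k, p, q)] = out.get((k, p, q), 0) + w * a

out = {}
weights = [(0, 0, 0, 2.0), (0, 1, 1, -1.0)]  # (k, r, s, value)
acts = [(1, 1, 0.5), (2, 2, 1.5)]            # (x, y, value)
sparse_conv_channel(weights, acts, out_h=2, out_w=2, out=out)
```

Subject to the boundary check, every product of two nonzero operands is useful work, which is why the Cartesian product wastes no multiplier cycles on zeros.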
Architectural Design
The SCNN architecture is characterized by several key components:
- Processing Element (PE): Each PE hosts a multiplier array, a scatter crossbar, and banked accumulator buffers, which work in tandem to execute convolutions on sparse data efficiently (a toy model of the scatter-accumulate step follows this list).
- Activation and Weight Buffers: These buffers store data in compressed format, significantly reducing the storage footprint and access energy.
- Post-Processing Unit (PPU): The PPU handles non-linear functions, pooling, and output compression, ensuring that activations are efficiently processed and stored for subsequent layers.
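As a rough illustration of how the scatter crossbar and accumulator buffers interact (referenced in the PE bullet above), the toy model below routes each product to an accumulator bank selected by hashing its output coordinate; products that collide on the same bank within a group would be serialized in hardware. The bank count and hash function here are assumptions made for this sketch.

```python
# Toy model of the scatter-accumulate step inside a PE: each product
# carries its output coordinate, and a crossbar routes it to one of
# several accumulator banks. The bank count and bank-select hash are
# illustrative assumptions; the idea is that most products in a group
# land in distinct banks, and collisions cost extra cycles.

NUM_BANKS = 8  # assumed bank count for this sketch

def bank_of(coord):
    """Pick an accumulator bank from an output coordinate (k, p, q)."""
    return hash(coord) % NUM_BANKS

def scatter_accumulate(products, banks):
    """products: list of (coord, value); banks: list of dicts coord->sum.
    Returns the number of bank conflicts in this group of products."""
    conflicts = 0
    seen = set()
    for coord, value in products:
        b = bank_of(coord)
        if b in seen:        # two products want the same bank this "cycle"
            conflicts += 1   # hardware would stall and serialize these
        seen.add(b)
        banks[b][coord] = banks[b].get(coord, 0) + value
    return conflicts

banks = [dict() for _ in range(NUM_BANKS)]
group = [((0, 1, 1), 0.5), ((0, 1, 2), 0.25), ((1, 3, 3), 1.0)]
stalls = scatter_accumulate(group, banks)
```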
Empirical Results
Through extensive simulations of contemporary networks (AlexNet, GoogLeNet, and VGGNet), SCNN was evaluated against both a traditional dense accelerator (DCNN) and an optimized dense accelerator (DCNN-opt):
- SCNN provides substantial speedups over DCNN, particularly as layer density decreases (i.e., as sparsity increases), with speedups of up to 4.7×.
- Energy-efficiency gains were also evident: SCNN outperforms both DCNN and DCNN-opt once the fraction of zero-valued operands in a layer exceeds a certain threshold (equivalently, once nonzero density falls below it).
Implications and Future Directions
Practical Implications: The gains in performance and energy efficiency make SCNN well suited to applications requiring on-device neural network inference, such as mobile and edge computing. The reduced power consumption and higher speed are particularly valuable for real-time workloads like autonomous driving and high-resolution video processing.
Theoretical Implications: On a theoretical level, SCNN’s approach to dataflow and sparse computation may influence future CNN architecture designs, driving innovation towards more domain-specific accelerators that maximize efficiency through data sparsity.
Future Developments: Future work could optimize PE configurations to address the underutilization observed in certain network layers. Integrating SCNN's sparse-computation techniques with emerging network types, such as transformers and hybrid models, is another intriguing research direction.
In conclusion, the SCNN accelerator presents a robust solution for enhancing the operational efficiency of CNNs by adeptly addressing sparsity at both architectural and dataflow levels. This paper provides significant contributions to the field of AI hardware accelerators, offering insights and methodologies that can be built upon in future research endeavors.