S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks (2106.07894v1)

Published 15 Jun 2021 in cs.AR, cs.DC, and cs.LG

Abstract: Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, executing CNNs requires a large amount of computing resources and generates heavy memory traffic, which poses a severe challenge for computing system design. By optimizing parallel execution and data reuse in convolution, systolic architectures demonstrate great advantages in accelerating CNN computations. However, the regular internal data transmission path of a traditional systolic architecture prevents it from fully leveraging the benefits of neural network sparsity. Deploying fine-grained sparsity on existing systolic architectures is greatly hindered by the computational overheads it incurs. In this work, we propose S2Engine, a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select aligned data from the compressed dataflow in convolution. Compared to the naive systolic array, S2Engine achieves about $3.2\times$ and about $3.0\times$ improvements in speed and energy efficiency, respectively.
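
To illustrate what "selecting aligned data from a compressed dataflow" can mean, here is a minimal software sketch. It is not the S2Engine hardware design: the (index, value) compression format, the two-pointer matching, and the names `compress` and `sparse_dot` are all assumptions made for illustration. It only shows the general idea that a processing element can skip zero operands by multiplying a weight and an activation only when their positions match.

```python
def compress(dense):
    """Return (index, value) pairs for the nonzero entries of a dense vector.

    Hypothetical compression format, chosen for illustration only.
    """
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def sparse_dot(weights, activations):
    """Dot product over two index-sorted compressed streams.

    Walks both streams with one pointer each and multiplies only when
    the indices align. Returns (result, number of multiplications).
    """
    acc = 0
    macs = 0
    wi = ai = 0
    while wi < len(weights) and ai < len(activations):
        w_idx, w_val = weights[wi]
        a_idx, a_val = activations[ai]
        if w_idx == a_idx:
            # Aligned pair: both operands are nonzero at this position.
            acc += w_val * a_val
            macs += 1
            wi += 1
            ai += 1
        elif w_idx < a_idx:
            # The matching activation is zero, so skip this weight.
            wi += 1
        else:
            # The matching weight is zero, so skip this activation.
            ai += 1
    return acc, macs


dense_w = [0, 2, 0, 0, 3, 0, 1, 0]
dense_a = [4, 5, 0, 0, 6, 0, 0, 7]
result, macs = sparse_dot(compress(dense_w), compress(dense_a))
print(result, macs)
```

In this toy input, only two of the eight positions hold a nonzero weight and a nonzero activation at the same time, so the sketch performs two multiplications where a dense dot product would perform eight.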

Authors (6)
  1. Jianlei Yang (32 papers)
  2. Wenzhi Fu (4 papers)
  3. Xingzhou Cheng (4 papers)
  4. Xucheng Ye (8 papers)
  5. Pengcheng Dai (206 papers)
  6. Weisheng Zhao (143 papers)
Citations (7)
