Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity (2404.09497v1)
Abstract: Bit-level sparsity in neural network models harbors immense untapped potential: eliminating redundant calculations on randomly distributed zero bits significantly boosts computational efficiency. Yet traditional digital SRAM-PIM architectures, constrained by their rigid crossbar structure, struggle to exploit this unstructured sparsity effectively. To address this challenge, we propose Dyadic Block PIM (DB-PIM), an algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of such bits per weight to improve regularity. Architecturally, we develop a custom PIM macro comprising dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, tailored for multiply-accumulate (MAC) operations. An input pre-processing unit (IPU) further improves performance and efficiency by exploiting block-wise input sparsity. Results show that the proposed co-design framework achieves a speedup of up to 7.69x and energy savings of 83.43%.
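The core idea summarized above, encoding each weight in Canonical Signed Digit (CSD) form and capping the number of non-zero digits per weight so that a MAC only performs shift-adds for the surviving digits, can be illustrated with a minimal Python sketch. This is not the paper's DB-PIM implementation: the function names (`csd_digits`, `limit_nonzero_digits`, `sparse_mac`), the per-weight digit budget `k`, and the drop-least-significant pruning rule are illustrative assumptions; the actual dyadic block pattern and hardware dataflow are defined in the paper itself.

```python
# Minimal sketch of the bit-level sparsity idea behind DB-PIM (illustrative only;
# the exact dyadic-block pattern and PIM macro dataflow are not reproduced here).

def csd_digits(x: int):
    """Canonical Signed Digit (CSD) encoding of an integer weight.

    Returns (bit_position, digit) pairs with digit in {-1, +1}; zero digits
    are omitted, which is exactly where the bit-level sparsity comes from.
    """
    digits, pos = [], 0
    while x != 0:
        if x % 2:                  # current bit is non-zero
            d = 2 - (x % 4)        # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            digits.append((pos, d))
            x -= d
        x >>= 1                    # arithmetic shift, works for negative x too
        pos += 1
    return digits


def limit_nonzero_digits(digits, k):
    """Keep only the k most-significant non-zero CSD digits of a weight.

    Mimics 'restricting the number of non-zero bits per weight' from the
    abstract; dropping the least-significant digits is an assumed pruning rule.
    """
    return sorted(digits, key=lambda pd: pd[0], reverse=True)[:k]


def sparse_mac(activations, weights, k=2):
    """Shift-and-add MAC that only touches non-zero CSD digits.

    The zero-activation skip loosely mirrors the abstract's block-wise input
    sparsity handling (IPU); here it is just a per-element check.
    """
    acc = 0
    for a, w in zip(activations, weights):
        if a == 0:                 # skip zero inputs entirely
            continue
        for pos, d in limit_nonzero_digits(csd_digits(w), k):
            acc += d * (a << pos)  # one shift-add per surviving non-zero digit
    return acc


if __name__ == "__main__":
    acts = [3, 5, 0, 7]
    wts = [7, -6, 9, 11]           # 11 has three non-zero CSD digits
    print("k=2 sparse MAC:", sparse_mac(acts, wts, k=2))               # 75 (11 rounded to 12)
    print("exact MAC:    ", sum(a * w for a, w in zip(acts, wts)))     # 68
```

Loosely speaking, the per-digit shift-adds correspond to the work handled by the CSD-based adder trees, and the fixed non-zero-digit budget per weight is what restores enough regularity for a crossbar-style macro to schedule the otherwise unstructured sparsity.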
Authors: Cenlin Duan, Jianlei Yang, Yiou Wang, Yikun Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao