
Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity (2404.09497v1)

Published 15 Apr 2024 in cs.AR

Abstract: Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. Yet, traditional digital SRAM-PIM architecture, limited by rigid crossbar architecture, struggles to effectively exploit this unstructured sparsity. To address this challenge, we propose Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro that includes dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further refines performance and efficiency by capitalizing on block-wise input sparsity. Results show that our proposed co-design framework achieves a remarkable speedup of up to 7.69x and energy savings of 83.43%.

Authors (10)
  1. Cenlin Duan
  2. Jianlei Yang
  3. Yiou Wang
  4. Yikun Wang
  5. Yingjie Qi
  6. Xiaolin He
  7. Bonan Yan
  8. Xueyan Wang
  9. Xiaotao Jia
  10. Weisheng Zhao

Summary

Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity

This paper presents an innovative approach to enhancing the efficiency of SRAM-based processing-in-memory (PIM) architectures by leveraging unstructured bit-level sparsity. Traditional digital SRAM-PIM architectures face significant challenges in exploiting such sparsity due to their inherent crossbar structure, which restricts data routing and leads to inefficient utilization of randomly distributed zero-bits. To address these limitations, the authors propose the Dyadic Block PIM (DB-PIM), a co-design framework that couples algorithms with architectural innovations.

Core Contributions

  1. Algorithmic Innovation:
    • The authors introduce a Fixed Threshold Approximation (FTA) algorithm alongside a distinctive sparsity pattern termed the Dyadic Block (DB): each 8-bit weight is partitioned into four two-bit blocks, which preserves the random placement of non-zero bits while keeping bit-level operations regular. FTA enforces a uniform threshold on the number of non-zero bits per weight, preserving accuracy while improving regularity, and Canonical Signed Digit (CSD) encoding raises the attainable sparsity by minimizing the count of non-zero digits in each weight (a sketch of this flow follows the list).
  2. Architectural Design:
    • The proposed architecture features customized PIM macros that integrate Dyadic Block Multiplication Units (DBMUs) and CSD-based adder trees optimized for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) dynamically detects and bypasses all-zero input blocks, further improving computational efficiency (see the block-skipping sketch after the list). The macro also stores and computes on the complementary states held in its 6T SRAM cells, putting formerly inactive crossbar cells to productive use.
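
To make this flow concrete, here is a minimal Python sketch of the pipeline as the summary describes it: CSD-encode a weight, cap its non-zero digits with a fixed threshold, and group the digits into two-bit dyadic blocks. The function names, the threshold of two non-zero digits, and the example weight are illustrative assumptions, not the paper's reference implementation.

```python
def to_csd(x: int) -> list[int]:
    """Canonical signed-digit (CSD) encoding, least-significant digit first.
    Digits are in {-1, 0, +1} and no two adjacent digits are non-zero,
    which minimizes the number of non-zero digits."""
    digits = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)  # +1 if x % 4 == 1, -1 if x % 4 == 3
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits

def fta(digits: list[int], max_nonzero: int = 2) -> list[int]:
    """Fixed Threshold Approximation: keep only the `max_nonzero` most
    significant non-zero digits and zero the rest (the threshold value
    here is an illustrative assumption)."""
    out, kept = [0] * len(digits), 0
    for i in reversed(range(len(digits))):  # scan MSB -> LSB
        if digits[i] and kept < max_nonzero:
            out[i], kept = digits[i], kept + 1
    return out

def dyadic_blocks(digits: list[int], width: int = 2) -> list[list[int]]:
    """Group digits into fixed-width blocks (four two-digit blocks for
    an 8-digit weight), padding with zeros if needed."""
    padded = digits + [0] * (-len(digits) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]

def value(digits: list[int]) -> int:
    """Reconstruct the integer a digit vector represents."""
    return sum(d << i for i, d in enumerate(digits))

w = 119                    # 0b0111_0111: five non-zero bits in plain binary
csd = to_csd(w)            # only three non-zero CSD digits: 128 - 8 - 1
approx = fta(csd)          # FTA drops the least significant digit: 128 - 8
print(value(csd), value(approx))   # 119 120
print(dyadic_blocks(approx))       # [[0, 0], [0, -1], [0, 0], [0, 1]]
```

After FTA, every weight carries at most two non-zero digits and, in this example, two of the four dyadic blocks are entirely zero; that bounded, block-aligned structure is the regularity the hardware exploits.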

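Likewise, here is a standalone sketch of the block-wise zero skipping attributed to the IPU, assuming inputs arrive in fixed-width blocks; the helper names, block layout, and skip policy are illustrative assumptions rather than the macro's actual datapath.

```python
def ipu_filter(blocks):
    """Yield (index, block) only for blocks containing a non-zero entry,
    mimicking the IPU's detect-and-bypass of all-zero input blocks."""
    for i, blk in enumerate(blocks):
        if any(blk):
            yield i, blk

def blockwise_dot(x_blocks, w_blocks):
    """Accumulate a dot product over surviving blocks only; blocks the
    IPU filters out cost no MAC work at all."""
    return sum(
        sum(x * w for x, w in zip(x_blk, w_blocks[i]))
        for i, x_blk in ipu_filter(x_blocks)
    )

x = [[0, 0], [3, 1], [0, 0], [2, 0]]  # half the input blocks are all zero
w = [[1, 2], [1, 1], [4, 4], [0, 5]]
print(blockwise_dot(x, w))            # 3*1 + 1*1 + 2*0 + 0*5 = 4
```
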
Experimental Evaluation

The authors conducted comprehensive evaluations on several deep neural network (DNN) models, spanning standard architectures such as AlexNet and compact ones such as MobileNetV2. Their results show that the DB-PIM framework achieves up to a 7.69x speedup and energy savings of 83.43% compared to traditional sparse neural network acceleration techniques. The gains stem largely from DB-PIM's substantially higher effective utilization of SRAM cells, which reaches up to 98.42% in dense computation scenarios.

Implications and Future Work

The DB-PIM framework demonstrates substantial improvements in efficiency and utilization, indicating its potential impact on both theoretical and practical aspects of PIM system design. By effectively leveraging bit-level sparsity, this framework offers a pathway to improve processing capabilities in resource-constrained environments, particularly for edge applications where efficiency is critical.

Future developments may focus on integrating this approach with existing value-level sparsity strategies so that multiple dimensions of sparsity can be exploited in combination. Additionally, exploring applications in broader AI contexts, such as natural language processing or multimodal fusion systems, could further demonstrate the framework's versatility and robustness. As the landscape of AI continues to evolve, such synergy between algorithmic ingenuity and architectural design will be pivotal in overcoming emerging computational challenges.
