CPSAA: Accelerating Sparse Attention using Crossbar-based Processing-In-Memory Architecture (2210.06696v2)

Published 13 Oct 2022 in cs.AR, cs.SY, and eess.SY

Abstract: The attention mechanism spends substantial computation on unnecessary calculations, significantly limiting system performance. Researchers propose sparse attention to convert some dense-dense matrix multiplication (DDMM) operations into sampled dense-dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses. We propose CPSAA, a novel crossbar-based, processing-in-memory (PIM)-featured sparse attention accelerator. First, we present a novel attention calculation mode. Second, we design a novel PIM-based sparsity pruning architecture. Finally, we present novel crossbar-based methods. Experimental results show that CPSAA achieves average performance improvements of 89.6X, 32.2X, 17.8X, 3.39X, and 3.84X, and energy savings of 755.6X, 55.3X, 21.3X, 5.7X, and 4.9X, compared with GPU, FPGA, SANGER, ReBERT, and ReTransformer, respectively.
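As a rough illustration of the DDMM-to-SDDMM/SpMM conversion the abstract refers to, below is a minimal NumPy/SciPy sketch of sparse attention on toy data. The random mask, dimensions, and variable names are hypothetical stand-ins: CPSAA derives its mask with a PIM-based pruning architecture and executes these operations on crossbars rather than in software.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy sizes (illustrative only): sequence length n, head dimension d.
n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Pruning mask marking which attention scores are kept. Here it is random for
# illustration; in CPSAA the mask comes from the PIM-based pruning stage.
mask = rng.random((n, n)) < 0.3
np.fill_diagonal(mask, True)  # ensure every query row keeps at least one key

# SDDMM: evaluate Q @ K.T only at the kept (row, col) positions,
# instead of the full dense-dense (DDMM) product.
rows, cols = np.nonzero(mask)
vals = np.einsum('ij,ij->i', Q[rows], K[cols]) / np.sqrt(d)
S = csr_matrix((vals, (rows, cols)), shape=(n, n))

# Row-wise softmax over the kept scores; pruned positions get zero probability.
P = S.toarray()
P[~mask] = -np.inf
P = np.exp(P - P.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)

# SpMM: sparse attention probabilities times the dense value matrix V.
out = csr_matrix(P) @ V
print(out.shape)  # (8, 4)
```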

Authors (10)
  1. Huize Li (4 papers)
  2. Hai Jin (83 papers)
  3. Long Zheng (4 papers)
  4. Yu Huang (176 papers)
  5. Xiaofei Liao (11 papers)
  6. Dan Chen (20 papers)
  7. Zhuohui Duan (2 papers)
  8. Cong Liu (169 papers)
  9. Jiahong Xu (2 papers)
  10. Chuanyi Gui (1 paper)
Citations (3)