
PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers (2310.09385v2)

Published 13 Oct 2023 in cs.AR

Abstract: Decoder-only Transformer models such as GPT have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficacy of running GPT on current hardware systems is bounded by a low compute-to-memory ratio and high memory access. Process-in-memory (PIM) architectures can minimize off-chip data movement and utilize high internal bandwidth. They stand out as promising candidates for accelerating memory-bound tasks such as GPT inference. In this work, we propose a PIM accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism by concatenating and partitioning matrices among DRAM channels and banks to utilize all available in-memory computation units. The efficiency of the PIM-GPT architecture is verified through circuit synthesis and an event-driven, clock-cycle-accurate simulator. Overall, PIM-GPT achieves 41$-$137$\times$ and 631$-$1074$\times$ speedup, and 123$-$383$\times$ and 320$-$602$\times$ higher energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
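
The idea of partitioning matrices among DRAM channels and banks can be illustrated with a small sketch. This is not the paper's mapping scheme: it assumes a simple contiguous row-wise split of a weight matrix, and the names `partition_rows`, `bank_mac`, and `pim_matvec` are hypothetical. Each (channel, bank) pair holds a slice of rows and computes its multiply-accumulate results locally, so only the input vector and the short output slices would need to move.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def partition_rows(num_rows: int, num_channels: int,
                   banks_per_channel: int) -> List[List[range]]:
    """Split row indices as evenly as possible over channels, then banks.

    Assumption for illustration: contiguous row blocks, one per bank.
    """
    units = num_channels * banks_per_channel
    base, extra = divmod(num_rows, units)
    layout: List[List[range]] = []
    start = 0
    for c in range(num_channels):
        banks = []
        for b in range(banks_per_channel):
            unit = c * banks_per_channel + b
            # The first `extra` units take one additional row each.
            size = base + (1 if unit < extra else 0)
            banks.append(range(start, start + size))
            start += size
        layout.append(banks)
    return layout


def bank_mac(weights: Matrix, rows: range, x: Vector) -> List[float]:
    """Multiply-accumulate over the rows stored in a single bank."""
    return [sum(w * xi for w, xi in zip(weights[r], x)) for r in rows]


def pim_matvec(weights: Matrix, x: Vector, num_channels: int,
               banks_per_channel: int) -> List[float]:
    """Matrix-vector product assembled from per-bank partial outputs.

    The loop is sequential here; in a PIM design the banks would
    work concurrently on their own slices.
    """
    layout = partition_rows(len(weights), num_channels, banks_per_channel)
    y: List[float] = []
    for banks in layout:
        for rows in banks:
            y.extend(bank_mac(weights, rows, x))
    return y


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(pim_matvec(W, [1, 1], num_channels=2, banks_per_channel=2))
```

Because the row blocks are contiguous and visited in order, concatenating the per-bank outputs reproduces the ordinary matrix-vector product. A real mapping would also have to respect DRAM row sizes and bank capacity, which this sketch ignores.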

Authors (3)
  1. Yuting Wu (22 papers)
  2. Ziyu Wang (137 papers)
  3. Wei D. Lu (15 papers)
Citations (9)