Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference (2407.14645v1)

Published 19 Jul 2024 in cs.AR, cs.DC, and cs.LG

Abstract: Aligning future system design with the ever-increasing compute needs of LLMs is an important problem. Here, we propose a general performance modeling methodology and workload analysis of distributed LLM training and inference through an analytical framework that accurately considers compute, the memory sub-system, the network, and various parallelization strategies (model, data, pipeline, and sequence parallelism). We validate our performance predictions against published data from the literature and relevant industry vendors (e.g., NVIDIA). For distributed training, we investigate the memory footprint of LLMs under different activation re-computation methods, dissect the key factors behind the massive performance gain from A100 to B200 ($\sim$35x speed-up, closely following NVIDIA's scaling trend), and further run a design space exploration across technology nodes (12 nm to 1 nm) to study the impact of logic, memory, and network scaling on performance. For inference, we analyze the compute versus memory boundedness of different operations at the matrix-multiply level for different GPU systems and further explore the impact of DRAM technology scaling on inference latency. Utilizing our modeling framework, we reveal how performance bottlenecks evolve for both LLM training and inference with technology scaling, thus providing insights for designing future systems for LLM training and inference.
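As a concrete illustration of the compute- versus memory-boundedness analysis the abstract describes, the sketch below applies a standard roofline model to a single matrix multiply: an operation is memory-bound when the time to move its operands exceeds the time to compute on them. This is a minimal sketch of the general technique, not the authors' framework; the `GPU` class, the `gemm_time_ms` helper, and the hardware numbers are illustrative assumptions.

```python
# Minimal roofline-style sketch (assumed, not the paper's released code):
# classify a GEMM as compute- or memory-bound on a given GPU.

from dataclasses import dataclass

@dataclass
class GPU:
    name: str
    peak_tflops: float   # peak dense FP16 throughput, TFLOP/s (assumed)
    hbm_gbps: float      # peak HBM bandwidth, GB/s (assumed)

def gemm_time_ms(m: int, n: int, k: int, gpu: GPU,
                 bytes_per_el: int = 2) -> tuple[float, str]:
    """Roofline estimate for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2.0 * m * n * k                              # multiply-adds
    traffic = bytes_per_el * (m * k + k * n + m * n)     # read A, B; write C
    t_compute = flops / (gpu.peak_tflops * 1e12)         # seconds if compute-bound
    t_memory = traffic / (gpu.hbm_gbps * 1e9)            # seconds if memory-bound
    bound = "compute" if t_compute >= t_memory else "memory"
    return max(t_compute, t_memory) * 1e3, bound

# Illustrative A100-class specs (assumed round numbers, not datasheet values).
a100 = GPU("A100-like", peak_tflops=312.0, hbm_gbps=2039.0)

# Decode-phase GEMV (batch 1): tiny arithmetic intensity.
print(gemm_time_ms(1, 4096, 4096, a100))
# Prefill-phase GEMM (long sequence): high arithmetic intensity.
print(gemm_time_ms(4096, 4096, 4096, a100))
```

On these assumed numbers, the batch-1 decode GEMV lands on the memory-bound side while the large prefill GEMM is compute-bound, matching the qualitative picture the abstract draws for inference and explaining why DRAM technology scaling matters most for decode latency.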

Authors (7)
  1. Joyjit Kundu (18 papers)
  2. Wenzhe Guo (6 papers)
  3. Ali BanaGozar (2 papers)
  4. Udari De Alwis (1 paper)
  5. Sourav Sengupta (3 papers)
  6. Puneet Gupta (20 papers)
  7. Arindam Mallik (3 papers)
Citations (1)