
Ultra-Long Sequence Distributed Transformer (2311.02382v2)

Published 4 Nov 2023 in cs.DC and cs.AI

Abstract: Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long-sequence training because of its overwhelming computation and memory requirements. Existing methods for long-sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers on long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. It then uses fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and to minimize communication overhead. We compared the performance of the LSS Transformer with that of state-of-the-art Nvidia sequence parallelism on the Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation than state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 on 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
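
The segment distribution described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the abstract does not specify how each partial self-attention is computed, so the sketch assumes the simplest reading, in which each worker attends only within its own segment. The function names (`split_into_segments`, `local_self_attention`) and the toy data are hypothetical, the queries, keys, and values are the raw embeddings with no learned projections, and the fused communication and double gradient averaging steps are not shown.

```python
import math


def split_into_segments(tokens, num_workers):
    """Split a sequence into contiguous, equal-length segments, one per worker."""
    if len(tokens) % num_workers != 0:
        raise ValueError("sequence length must be divisible by num_workers")
    seg_len = len(tokens) // num_workers
    return [tokens[i * seg_len:(i + 1) * seg_len] for i in range(num_workers)]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(segment):
    """Scaled dot-product self-attention restricted to a single segment.

    Assumption: queries, keys, and values are the token embeddings themselves.
    """
    dim = len(segment[0])
    outputs = []
    for query in segment:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in segment
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, segment)) for j in range(dim)]
        )
    return outputs


# Toy sequence: 8 tokens with 2-dimensional embeddings, spread over 4 simulated workers.
sequence = [[float(i), 1.0] for i in range(8)]
segments = split_into_segments(sequence, num_workers=4)

# In a real run each segment would live on a different GPU; here they run in a loop.
partial_outputs = [local_self_attention(seg) for seg in segments]

# Attention-score entries: full sequence versus one worker's segment.
full_scores = len(sequence) ** 2
per_worker_scores = len(segments[0]) ** 2
```

Under this segment-local assumption, each worker stores a score matrix with (L/P)^2 entries rather than L^2, for sequence length L and P workers. That count is only meant to show why splitting the sequence can reduce per-GPU memory; it is not a derivation of the speedup or memory figures reported in the abstract.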

Authors (10)
  1. Xiao Wang (507 papers)
  2. Isaac Lyngaas (8 papers)
  3. Aristeidis Tsaris (16 papers)
  4. Peng Chen (324 papers)
  5. Sajal Dash (4 papers)
  6. Mayanka Chandra Shekar (3 papers)
  7. Tao Luo (149 papers)
  8. Hong-Jun Yoon (3 papers)
  9. Mohamed Wahib (38 papers)
  10. John Gouley (1 paper)
Citations (3)