Ouroboros: On Accelerating Training of Transformer-Based Language Models (1909.06695v1)

Published 14 Sep 2019 in cs.CL, cs.LG, and stat.ML

Abstract: LLMs are essential for NLP tasks such as machine translation and text summarization. Transformer-based LLMs with over a billion parameters have recently demonstrated remarkable performance across many NLP domains, verifying the benefits of model size. Model parallelism is required when a model is too large to fit on a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to LLMs. We propose the first model-parallel algorithm that speeds up the training of Transformer-based LLMs. We also prove that the proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL LLMs demonstrate that the proposed algorithm obtains a much greater speedup beyond data parallelism, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.

Authors (5)
  1. Qian Yang (146 papers)
  2. Zhouyuan Huo (29 papers)
  3. Wenlin Wang (27 papers)
  4. Heng Huang (189 papers)
  5. Lawrence Carin (203 papers)
Citations (9)