GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length (2310.00576v1)

Published 1 Oct 2023 in cs.CL and cs.LG

Abstract: The evolving sophistication and intricacies of LLMs yield unprecedented advancements, yet they simultaneously demand considerable computational resources and incur significant costs. To alleviate these challenges, this paper introduces a novel, simple, and effective method named ``GrowLength'' to accelerate the pretraining process of LLMs. Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency. For instance, it begins with a sequence length of 128 and progressively extends to 4096. This approach enables models to process a larger number of tokens within limited time frames, potentially boosting their performance. In other words, the efficiency gain derives from training with shorter sequences, which optimizes the utilization of resources. Our extensive experiments with various state-of-the-art LLMs have revealed that models trained using our method not only converge more swiftly but also exhibit superior performance metrics compared to those trained with existing methods. Furthermore, our method for LLMs pretraining acceleration does not require any additional engineering efforts, making it a practical solution in the realm of LLMs.
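
The core idea is a staged schedule that lengthens training sequences as pretraining progresses (e.g., from 128 up to 4096 tokens), so early steps are cheap and later steps see the full target context. The sketch below shows one way such a schedule could be wired into a data pipeline; the stage boundaries, lengths, and helper names are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of a progressive sequence-length schedule in the spirit of
# GrowLength. Stage boundaries and lengths below are hypothetical; the paper
# only states that pretraining grows the length from 128 to 4096.

from bisect import bisect_right


def sequence_length_at(step, boundaries=(10_000, 30_000, 60_000),
                       lengths=(128, 512, 1024, 4096)):
    """Return the training sequence length for a given optimizer step.

    boundaries[i] is the (assumed) step at which training switches from
    lengths[i] to lengths[i + 1].
    """
    return lengths[bisect_right(boundaries, step)]


def rechunk(token_stream, seq_len):
    """Pack a flat token stream into fixed-length training sequences."""
    for start in range(0, len(token_stream) - seq_len + 1, seq_len):
        yield token_stream[start:start + seq_len]


if __name__ == "__main__":
    # Toy demonstration: short sequences early in training, full-length
    # sequences later, so the per-step attention cost grows over time.
    for step in (0, 15_000, 45_000, 90_000):
        print(step, sequence_length_at(step))
```

In a real training loop, the batch builder would call a schedule like this at each stage boundary and re-pack the corpus into the new sequence length, leaving the model and optimizer untouched.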

Authors (6)
  1. Hongye Jin (15 papers)
  2. Xiaotian Han (46 papers)
  3. Jingfeng Yang (31 papers)
  4. Zhimeng Jiang (33 papers)
  5. Chia-Yuan Chang (18 papers)
  6. Xia Hu (186 papers)
Citations (10)