
Distributed Sign Momentum with Local Steps for Training Transformers (2411.17866v2)

Published 26 Nov 2024 in cs.LG

Abstract: Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers. However, its application in distributed training remains underexplored. This paper investigates a novel communication-efficient distributed sign momentum method with multiple local steps, to cope with scenarios where communicating at every step is prohibitive. Our proposed method allows a broad class of base optimizers for the local steps and uses sign momentum in the global step, where the momentum is generated from differences accumulated during the local steps. For generic base optimizers, by approximating the sign operator with a randomized version that acts as a continuous analog in expectation, we present a general convergence analysis, which specializes to an $O(1/\sqrt{T})$ rate for a particular instance. When the local step is stochastic gradient descent, we show an optimal $O(1/T^{1/4})$ rate in terms of the $\ell_1$ gradient norm for nonconvex smooth cost functions. We extensively evaluate our method on the pre-training of GPT-2 models of various sizes from scratch, and the empirical results show significant improvement compared to other distributed methods with multiple local steps.
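The scheme the abstract describes — workers take several local base-optimizer steps, the server folds the averaged parameter differences into a momentum buffer, and the global update applies only the sign of that momentum — can be sketched as follows. This is a minimal illustration under assumed choices (plain SGD as the base optimizer, a noisy quadratic objective, and hypothetical names such as `local_steps`, `beta`, `lr_local`, `lr_global`), not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def local_sgd(x, grad_fn, lr_local, local_steps, rng):
    """Run `local_steps` SGD steps from x; return the accumulated difference."""
    y = x.copy()
    for _ in range(local_steps):
        y -= lr_local * grad_fn(y, rng)
    return y - x  # difference accumulated during local steps

def train(grad_fn, dim=4, workers=4, rounds=50,
          local_steps=5, lr_local=0.05, lr_global=0.1, beta=0.9, seed=0):
    # Illustrative sketch of distributed sign momentum with local steps;
    # all hyperparameter values here are assumptions for the toy demo.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)   # shared global parameters
    m = np.zeros(dim)          # momentum buffer for the global sign step
    for _ in range(rounds):
        # Each worker starts from the current global model and runs local steps.
        diffs = [local_sgd(x, grad_fn, lr_local, local_steps, rng)
                 for _ in range(workers)]
        delta = np.mean(diffs, axis=0)        # one communication per round
        m = beta * m + (1.0 - beta) * delta   # momentum of local differences
        x += lr_global * np.sign(m)           # global step uses only the sign
    return x

# Toy demo: noisy quadratic f(x) = 0.5 * ||x||^2, so grad f(x) = x plus noise.
noisy_quad_grad = lambda y, rng: y + 0.01 * rng.normal(size=y.shape)
x_final = train(noisy_quad_grad)
print(np.linalg.norm(x_final))
```

Because the global step moves by a fixed magnitude `lr_global` per coordinate, the iterates settle into a small oscillation band around the minimizer rather than converging exactly; in practice a decaying global step size would be used. Communication cost is one round-trip per `local_steps` gradient steps, which is the point of the multiple-local-step design.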
