APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm (2008.11343v2)

Published 26 Aug 2020 in cs.DC, cs.LG, and stat.ML

Abstract: Adam is an important optimization algorithm for ensuring efficiency and accuracy when training many important tasks such as BERT and ImageNet. However, Adam is generally not compatible with information (gradient) compression technology, so communication usually becomes the bottleneck when parallelizing Adam. In this paper, we propose a communication-efficient Adam-Preconditioned Momentum SGD algorithm, named APMSqueeze, which compresses gradients through an error-compensated method. The proposed algorithm achieves convergence efficiency similar to Adam in terms of epochs, but significantly reduces the running time per epoch. In terms of end-to-end performance (including the full-precision pre-conditioning step), APMSqueeze provides a speed-up of up to $2$-$10\times$, depending on network bandwidth. We also provide a theoretical analysis of convergence and efficiency.
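The core idea in the abstract, error-compensated gradient compression combined with a momentum SGD update that uses a fixed Adam-style preconditioner, can be sketched roughly as below. This is a minimal single-worker illustration under stated assumptions, not the paper's actual algorithm: the names `ECPreconditionedMomentumSGD` and `topk_compress`, the top-k compressor, and the specific hyperparameters are hypothetical choices for illustration only.

```python
import numpy as np

def topk_compress(x, k):
    """Keep the k largest-magnitude entries of x (a simple stand-in for
    whatever contractive compressor is actually used)."""
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out

class ECPreconditionedMomentumSGD:
    """Hypothetical sketch: error-compensated, preconditioned momentum SGD.

    `precond` plays the role of a fixed diagonal preconditioner (e.g. an
    Adam-style 1/(sqrt(v)+eps) obtained from a full-precision warm-up);
    it is not updated here.
    """
    def __init__(self, dim, lr=1e-3, beta=0.9, k=100, precond=None):
        self.lr, self.beta, self.k = lr, beta, k
        self.m = np.zeros(dim)   # momentum buffer
        self.e = np.zeros(dim)   # local memory of past compression error
        self.precond = np.ones(dim) if precond is None else precond

    def step(self, params, grad):
        # Error compensation: add back what earlier compression dropped.
        corrected = grad + self.e
        compressed = topk_compress(corrected, self.k)  # what would be communicated
        self.e = corrected - compressed                # remember the residual
        # In a distributed run, `compressed` would be exchanged and averaged
        # across workers before entering the momentum update.
        self.m = self.beta * self.m + compressed
        params -= self.lr * self.precond * self.m
        return params
```

In this sketch the communication-heavy step only ever sees the compressed vector, while the error memory `e` feeds the dropped mass back into later iterations, which is the general mechanism that error-compensated methods rely on to retain convergence behavior close to the uncompressed baseline.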

Authors (7)
  1. Hanlin Tang (34 papers)
  2. Shaoduo Gan (9 papers)
  3. Samyam Rajbhandari (21 papers)
  4. Xiangru Lian (18 papers)
  5. Ji Liu (285 papers)
  6. Yuxiong He (59 papers)
  7. Ce Zhang (215 papers)
Citations (8)
