
NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization (2104.13818v2)

Published 28 Apr 2021 in cs.LG, math.OC, and stat.ML

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; for practical purposes, however, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it both has stronger theoretical guarantees than QSGD and matches or exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.
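To make the compression idea concrete, below is a minimal sketch of the baseline QSGD-style scheme the abstract refers to: each gradient coordinate is normalized by the gradient's L2 norm, mapped onto a uniform grid of s+1 levels, and rounded stochastically so the quantizer is unbiased. This is an illustrative implementation of uniform stochastic quantization only; NUQSGD's contribution is a *nonuniform* placement of the levels, which is not reproduced here. The function name and parameters are our own choices for illustration.

```python
import numpy as np

def qsgd_quantize(g, s=4, rng=None):
    """Unbiased stochastic uniform quantization in the spirit of
    QSGD (Alistarh et al., 2017).

    Each coordinate of g is scaled to [0, s] by the gradient's L2
    norm, then rounded up or down stochastically so that the
    expectation of the output equals g. In a real system, only the
    norm, signs, and integer levels would be transmitted.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    level = np.abs(g) / norm * s          # position in [0, s]
    lower = np.floor(level)
    prob = level - lower                  # round up with this probability
    quantized = lower + (rng.random(g.shape) < prob)
    return np.sign(g) * norm * quantized / s

# Example: quantize a small gradient vector to s = 4 uniform levels.
g = np.array([0.3, -0.7, 0.1, 0.5])
q = qsgd_quantize(g)
```

Unbiasedness follows because the expected rounded level is exactly `level`, so averaging many quantized copies recovers the original gradient; this is the property that lets compressed SGD retain convergence guarantees.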

Authors (6)
  1. Ali Ramezani-Kebrya (11 papers)
  2. Fartash Faghri (32 papers)
  3. Ilya Markov (12 papers)
  4. Vitalii Aksenov (6 papers)
  5. Dan Alistarh (133 papers)
  6. Daniel M. Roy (73 papers)