Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

Published 25 Feb 2020 in cs.LG, cs.DC, and stat.ML | (2002.11082v1)

Abstract: The communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications. In particular, the growing size of deep learning models leads to higher communication overheads that defy the ideal linear training speedup regarding the number of devices. Gradient quantization is one of the common methods to reduce communication costs. However, it can lead to quantization error in the training and result in model performance degradation. In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for any gradient distribution. Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively, which dynamically determine the optimal quantization levels. Extensive experimental results on CIFAR and ImageNet datasets with several popular convolutional neural networks show the superiority of our proposed methods.
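To make the distinction between biased binary and unbiased multi-level quantization concrete, here is a minimal NumPy sketch of two generic quantizers. It is not the paper's BinGrad or ORQ: the function names, the choice of mean absolute value as the binary scale, and the evenly spaced levels with stochastic rounding are all assumptions made for illustration. The abstract only states that the proposed schemes choose their levels dynamically from an optimal condition, which this sketch does not attempt to reproduce.

```python
import numpy as np


def binary_quantize(grad):
    """Illustrative biased 1-bit quantizer (hypothetical, not BinGrad).

    Each entry becomes +scale or -scale, where scale = mean(|grad|).
    """
    scale = np.mean(np.abs(grad))
    return np.where(grad >= 0, scale, -scale)


def stochastic_quantize(grad, num_levels=4, rng=None):
    """Illustrative unbiased multi-level quantizer (hypothetical, not ORQ).

    Levels are evenly spaced between min(grad) and max(grad). Each entry is
    rounded to one of its two neighbouring levels at random, with
    probabilities chosen so the expected output equals the input.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = grad.min(), grad.max()
    if hi == lo:
        return grad.copy()  # constant gradient: nothing to quantize
    levels = np.linspace(lo, hi, num_levels)
    step = levels[1] - levels[0]
    # Index of the level at or just below each entry.
    idx = np.clip(np.floor((grad - lo) / step).astype(int), 0, num_levels - 2)
    lower = levels[idx]
    # Probability of rounding up is the fractional distance to the lower level.
    prob_up = np.clip((grad - lower) / step, 0.0, 1.0)
    round_up = rng.random(grad.shape) < prob_up
    return np.where(round_up, levels[idx + 1], lower)


if __name__ == "__main__":
    g = np.array([-1.0, -0.2, 0.3, 2.0])
    print(binary_quantize(g))
    print(stochastic_quantize(g, num_levels=4, rng=np.random.default_rng(0)))
```

In a distributed setting, each worker would transmit only the level indices plus the few scalars defining the levels, instead of full-precision gradients. The binary quantizer is deterministic and therefore biased, while the stochastic one trades extra variance for an unbiased estimate.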

Citations (5)
