Kimad: Adaptive Gradient Compression with Bandwidth Awareness (2312.08053v1)
Published 13 Dec 2023 in cs.LG, cs.DC, cs.IT, and math.IT
Abstract: In distributed training, communication often emerges as a bottleneck. In response, we introduce Kimad, a solution that offers adaptive gradient compression. By consistently monitoring bandwidth, Kimad refines compression ratios to match specific neural network layer requirements. Our exhaustive tests and proofs confirm Kimad's outstanding performance, establishing it as a benchmark in adaptive compression for distributed deep learning.
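The mechanism sketched in the abstract — monitor the link, then pick a per-layer compression ratio that fits the currently available bandwidth — can be illustrated with a minimal sketch. This is not Kimad's actual interface: the helper names (`measure_bandwidth`, `choose_ratio`, `top_k_compress`), the fixed per-step time budget, and the use of top-k sparsification as the compressor are all illustrative assumptions.

```python
import torch


def measure_bandwidth(bytes_sent: int, seconds: float) -> float:
    """Estimate link bandwidth (bytes/second) from the last observed transfer."""
    return bytes_sent / max(seconds, 1e-9)


def top_k_compress(grad: torch.Tensor, ratio: float):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    flat = grad.flatten()
    k = max(1, int(ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]


def choose_ratio(bandwidth: float, budget_seconds: float, layer_bytes: int,
                 min_ratio: float = 0.01, max_ratio: float = 1.0) -> float:
    """Pick the largest ratio whose transfer still fits the time budget."""
    affordable_bytes = bandwidth * budget_seconds
    ratio = affordable_bytes / layer_bytes
    return float(min(max_ratio, max(min_ratio, ratio)))


def compress_model_grads(model: torch.nn.Module, bandwidth: float,
                         budget_seconds: float):
    """Per iteration: size each layer's compressed gradient to the measured link."""
    messages = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        layer_bytes = param.grad.numel() * param.grad.element_size()
        ratio = choose_ratio(bandwidth, budget_seconds, layer_bytes)
        messages[name] = top_k_compress(param.grad, ratio)
    return messages


# Hypothetical usage: a ~100 MB/s link and a 50 ms communication budget per step.
# bw = measure_bandwidth(bytes_sent=12_500_000, seconds=0.125)
# msgs = compress_model_grads(model, bandwidth=bw, budget_seconds=0.05)
```

In this toy policy each layer independently gets the largest ratio that fits the budget; the paper's point is that the ratio should adapt both to the measured bandwidth and to layer-specific requirements, which a production implementation would combine with error feedback.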