Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression (2407.01378v2)

Published 1 Jul 2024 in cs.LG and cs.NI

Abstract: Gradient aggregation has long been identified as a major bottleneck in today's large-scale distributed machine learning training systems. One promising way to mitigate this bottleneck is gradient compression, which directly reduces the volume of communicated gradient data. In practice, however, many gradient compression schemes fail to accelerate training while also preserving accuracy. In this work, we identify common issues in previous gradient compression systems and evaluation methodologies. These include excessive computational overheads; incompatibility with all-reduce; and insufficient evaluation methods, such as not using an end-to-end metric or comparing against a 32-bit baseline instead of the stronger 16-bit baseline. We revisit common compression approaches (sparsification, quantization, and low-rank decomposition) and demonstrate how accounting for the above issues leads to minor but strategic design changes that yield notably better performance. Our goal is to raise awareness of the need for design and evaluation standards that translate naturally into end-to-end utility of gradient compression.
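For context on one of the compression families the abstract revisits, the sketch below illustrates top-k gradient sparsification in plain NumPy. It is a minimal illustration only: the function names and the NumPy setup are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude gradient entries (illustrative sketch,
    not the paper's method). Returns the indices and values to communicate."""
    flat = grad.ravel()
    # argpartition selects the k largest |values| without a full sort
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def desparsify(idx: np.ndarray, vals: np.ndarray, shape) -> np.ndarray:
    """Rebuild a dense gradient from the communicated (index, value) pairs."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

# Example: compress a toy gradient to ~1% of its entries and restore it
grad = np.random.randn(1000, 100).astype(np.float32)
k = grad.size // 100
idx, vals = topk_sparsify(grad, k)
restored = desparsify(idx, vals, grad.shape)
```

This sketch also hints at the all-reduce incompatibility the abstract mentions: because each worker selects different indices, the resulting (index, value) pairs cannot be summed element-wise by a standard all-reduce and typically require an all-gather instead, which can erode the bandwidth savings.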
