
Communication Efficient Federated Learning

Updated 25 November 2025
  • Communication Efficient Federated Learning is a suite of strategies that reduces data transmission during model synchronization using compression, quantization, and coding techniques.
  • Recent methods such as FedCode, FedZip, and FedBiF reduce communication by roughly 12–32× (and far more with aggressive sparsification), typically at a 1–2% or smaller accuracy drop, balancing compression and performance.
  • Adaptive techniques, including predictive coding, residual communication, and bandwidth-aware protocols, optimize transmission efficiency in heterogeneous and dynamic network environments.

Communication Efficient Federated Learning (CEFL) encompasses algorithmic and systems strategies that reduce the communication footprint of Federated Learning (FL) without significantly degrading model performance. Communication is a principal bottleneck in practical FL deployments due to large model sizes, frequent model synchronizations, and the heterogeneity of wireless and edge environments. Recent research has developed diverse compression, quantization, coding, and protocol innovations to mitigate these costs, providing actionable trade-offs between bandwidth, accuracy, and convergence.

1. Communication Bottleneck and Classical Baselines

In FL, typically following the Federated Averaging (FedAvg) protocol, model synchronization between clients and server dominates the cost profile. Each client in FedAvg iteratively:

  • Downloads the global model (d parameters, wordlength bits each).
  • Runs local SGD on private data to produce an update.
  • Uploads the update to the server.

For T communication rounds, total per-client traffic is $2\,T\,d \cdot \text{wordlength}$ bits. For high-dimensional deep models and bandwidth-limited clients, this is often prohibitive (Khalilian et al., 2023). Classical compression methods (quantization, sparsification, pruning, clustering) offer some relief, but still require transmitting all model weights or their compressed form each round, which fundamentally limits attainable communication reductions.
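
As a concrete illustration, the following back-of-the-envelope sketch applies the cost formula above; the model size d, round count T, and 32-bit wordlength are purely illustrative assumptions, not values from a specific deployment.

```python
# Back-of-the-envelope FedAvg traffic estimate (assumed, illustrative values).

def fedavg_traffic_bits(d: int, T: int, wordlength: int = 32) -> int:
    """Total bits one client exchanges over T rounds of vanilla FedAvg
    (one download plus one upload of all d parameters per round)."""
    return 2 * T * d * wordlength

if __name__ == "__main__":
    d = 11_000_000   # assumed parameter count (ResNet-scale model)
    T = 500          # assumed number of communication rounds
    total_bits = fedavg_traffic_bits(d, T)
    print(f"per-client traffic: {total_bits / 8 / 1e9:.1f} GB")
```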

2. Codebook-based and Clustering Approaches

FedCode (Khalilian et al., 2023) introduces a codebook-based communication paradigm. After local training, each client partitions the d-dimensional weight update into K clusters using K-means:

$$\min_{C\in\mathbb{R}^K,\ \Phi:\{1,\dots,d\}\to\{1,\dots,K\}}\ \sum_{j=1}^d \left\| \Delta\theta_j - C_{\Phi(j)} \right\|^2$$

Only the codebook (K centroids, $K \cdot \text{wordlength}$ bits) and cluster assignments ($d\,\lceil \log_2 K\rceil$ bits) are exchanged. To prevent codebook drift, periodic full-model synchronizations, performed much less frequently than every round, realign centroids and assignments. The per-client communication volume over T rounds becomes:

$$C_\text{FedCode} = T \left[\, 2\,(K \cdot \text{wordlength}) + (F_\downarrow + F_\uparrow)\, d\,\lceil\log_2 K\rceil \,\right]$$

with $F_\downarrow, F_\uparrow$ denoting the fractions of rounds requiring full-model synchronization. With moderate K (e.g., 64) and infrequent full-model syncs ($F_\downarrow + F_\uparrow \ll 1$), empirical results on CIFAR-10/100 and SpeechCommands demonstrate a $12\times$–$15\times$ reduction in communication for a 1.3–2.0% accuracy drop (Khalilian et al., 2023).
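
A minimal sketch of this codebook exchange, assuming a flattened weight update, scikit-learn's KMeans, and an illustrative K = 64 (not the paper's exact implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_update(delta: np.ndarray, K: int = 64):
    """Cluster the flattened update; return the codebook and per-parameter indices."""
    km = KMeans(n_clusters=K, n_init=4, random_state=0).fit(delta.reshape(-1, 1))
    codebook = km.cluster_centers_.ravel().astype(np.float32)   # K * wordlength bits
    assignments = km.labels_.astype(np.uint8)                   # ceil(log2 K) bits each
    return codebook, assignments

def decompress_update(codebook: np.ndarray, assignments: np.ndarray, shape):
    """Server-side reconstruction: look up each parameter's centroid."""
    return codebook[assignments].reshape(shape)

delta = np.random.default_rng(0).normal(size=10_000).astype(np.float32)  # toy update
codebook, idx = compress_update(delta)
bits = codebook.size * 32 + idx.size * int(np.ceil(np.log2(codebook.size)))
print(f"codebook message: {bits} bits vs. {delta.size * 32} bits for raw weights")
```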

Related: FedZip (Malekijoo et al., 2021) applies per-layer top-$z$ sparsification and clustering of the nonzero updates (typically K = 3), followed by entropy coding. Table 1 summarizes key attributes:

| Method  | Upstream Message         | Typical Compression Ratio | Accuracy Degradation   |
|---------|--------------------------|---------------------------|------------------------|
| FedCode | Codebook + rare indices  | 12–15×                    | 1–2%                   |
| FedZip  | Sparse quantized + coding | up to 1085×              | <1.2% (MNIST, VGG16)   |

FedZip achieves extreme compression by combining aggressive sparsity and symbol coding, demonstrating robustness even on large-scale models (Malekijoo et al., 2021).
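
The sketch below mimics a FedZip-style pipeline in heavily simplified form: a uniform 3-level quantization stands in for K-means with K = 3, the coding of nonzero positions is omitted, and the layer size and sparsity level are assumptions.

```python
import numpy as np

def topz_sparsify(layer: np.ndarray, z: float = 0.01) -> np.ndarray:
    """Keep only the top-z fraction of entries by magnitude; zero out the rest."""
    k = max(1, int(z * layer.size))
    thresh = np.partition(np.abs(layer).ravel(), -k)[-k]
    return np.where(np.abs(layer) >= thresh, layer, 0.0)

def entropy_bits(symbols: np.ndarray) -> float:
    """Shannon lower bound on the entropy-coded size of a symbol stream."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum() * symbols.size)

layer = np.random.default_rng(1).normal(size=100_000).astype(np.float32)
sparse = topz_sparsify(layer)
nonzero = sparse[sparse != 0]
# Map each surviving value to one of 3 representative levels (stand-in for K=3 clustering).
levels = np.quantile(nonzero, [0.1, 0.5, 0.9])
symbols = np.abs(nonzero[:, None] - levels[None, :]).argmin(axis=1)
print(f"value payload ~ {entropy_bits(symbols):.0f} bits "
      f"(positions not counted) vs. {layer.size * 32} dense bits")
```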

3. Quantization, Binarization, and Bit-Freezing

Bit quantization techniques reduce each parameter to a fixed-precision representation, either post-training or via quantization-aware training (QAT). A recent advance is FedBiF (Li et al., 12 Sep 2025), which performs quantization during training with a bit-freezing approach: each round, only a single bit of each parameter's m-bit representation is updated and communicated. This decouples local model precision (maintained via the full m-bit representation) from communication cost (one bit per parameter per round).

The per-round bidirectional cost is dramatically reduced (e.g., 1 bit uplink + m bits downlink for m-bit parameters), yielding 16×–32× compression with sub-0.5% accuracy loss compared to FedAvg in extensive vision benchmarks. Moreover, bit-freezing introduces implicit model sparsification, with up to 60% zero weights for some configurations (Li et al., 12 Sep 2025). Binarized network FL has also seen development: by transmitting strictly binary (+1/–1) weights and aggregating via a maximum-likelihood reconstruction scheme, one can achieve a 32× compression ratio with minimal accuracy loss; a hybrid scheme further bridges the gap to full-precision accuracy (Yang et al., 2021).
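
A minimal bookkeeping sketch of bit-freezing communication follows, assuming an unsigned m-bit fixed-point code over [-1, 1]; the encoding and bit-rotation schedule are illustrative assumptions, not necessarily FedBiF's exact parameterization.

```python
import numpy as np

M_BITS = 4  # assumed bit width of the local fixed-point representation

def to_fixed_point(w: np.ndarray, m: int = M_BITS) -> np.ndarray:
    """Map weights in [-1, 1] to unsigned m-bit integer codes."""
    levels = 2 ** m - 1
    return np.clip(np.round((w + 1.0) / 2.0 * levels), 0, levels).astype(np.uint8)

def bit_plane(codes: np.ndarray, bit: int) -> np.ndarray:
    """Extract one bit-plane (the only part transmitted in a given round)."""
    return ((codes >> bit) & 1).astype(np.uint8)

w = np.random.default_rng(2).uniform(-1, 1, size=8_000).astype(np.float32)
codes = to_fixed_point(w)
round_idx = 3
active_bit = round_idx % M_BITS                       # rotate through the m bit positions
payload = np.packbits(bit_plane(codes, active_bit))   # 1 bit per parameter uplink
print(f"uplink: {payload.size} bytes vs. {w.nbytes} bytes for full-precision weights")
```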

4. Predictive Coding, Residual Communication, and Compressed Sensing

Predictive coding (Yue et al., 2021, Song et al., 2022) uses shared prediction functions between client and server to estimate next model states, so that only the quantized residual (error) is transmitted. In ResFed (Song et al., 2022), a simple linear predictor is sufficient to reduce residual entropy, followed by deep sparsification and quantization. This compresses a 4.08 MB model by over 700× in uplink/downlink, with ≈1% accuracy loss. Predictive coding with rate-distortion optimization and entropy coding yields up to 99% communication reduction relative to naive approaches (Yue et al., 2021).
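
A minimal residual-coding round under these ideas might look as follows; the linear extrapolation predictor, sparsity fraction, and quantization step are assumptions for illustration.

```python
import numpy as np

def predict(prev: np.ndarray, prev2: np.ndarray) -> np.ndarray:
    """Shared linear predictor: extrapolate from the two previous global models."""
    return prev + (prev - prev2)

def encode_residual(actual, predicted, keep_frac=0.05, step=1e-3):
    """Client side: sparsify and coarsely quantize the prediction residual."""
    residual = actual - predicted
    k = max(1, int(keep_frac * residual.size))
    idx = np.argsort(np.abs(residual))[-k:]                 # deep sparsification
    q = np.round(residual[idx] / step).astype(np.int16)     # coarse quantization
    return idx.astype(np.int32), q

def decode_residual(predicted, idx, q, step=1e-3):
    """Server side: apply the decoded residual on top of its own prediction."""
    recon = predicted.copy()
    recon[idx] += q.astype(np.float32) * step
    return recon

rng = np.random.default_rng(3)
w_prev2 = rng.normal(size=50_000).astype(np.float32)
w_prev = w_prev2 + 0.01 * rng.normal(size=50_000).astype(np.float32)
w_new = w_prev + 0.01 * rng.normal(size=50_000).astype(np.float32)

pred = predict(w_prev, w_prev2)
idx, q = encode_residual(w_new, pred)
w_hat = decode_residual(pred, idx, q)
print(f"payload: {idx.nbytes + q.nbytes} bytes, "
      f"reconstruction MSE: {np.mean((w_hat - w_new) ** 2):.2e}")
```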

Quantized Compressed Sensing (QCS) (Oh et al., 2021) further exploits the structured sparsity of gradient or update vectors, applying block sparsification, a Gaussian random projection to a lower dimension, and Lloyd-Max quantization. At the server, approximate MMSE reconstruction (via EM-GAMP or Bussgang-based techniques) permits sub-1-bit-per-entry communication with negligible (<0.5%) accuracy loss on MNIST and controlled non-i.i.d. splits.
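
An encoder-side sketch of this pipeline is below; a uniform quantizer stands in for the Lloyd-Max codebook, the server-side MMSE reconstruction is omitted, and the block size, compression ratio, and bit width are assumptions.

```python
import numpy as np

BLOCK, RATIO, BITS = 256, 0.25, 2   # assumed block size, projection ratio, bit width

def qcs_encode(update: np.ndarray, sparsity: float = 0.1):
    """Block-sparsify, project with a shared Gaussian matrix, then scalar-quantize."""
    blocks = update.reshape(-1, BLOCK)
    k = max(1, int(sparsity * BLOCK))
    thresh = np.sort(np.abs(blocks), axis=1)[:, -k][:, None]
    blocks = np.where(np.abs(blocks) >= thresh, blocks, 0.0)
    # Sensing matrix regenerated from a seed known to both client and server.
    A = np.random.default_rng(1234).normal(size=(int(RATIO * BLOCK), BLOCK)) / np.sqrt(BLOCK)
    y = blocks @ A.T                                   # low-dimensional measurements
    scale = np.abs(y).max() / 2 ** (BITS - 1)          # uniform stand-in for Lloyd-Max
    q = np.clip(np.round(y / scale), -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1)
    return q.astype(np.int8), scale

update = np.random.default_rng(5).normal(size=1024 * BLOCK).astype(np.float32)
q, scale = qcs_encode(update)
print(f"{q.size * BITS} payload bits vs. {update.size * 32} uncompressed bits")
```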

5. Adaptive, Bandwidth-Aware, and Protocol-Level Approaches

Practical FL must contend with heterogeneous and dynamically varying channel conditions. Adaptive sketch compression (Zhuansun et al., 6 May 2024) uses in-situ bandwidth prediction (per-client LSTM forecasting) to tailor each client's compression rate, adaptively resizing sketches (CountSketch or similar) of gradient updates to match the available bandwidth. Server-side aggregation of sketches of different sizes, combined with coefficient-of-variation filtering, enables stable convergence, reducing average per-round communication by over 60% at only a 2% accuracy loss versus standard FedAvg (Zhuansun et al., 6 May 2024).
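
A simplified version of this mechanism is sketched below, with a moving average standing in for the per-client LSTM forecaster; the bandwidth history and per-round budget are assumptions.

```python
import numpy as np

def forecast_bandwidth(history_mbps):
    """Placeholder for the per-client LSTM forecaster: average of recent samples."""
    return float(np.mean(history_mbps[-3:]))

def count_sketch(grad: np.ndarray, width: int, seed: int = 0) -> np.ndarray:
    """CountSketch of the gradient: signed sums over randomly hashed buckets."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, width, size=grad.size)
    signs = rng.choice([-1.0, 1.0], size=grad.size)
    sketch = np.zeros(width, dtype=np.float32)
    np.add.at(sketch, buckets, (signs * grad).astype(np.float32))
    return sketch

grad = np.random.default_rng(6).normal(size=200_000).astype(np.float32)
bw_mbps = forecast_bandwidth([12.0, 9.5, 7.0, 6.2])        # assumed bandwidth history
width = int(min(grad.size, bw_mbps * 1e6 / 32 * 0.1))      # spend ~0.1 s of link time
sketch = count_sketch(grad, width)
print(f"sketch of {width} floats vs. {grad.size} gradient entries")
```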

Selective parameter transmission, whether via top-k masking, dynamic sampling of client subsets, or asynchronous updates combined with quantization (e.g., “lattice” quantizers in QuAFL (Zakerinia et al., 2022)), offers complementary reductions that can be tuned to balance variance, accuracy, and communication (Ji et al., 2020, Zakerinia et al., 2022).
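
As one instance of selective transmission, a minimal top-k upload with local error feedback might look like this; k and the feedback variant are assumptions.

```python
import numpy as np

def topk_with_feedback(update: np.ndarray, residual: np.ndarray, k: int):
    """Send only the k largest-magnitude entries; keep the rest as local feedback."""
    corrected = update + residual                      # re-inject previously dropped mass
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    values = corrected[idx].astype(np.float32)
    new_residual = corrected.copy()
    new_residual[idx] = 0.0                            # unsent entries carry to next round
    return idx.astype(np.int32), values, new_residual

rng = np.random.default_rng(7)
update = rng.normal(size=100_000).astype(np.float32)
residual = np.zeros_like(update)
idx, vals, residual = topk_with_feedback(update, residual, k=1_000)
print(f"payload: {idx.nbytes + vals.nbytes} bytes vs. {update.nbytes} bytes dense")
```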

6. Knowledge Distillation and Low-Rank Compression

Federated Distillation (FD) methods (Sattler et al., 2020, Wu et al., 2021) exchange knowledge via output soft-labels on (public or OOD) datasets, rather than raw weights. Communication cost scales as $O(nP)$ (n = distillation samples, P = classes), independent of model size. Compressed Federated Distillation integrates set selection, label quantization, and delta coding, achieving orders-of-magnitude communication reduction: e.g., test accuracy targets can be met with $10^2$–$10^4\times$ less communication than vanilla model exchange (Sattler et al., 2020). Mutual knowledge distillation combined with dynamic low-rank gradient approximation (FedKD) provides further compression, as SVD-controlled gradient thresholds can be scheduled to adaptively allocate more or less precision throughout training (Wu et al., 2021).
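
The scaling is easy to see in a toy payload calculation; n, P, the model size, and the 8-bit label quantization below are illustrative assumptions.

```python
import numpy as np

n, P = 5_000, 10                 # assumed public-set size and number of classes
model_params = 11_000_000        # assumed model size, for comparison only

logits = np.random.default_rng(8).normal(size=(n, P)).astype(np.float32)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # soft labels
soft_labels_q = np.round(probs * 255).astype(np.uint8)               # 8-bit quantization

print(f"distillation payload: {soft_labels_q.nbytes / 1024:.0f} KiB "
      f"(O(nP)) vs. full model: {model_params * 4 / 1024:.0f} KiB")
```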

Low-rank model exchange extends to full weight matrices: FedDLR (Qiao et al., 2021) automatically truncates model layers to the minimal rank required to preserve a user-defined fraction of spectral energy. Both upload and download are compressed, and the resulting model is smaller and more efficient at inference. The per-round communication strictly decreases as training progresses, with empirical accuracy drops of less than 3% (Qiao et al., 2021).
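
A sketch of energy-based rank truncation for a single layer; the synthetic low-rank-plus-noise weights and the 95% energy threshold are assumptions.

```python
import numpy as np

def truncate_to_energy(W: np.ndarray, energy: float = 0.95):
    """Smallest-rank SVD truncation retaining the target fraction of spectral energy."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    return U[:, :r] * s[:r], Vt[:r, :]                 # the two factors to transmit

rng = np.random.default_rng(9)
# Synthetic layer with approximate low-rank structure plus noise.
W = rng.normal(size=(512, 32)) @ rng.normal(size=(32, 1024)) + 0.1 * rng.normal(size=(512, 1024))
A, B = truncate_to_energy(W)
print(f"rank {A.shape[1]}: transmit {A.size + B.size} values instead of {W.size}")
```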

7. Theoretical Guarantees and Empirical Performance

The convergence rates of most CEFL methods match, or degrade only slightly relative to, the uncompressed FedAvg rate under standard smoothness and (often non-i.i.d.) assumptions:

  • Quantization, error feedback, and codebook-based schemes introduce only additive (sublinear or constant) error or variance terms (e.g., $\mathcal{O}(1/\sqrt{T} + \varepsilon_{\text{quant}})$), which vanish or can be controlled for sufficiently precise codes (Khalilian et al., 2023, Malekijoo et al., 2021).
  • Bandwidth- and channel-aware quantization assigns bits adaptively under system constraints, with quantifiable trade-offs between variance and communication (Chang et al., 2020).
  • Knowledge distillation and low-rank compression provide non-parameter dependent upper bounds on communication per round, and in practice often lead to superior wall-clock convergence profiles (Sattler et al., 2020, Qiao et al., 2021).
  • Extreme compression schemes, such as predictive coding, QCS, and bit-freezing, consistently achieve an order-of-magnitude or greater reduction with <1–2% test accuracy loss on standard benchmarks.

Conclusion

The research corpus on Communication Efficient Federated Learning has now established a broad arsenal of methods: codebook- and cluster-based exchange (Khalilian et al., 2023, Malekijoo et al., 2021), on-training quantization and bit-freezing (Li et al., 12 Sep 2025), predictive/residual coding (Song et al., 2022, Yue et al., 2021), compressed sensing (Oh et al., 2021), dynamic protocol adaptation (Zhuansun et al., 6 May 2024), knowledge distillation (Sattler et al., 2020, Wu et al., 2021), and low-rank exchange (Qiao et al., 2021). These methods are complementary; hybrid strategies often yield further gains. Across the literature, a key principle emerges: aggressive, carefully designed compression and adaptive protocols allow FL to operate under strict communication budgets with minimal loss—establishing the foundation for scalable, practical, and efficient deployment in real-world wireless and edge contexts.
