An Analysis of Robust and Communication-Efficient Federated Learning from Non-IID Data
This paper tackles a significant challenge in Federated Learning (FL): efficiently managing communication overhead when client data is non-IID (not Independent and Identically Distributed). The authors propose a novel compression framework termed Sparse Ternary Compression (STC), designed to meet the unique requirements of FL, including robustness to data heterogeneity, efficient use of computational resources, and scalability.
Federated Learning enables collaborative training of deep learning models on decentralized data spread across many clients. FL preserves privacy by sharing only local model updates, so raw data never has to be transferred to a centralized server. This approach, however, incurs substantial communication overhead, often rendering it impractical in resource-constrained and heterogeneous environments such as IoT (Internet of Things) networks. Existing methods attempt to reduce this overhead primarily through gradient sparsification and quantization, and they focus mostly on upstream communication from the clients to the server. These methods often falter under practical FL conditions marked by non-IID data distributions and limited client participation.
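For reference, the uncompressed baseline that STC improves upon is the communication pattern of a plain Federated Averaging round. The following minimal sketch (the function name `fedavg_round` and its arguments are hypothetical, not from the paper) only illustrates why every round costs a full model's worth of traffic in each direction:

```python
import numpy as np

def fedavg_round(global_weights, client_updates, client_sizes):
    """One uncompressed Federated Averaging round (baseline, for illustration).

    global_weights : flat np.ndarray holding the current server model
    client_updates : list of flat np.ndarrays, each client's local weight delta
    client_sizes   : list of ints, number of local training samples per client
    """
    total = float(sum(client_sizes))
    # Weighted average of the client deltas, proportional to local data size.
    avg_update = sum(n / total * u for u, n in zip(client_updates, client_sizes))
    # Every client later downloads the full-precision result, which is the
    # downstream cost that STC also compresses.
    return global_weights + avg_update
```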
Sparse Ternary Compression (STC)
STC extends top-k gradient sparsification with ternarization and optimal Golomb encoding, and applies this compression to both upstream and downstream communication. The framework combines several mechanisms that together improve the efficiency and robustness of FL.
- Dual Compression Mechanism: Unlike previous methods that compress only the upstream direction or degrade under non-IID data distributions, STC applies a two-fold compression: top-k sparsification selects the most significant entries of each update, and these entries are then ternarized to a shared magnitude and a sign (see the sketch after this list). This keeps the communication load low while maintaining model performance and convergence rates even under non-IID conditions.
- Error Residual Accumulation: Each client maintains a residual of the update entries not included in the top-k selection and adds it to its update in subsequent rounds, so discarded information is delayed rather than lost. This error feedback keeps clients synchronized and prevents divergence of the learning process, a problem exacerbated by partial client participation.
- Optimal Golomb Encoding: The positions of the transmitted entries are communicated as distances between consecutive non-zero elements and compressed with Golomb encoding, minimizing the bit-length of the messages. This choice is motivated by the fact that these distances are approximately geometrically distributed in the sparsified updates of large models, which makes Golomb coding near-optimal (an encoding sketch follows below).
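A minimal sketch of the client-side compression described above, assuming details the review does not spell out (the helper name `stc_compress`, its signature, and the default sparsity are illustrative, not the paper's exact formulation): the client adds its accumulated residual to the fresh update, keeps the top-k entries by magnitude, replaces them by a single shared magnitude times their sign, and stores everything it did not transmit back into the residual.

```python
import numpy as np

def stc_compress(delta, residual, sparsity=0.01):
    """Sparse ternary compression of one weight update (client side).

    delta    : flat np.ndarray, the local weight update of this round
    residual : flat np.ndarray, error feedback accumulated from earlier rounds
    sparsity : fraction of entries to keep (top-k by magnitude)
    Returns (indices, signs, mu, new_residual).
    """
    accumulated = delta + residual
    k = max(1, int(sparsity * accumulated.size))

    # Top-k selection by absolute value.
    idx = np.argpartition(np.abs(accumulated), -k)[-k:]

    # Ternarization: transmit only the sign of each selected entry plus a
    # single shared magnitude mu (mean absolute value of the selected entries).
    mu = float(np.mean(np.abs(accumulated[idx])))
    signs = np.sign(accumulated[idx]).astype(np.int8)

    # Error feedback: everything not transmitted, including the quantization
    # error of the transmitted entries, stays in the residual for later rounds.
    compressed = np.zeros_like(accumulated)
    compressed[idx] = mu * signs
    new_residual = accumulated - compressed

    return idx, signs, mu, new_residual
```

Applying the same compression to the aggregated update before it is broadcast back to the clients is how STC obtains downstream as well as upstream savings.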
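The surviving positions can then be sent as gaps between consecutive indices. Because these gaps are roughly geometrically distributed, Golomb coding is close to optimal for them; the sketch below uses the power-of-two Rice variant for simplicity and is illustrative only, not the paper's exact bit layout.

```python
def golomb_rice_encode(gaps, b):
    """Encode non-negative integer gaps with a Rice code (Golomb code with
    divisor 2**b). Returns a string of '0'/'1' characters for readability."""
    bits = []
    for g in gaps:
        q, r = divmod(g, 1 << b)
        bits.append("1" * q + "0")          # quotient in unary
        bits.append(format(r, f"0{b}b"))    # remainder in b fixed bits
    return "".join(bits)

# Example: index gaps of a sparse update with roughly 1% density.
gaps = [87, 12, 203, 54, 9]
encoded = golomb_rice_encode(gaps, b=6)
print(len(encoded), "bits instead of", 32 * len(gaps), "bits uncompressed")
```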
Experimental Evaluation
The empirical evaluation spans four deep learning tasks: VGG11* on CIFAR-10, a CNN on the KWS (Keyword Spotting) dataset, an LSTM on Fashion-MNIST, and logistic regression on MNIST. The experiments cover a range of FL scenarios, varying the number of classes per client, the batch size, the client participation rate, and the balancedness of the data distribution.
- Non-IID Client Data: The results consistently show STC outperforming Federated Averaging (FedAvg) and signSGD, particularly in highly non-IID setups. For instance, when each client held data from a single class, STC reached 79.5% accuracy while FedAvg failed to converge.
- Batch Size Constraints: In memory-constrained settings that force small batch sizes, STC proved resilient. For example, with a batch size of 1 on the CIFAR-10 benchmark, STC attained 63.8% accuracy, whereas FedAvg stagnated at 39.2%.
- Client Participation Rate: Varying the client participation rate highlighted STC's robustness. Even at a low participation rate (5 out of 400 clients), STC maintained competitive accuracy relative to FedAvg, reinforcing its applicability in volatile, large-scale FL environments.
- Balancedness of Data: STC demonstrated consistent performance even with unbalanced client data. This robustness is crucial for real-world deployments where data distribution is inherently uneven across participating devices.
Implications and Future Directions
STC's methodology offers substantial practical and theoretical implications:
- Practical Utility: The dual compression approach and synchronization mechanisms enable real-world deployment of FL in bandwidth-constrained and computationally limited environments, making it feasible for IoT applications and beyond.
- Theoretical Insights: The combined sparsification and ternarization can inspire further research into more nuanced gradient compression techniques, such as coding schemes tailored to specific data distributions or model architectures.
Conclusion
The introduction of Sparse Ternary Compression marks a significant advancement in communication-efficient Federated Learning. By addressing the dual challenge of upstream and downstream communication bottlenecks and demonstrating resilience to non-IID data, STC paves the way for scalable, efficient, and robust FL in diverse and practical applications. As FL continues to evolve, future work might explore adaptive compression techniques and further integration of learning dynamics with communication constraints to push the boundaries of decentralized machine learning.
The rigorous empirical results and comprehensive analysis presented underscore STC's potential as a preferred framework for efficient Federated Learning, particularly in heterogeneous, large-scale settings.