Distillation-Based Semi-Supervised Federated Learning for Communication Efficiency
Federated Learning (FL) has emerged as a pivotal machine learning paradigm that enables privacy-preserving model training on data distributed across many mobile devices. The traditional FL approach relies on exchanging model parameters between clients and a central server, which poses substantial communication challenges, especially as model sizes grow. This paper presents an advanced framework, Distillation-Based Semi-Supervised FL (DS-FL), aimed at mitigating this communication burden without sacrificing model accuracy, even under non-independent and identically distributed (non-IID) data conditions.
DS-FL integrates a distillation-based approach that leverages an unlabeled open dataset to minimize communication costs: clients exchange model outputs (logits) computed on the shared open data instead of entire model parameters. Theoretical analysis and comprehensive experiments show that the communication cost of DS-FL scales with the model's output dimension rather than its parameter count, yielding reductions of up to 99% relative to conventional FL benchmarks.
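To make the scaling argument concrete, the sketch below compares per-round upload sizes for parameter exchange versus logit exchange. The model size, class count, and open-set size are illustrative placeholders, not figures from the paper.

```python
# Back-of-the-envelope comparison of per-round upload size. All constants
# below are hypothetical and chosen only to illustrate the scaling behavior.

NUM_PARAMS = 1_600_000      # hypothetical model parameter count
NUM_CLASSES = 10            # output dimension of the model
OPEN_SET_SIZE = 10_000      # shared unlabeled samples used for distillation
BYTES_PER_FLOAT = 4

# Conventional FL: each client uploads its full parameter vector.
fl_upload_bytes = NUM_PARAMS * BYTES_PER_FLOAT

# DS-FL: each client uploads one logit vector per open-set sample.
dsfl_upload_bytes = OPEN_SET_SIZE * NUM_CLASSES * BYTES_PER_FLOAT

print(f"FL upload per round:    {fl_upload_bytes / 1e6:.2f} MB")
print(f"DS-FL upload per round: {dsfl_upload_bytes / 1e6:.2f} MB")
print(f"Reduction: {100 * (1 - dsfl_upload_bytes / fl_upload_bytes):.1f}%")
```

The key point is that the DS-FL payload depends only on the open-set size and the number of classes, so it stays fixed even if the local model grows.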
Core Contributions
- Communication Efficiency: DS-FL avoids the communication overhead of typical FL, which scales with model size, by exchanging only model outputs. The resulting cost reductions are essential for scalability in resource-constrained environments such as mobile networks.
- Semi-Supervised Learning with Data Augmentation: By labeling the shared unlabeled open data with aggregated logit predictions, DS-FL effectively augments the training data and improves model performance (a sketch of this distillation step follows the list below). This methodological shift addresses the performance drop frequently encountered in other federated distillation methods under non-IID data setups.
- Entropy Reduction Aggregation (ERA): To speed up training convergence and reduce the deviations caused by heterogeneity across distributed datasets, the paper introduces ERA, which sharpens the aggregated global logits to counteract the slow, inefficient training associated with high-entropy targets (see the aggregation sketch after this list). ERA not only improves convergence rates but also strengthens robustness against various adversarial attacks, including corrupted data inputs and model parameter manipulation.
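A minimal sketch of the ERA idea follows, assuming the server receives per-sample logit tensors from each client over the shared open dataset. The function name, tensor shapes, and temperature value are assumptions for illustration; a temperature below one lowers the entropy of the averaged targets, whereas plain averaging corresponds to a temperature of one.

```python
import numpy as np

def era_aggregate(client_logits: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Entropy Reduction Aggregation sketch: average per-sample client logits,
    then sharpen them with a low-temperature softmax so the global targets
    have lower entropy than a simple average.

    client_logits: shape (num_clients, num_open_samples, num_classes).
    Returns global soft targets of shape (num_open_samples, num_classes).
    """
    mean_logits = client_logits.mean(axis=0)             # simple averaging across clients
    scaled = mean_logits / temperature                    # T < 1 sharpens the distribution
    scaled -= scaled.max(axis=-1, keepdims=True)          # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)
```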
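The semi-supervised data-augmentation step can then be sketched as follows: each client treats the sharpened global soft targets as pseudo-labels for the open dataset and distills them into its local model. The data-loader interface, loss choice, and hyperparameters here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def distill_on_open_data(model, open_loader, global_targets, epochs=1, lr=1e-3):
    """Client-side distillation sketch: minimize the KL divergence between the
    local model's predictions on the shared unlabeled open data and the global
    soft targets produced by the server-side aggregation (e.g., ERA).

    `open_loader` is assumed to yield (sample_indices, inputs) batches so each
    batch can be matched to its rows in `global_targets`.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for idx, inputs in open_loader:
            targets = global_targets[idx]                        # soft targets for this batch
            log_probs = F.log_softmax(model(inputs), dim=-1)     # local predictions
            loss = F.kl_div(log_probs, targets, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

After this local distillation (and training on the client's own labeled data), the client recomputes logits on the open set and uploads them for the next round.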
Significant Results
The empirical results underscore DS-FL's ability to maintain model accuracy comparable to traditional FL while achieving substantial communication savings. Across the experimental scenarios, DS-FL reaches accuracies that match or surpass those of traditional FL methods at a fraction of the communication overhead. Under non-IID data settings in particular, ERA delivers improved performance over simple logit averaging, validating the efficacy of the approach.
Theoretical and Practical Implications
From a theoretical standpoint, this research contributes to the FL literature a robust mechanism for handling non-IID data distributions effectively and efficiently. Practically, the implications are substantial for real-world deployments of machine learning models in diverse, privacy-sensitive environments such as smartphones. Because mobile devices often operate under limited bandwidth, DS-FL offers a scalable solution suited to such network conditions.
Future Directions
Moving forward, there are opportunities to refine the aggregation logic using client-specific metrics, for example weighting contributions by client reliability or data quality. Integrating historical logits into the learning process offers another promising avenue for optimization. Addressing challenges posed by massively distributed and unbalanced datasets will also be critical to broadening the applicability of DS-FL.
In conclusion, this research represents a significant step towards more efficient federated learning frameworks. The proposed methodologies open new directions in overcoming current limitations, setting a foundation for future advancements in communication-efficient distributed model training with enhanced privacy considerations.