Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
The paper "Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning" addresses a notable challenge in Federated Learning (FL), particularly concerning data heterogeneity among clients, which often leads to unsatisfactory performance and prolonged convergence times in global models.
Core Contributions
The authors propose FedFTG, short for Federated Fine-Tuning of Global models, which improves global-model performance without violating data privacy constraints. The approach hinges on data-free knowledge distillation: instead of accessing raw client data, the server uses the outputs (soft predictions) of the local models to refine the global model. In this way, FedFTG mitigates the performance degradation caused by direct model aggregation under Non-IID (not independent and identically distributed) data.
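To make the pipeline concrete, the sketch below shows where this server-side step slots into an ordinary FL round: aggregate the local models first, then refine the aggregate before the next broadcast. This is a minimal PyTorch-style illustration, not the authors' code; `local_update` and `finetune_with_dfkd` are hypothetical callables supplied by the caller, and the aggregation shown is plain FedAvg rather than anything specific to this paper.

```python
import copy

def fedavg_aggregate(local_models, data_sizes):
    """Standard FedAvg aggregation: parameter average weighted by data size."""
    total = float(sum(data_sizes))
    states = [m.state_dict() for m in local_models]
    avg_state = copy.deepcopy(states[0])
    for key in avg_state:
        avg_state[key] = sum(
            s[key].float() * (n / total) for s, n in zip(states, data_sizes)
        )
    return avg_state

def run_round(global_model, client_loaders, local_update, finetune_with_dfkd):
    """One communication round with FedFTG's extra server-side step.

    `local_update` runs one client's local training and returns its model;
    `finetune_with_dfkd` is the server-side data-free distillation step
    (sketched under Methodology below). Both are supplied by the caller.
    """
    # 1. Broadcast the global model and collect the clients' updated models.
    local_models, data_sizes = [], []
    for loader in client_loaders:
        local_models.append(local_update(copy.deepcopy(global_model), loader))
        data_sizes.append(len(loader.dataset))

    # 2. Standard aggregation; under Non-IID data this model alone degrades.
    global_model.load_state_dict(fedavg_aggregate(local_models, data_sizes))

    # 3. FedFTG's addition: fine-tune the aggregated model on the server
    #    using pseudo data and knowledge distilled from the local models.
    return finetune_with_dfkd(global_model, local_models)
```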
Methodology
The proposed FedFTG method combines several components (a combined code sketch follows the list):
- Data-Free Knowledge Distillation: FedFTG trains an auxiliary generator on the server to explore the input space of the local models. The generator produces pseudo data through which knowledge from the local models is transferred to the global model in a privacy-preserving manner, distilling from model predictions (soft labels) rather than raw data.
- Hard Sample Mining: The generator is steered toward hard samples, i.e., pseudo data points that the global model still struggles to predict correctly. Fine-tuning on these challenging examples drives more effective knowledge transfer throughout training.
- Customized Label Sampling and Class-Level Ensemble: To address label distribution shift across clients, FedFTG samples pseudo-data labels according to the clients' aggregate label statistics and weights each client's predictions per class when ensembling them. The distillation process therefore accounts for the diversity and imbalance of the client datasets.
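The sketch below puts these three components together. It is a minimal PyTorch illustration under several assumptions, not the authors' implementation: the `ConditionalGenerator` architecture, batch size, loss weights, and optimizer settings are made up, and the paper's additional diversity regularizer for the generator is omitted. `label_counts` is assumed to be a `(num_clients, num_classes)` tensor of per-client label counts available to the server.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    """Tiny label-conditional generator (illustrative architecture only)."""
    def __init__(self, z_dim=100, num_classes=10, img_shape=(3, 32, 32)):
        super().__init__()
        self.img_shape = img_shape
        self.embed = nn.Embedding(num_classes, z_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, 512), nn.ReLU(),
            nn.Linear(512, math.prod(img_shape)), nn.Tanh(),
        )

    def forward(self, z, y):
        h = torch.cat([z, self.embed(y)], dim=1)
        return self.net(h).view(-1, *self.img_shape)

def finetune_with_dfkd(global_model, local_models, generator, label_counts,
                       steps=100, g_steps=1, d_steps=5, batch=64, z_dim=100):
    """Server-side fine-tuning via data-free knowledge distillation (sketch)."""
    for m in local_models:
        m.eval()  # teachers are never updated; only their outputs are used

    # Customized label sampling: draw pseudo labels in proportion to the
    # clients' aggregate label distribution.
    label_dist = label_counts.sum(dim=0).float()
    label_dist = label_dist / label_dist.sum()
    # Class-level ensemble weights: client k's weight for class c is its
    # share of all samples of class c across clients.
    ens_w = label_counts.float() / label_counts.sum(dim=0, keepdim=True).clamp(min=1)

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
    opt_s = torch.optim.SGD(global_model.parameters(), lr=1e-2)

    def teacher_logits(x, y):
        # Weight each local model's logits by its share of the sampled class y.
        logits = torch.stack([m(x) for m in local_models])  # (K, B, C)
        w = ens_w[:, y].unsqueeze(-1)                        # (K, B, 1)
        return (w * logits).sum(dim=0)                       # (B, C)

    for _ in range(steps):
        y = torch.multinomial(label_dist, batch, replacement=True)
        z = torch.randn(batch, z_dim)

        # (a) Hard sample mining: train the generator so its pseudo data is
        # classified as y by the ensemble (fidelity term) while maximizing the
        # student/teacher discrepancy (adversarial term, hence the minus sign).
        for _ in range(g_steps):
            x = generator(z, y)
            t = teacher_logits(x, y)
            s = global_model(x)
            kd = F.kl_div(F.log_softmax(s, 1), F.softmax(t, 1),
                          reduction="batchmean")
            loss_g = F.cross_entropy(t, y) - kd
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

        # (b) Knowledge distillation: update the global model to match the
        # class-level ensemble on the freshly mined hard samples.
        for _ in range(d_steps):
            with torch.no_grad():
                x = generator(z, y)
                t = teacher_logits(x, y)
            s = global_model(x)
            loss_s = F.kl_div(F.log_softmax(s, 1), F.softmax(t, 1),
                              reduction="batchmean")
            opt_s.zero_grad()
            loss_s.backward()
            opt_s.step()

    return global_model
```

The minus sign on the distillation term in step (a) is what turns the generator update into hard sample mining: the generator is rewarded for producing inputs on which the current global model still disagrees with the ensemble of local models, and step (b) then closes exactly that gap.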
Experimental Results
The efficacy of FedFTG is demonstrated through extensive experiments on CIFAR-10 and CIFAR-100. The results show that FedFTG consistently outperforms state-of-the-art (SOTA) FL algorithms and, used as a plug-in, enhances methods such as FedAvg, FedProx, FedDyn, and SCAFFOLD.
For instance, on CIFAR-10 with Non-IID data distributions, FedFTG delivers a significant accuracy boost over the baseline algorithms, validating the merit of refining the aggregated global model with server-side fine-tuning. The benchmarks also show that FedFTG reduces the number of communication rounds needed to reach a target accuracy, a practical benefit that lowers resource costs in FL deployments.
Implications and Future Directions
The developments presented in this paper hold substantial implications for both the theory and practice of federated learning. By embedding data-free knowledge distillation into server-side operations, FedFTG offers a scalable solution that preserves user privacy while handling the complexities introduced by Non-IID data.
Moving forward, research could further exploit the server's greater compute, exploring more sophisticated generator networks or alternative knowledge-transfer mechanisms. The trade-off between computational efficiency and performance that FedFTG already considers could likewise inform future work on energy-efficient federated learning systems.
Overall, FedFTG represents a significant step toward making federated learning more robust and effective, particularly as data privacy and heterogeneity remain critical challenges in decentralized machine learning frameworks.