
Towards Personalized Federated Learning via Comprehensive Knowledge Distillation (2411.03569v1)

Published 6 Nov 2024 in cs.LG, cs.AI, cs.CR, and cs.CV

Abstract: Federated learning is a distributed machine learning paradigm designed to protect data privacy. However, data heterogeneity across various clients results in catastrophic forgetting, where the model rapidly forgets previous knowledge while acquiring new knowledge. To address this challenge, personalized federated learning has emerged to customize a personalized model for each client. However, the inherent limitation of this mechanism is its excessive focus on personalization, potentially hindering the generalization of those models. In this paper, we present a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation. The historical model represents the local model from the last round of client training, containing historical personalized knowledge, while the global model represents the aggregated model from the last round of server aggregation, containing global generalized knowledge. By applying knowledge distillation, we effectively transfer global generalized knowledge and historical personalized knowledge to the local model, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models. Extensive experimental results demonstrate the significant advantages of our method.


Summary

  • The paper introduces FedCKD, which employs dual-teacher distillation from global and historical models to balance personalization and generalization.
  • It achieves superior accuracy, outperforming benchmarks by up to 1.98% on CIFAR100, effectively addressing catastrophic forgetting.
  • The method structures each round into local distillation and global aggregation phases, ensuring robust performance in heterogeneous data environments.

Towards Personalized Federated Learning via Comprehensive Knowledge Distillation

Introduction

Federated Learning (FL) is a decentralized machine learning approach aimed at protecting data privacy by allowing collaborative training of models without sharing raw data. However, FL faces challenges due to data heterogeneity across clients, leading to catastrophic forgetting, where models lose previously acquired knowledge when learning new information. Personalized Federated Learning (PFL) attempts to customize models for individual clients to combat this issue. This paper proposes the FedCKD method, which leverages comprehensive knowledge distillation (KD) to balance model personalization and generalization.

Personalized Federated Learning

In FL, a server coordinates $n$ clients, each holding private data $\mathcal{D}_k = \{\bm{x}_k, \bm{y}_k\}$. The goal is to train a global model $\bm{w}_g$ by minimizing the overall loss across all clients. Traditional FL methods often fail in heterogeneous data scenarios, where personalization is required. PFL aims to overcome this by optimizing an individual model $\bm{w}_k$ for each client, striking a balance between local adaptation and global knowledge.
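As a reference point (a standard formulation rather than one quoted from the paper; the client weights $p_k$ are an assumption, typically proportional to local data sizes), the FL and PFL objectives can be contrasted as

$$\min_{\bm{w}_g} \sum_{k=1}^{n} p_k\, F_k(\bm{w}_g) \qquad \text{versus} \qquad \min_{\bm{w}_1,\dots,\bm{w}_n} \sum_{k=1}^{n} p_k\, F_k(\bm{w}_k),$$

where $F_k(\bm{w}) = \mathbb{E}_{(\bm{x},\bm{y})\sim\mathcal{D}_k}\big[\ell(\bm{w};\bm{x},\bm{y})\big]$ is client $k$'s local risk: FL seeks a single shared model, while PFL allows each client its own.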

Challenges and Limitations

Traditional PFL places excessive emphasis on personalization, potentially leading to overfitting and compromised generalization. Existing methods such as pFedSD focus on transferring personalized knowledge from historical models but neglect generalized knowledge, limiting adaptability across diverse clients.

Comprehensive Knowledge Distillation

Knowledge Distillation (KD) enables the transfer of knowledge from a teacher model to a student model, smoothing output distributions with a temperature parameter $\tau$. FedCKD applies multi-teacher KD, using both a global model $\bm{w}_g$ and a historical model $\bm{w}_h$ as teachers. This approach effectively integrates global generalization and personalized learning, mitigating catastrophic forgetting and enhancing model performance.
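A minimal sketch of the temperature-softened distillation term described above, written in PyTorch; the function name and the conventional $\tau^2$ scaling are standard KD practice rather than details quoted from the paper:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau):
    """KL divergence between temperature-softened teacher and student predictions.
    Scaling by tau**2 keeps gradient magnitudes comparable across temperatures."""
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    p_teacher = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2
```

In FedCKD this term is evaluated twice per batch, once against the global teacher $\bm{w}_g$ and once against the historical teacher $\bm{w}_h$.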

Figure 1: Data heterogeneity in FL leads to catastrophic forgetting.

Multi-Teacher Knowledge Distillation

FedCKD utilizes dual knowledge sources: the global model for generalized knowledge and the historical model for personalized knowledge. The loss function incorporates both sources, facilitating comprehensive learning. An annealing mechanism adjusts the emphasis on soft labels over time, smoothing the transition from knowledge transfer to independent learning (Figure 2; see the sketch below).

Figure 2: The framework of our method. The client update phase is divided into local distillation and local store processes.
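A sketch of how the hard-label loss and the two teacher signals might be combined, reusing `kd_loss` from the sketch above; the equal weighting of the teachers, the single mixing weight `lam`, and the linear annealing schedule are illustrative assumptions, not the paper's exact formulation:

```python
import torch.nn.functional as F

def annealed_soft_weight(round_idx, total_rounds, lam_init=1.0, lam_final=0.0):
    """Linearly decay the emphasis on soft (teacher) labels across communication rounds.
    The linear schedule and its endpoints are placeholders; FedCKD's rule may differ."""
    frac = min(round_idx / max(total_rounds - 1, 1), 1.0)
    return lam_init + frac * (lam_final - lam_init)

def comprehensive_loss(student_logits, global_logits, hist_logits, labels, tau, lam):
    """Cross-entropy on hard labels plus distillation from both teachers."""
    ce = F.cross_entropy(student_logits, labels)
    kd_global = kd_loss(student_logits, global_logits, tau)  # generalized knowledge
    kd_hist = kd_loss(student_logits, hist_logits, tau)      # personalized knowledge
    return (1 - lam) * ce + lam * 0.5 * (kd_global + kd_hist)
```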

Methodology

FedCKD organizes each communication round into server broadcasting, client updating (comprising local distillation and local storing), and global aggregation. Clients receive the global model, perform comprehensive distillation from the global and historical teachers, and store the updated local model as the historical teacher for the next round, maintaining a dynamic balance between personalized and collective learning, as sketched below.
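A high-level sketch of one communication round under the phases just described; the `Client` interface (`model`, `stored_weights`, `num_samples`, `local_distill`) and the FedAvg-style weighted aggregation are hypothetical scaffolding for illustration:

```python
import copy

def weighted_average(state_dicts, sizes):
    """FedAvg-style weighted average of client parameters (illustrative helper)."""
    total = float(sum(sizes))
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = sum(sd[key] * (n / total) for sd, n in zip(state_dicts, sizes))
    return avg

def run_round(server_model, clients, round_idx, total_rounds, tau):
    """One round: server broadcasting, client updating (local distillation + local store),
    and global aggregation. The Client interface used here is hypothetical."""
    global_weights = server_model.state_dict()              # server broadcasting
    lam = annealed_soft_weight(round_idx, total_rounds)     # annealed soft-label weight
    updates, sizes = [], []
    for client in clients:
        client.model.load_state_dict(global_weights)        # client updating
        client.local_distill(global_teacher=global_weights,    # local distillation with the
                             hist_teacher=client.stored_weights,  # global + historical teachers
                             tau=tau, lam=lam)
        client.stored_weights = copy.deepcopy(client.model.state_dict())  # local store
        updates.append(client.model.state_dict())
        sizes.append(client.num_samples)
    server_model.load_state_dict(weighted_average(updates, sizes))        # global aggregation
    return server_model
```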

Experiments

Setup

Experiments use the FMNIST, CIFAR10, and CIFAR100 datasets under practical and pathological heterogeneity settings, comparing FedCKD against FedAvg, FedProx, and a range of PFL baselines while varying the number of clients, participation rates, and model architectures.
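The practical (non-IID) setting in PFL experiments is commonly simulated with a Dirichlet label partition; whether FedCKD uses exactly this scheme is an assumption here, so the sketch below is only illustrative of how such heterogeneity is typically generated:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label proportions.
    Smaller alpha yields more heterogeneous clients. Common practice, not necessarily
    the paper's exact partitioning scheme."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, chunk in enumerate(np.split(cls_idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())
    return client_indices
```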

Figure 3: Data heterogeneity among 20 clients on the CIFAR100 dataset.

Results

FedCKD consistently achieved superior accuracy across datasets, notably surpassing pFedSD by 1.98% on CIFAR100 with 20 clients. It demonstrated strong personalized performance, robustly adapting to varying client data distributions while achieving high individual fairness. Sensitivity analysis further confirmed its stability across heterogeneity levels, client participation rates, and model architectures (Figure 4).

Figure 4: Learning curves under different experimental settings.

Figure 5: Accuracy difference (%) among 20 clients on the CIFAR100 dataset.

Conclusion

FedCKD introduces an innovative approach to PFL, achieving a harmonious integration of personalized and generalized knowledge. Through comprehensive KD and a strategic annealing mechanism, it substantially mitigates catastrophic forgetting while improving model accuracy and stability. Future research should focus on optimizing KD to reduce computational overhead in federated environments.