
Personalized Federated Learning

Updated 1 December 2025
  • Personalized Federated Learning is a set of methods that builds client-specific models to improve performance in heterogeneous, non-IID data environments.
  • It employs techniques like model regularization, meta-learning, and architectural splitting to balance global information with local adaptations.
  • Recent advancements include mixture models, hypernetwork-based adaptations, and Bayesian frameworks that enhance robustness, generalization, and privacy.

Personalized Federated Learning (PFL) refers to a family of federated learning approaches that explicitly construct a personalized model for each client, in order to better accommodate heterogeneity in local data distributions. PFL contrasts with classical federated learning, which primarily aims to produce a single global model, a model that is typically suboptimal in highly non-IID (not independent and identically distributed) settings. By tailoring aspects of the model or training procedure to individual clients, PFL aims to strike a balance between leveraging global shared information and accommodating unique local characteristics, thereby improving performance, robustness, and privacy under real-world data heterogeneity.

1. Core Principles and Problem Formulation

PFL is motivated by the recognition that federated clients—representing users, devices, or organizations—have local datasets sampled from varying, potentially highly divergent distributions. A standard FL objective is

$$\min_{w\in\mathbb{R}^d} F(w) = \frac{1}{M}\sum_{i=1}^{M} f_i(w)$$

where each $f_i(w) = \mathbb{E}_{(x,y)\sim D_i}[\ell(w;x,y)]$ is the expected loss on client $i$'s local distribution $D_i$. In the presence of client heterogeneity, this "one-size-fits-all" model $w$ often underperforms locally.

PFL reframes the objective to allow per-client model parameters $\{w_i\}$:

$$\min_{\{w_i\}} \frac{1}{M} \sum_{i=1}^{M} f_i(w_i)$$

with global information transferred via explicit regularization, architectural decoupling, mixture modeling, or hypernetwork-based adaptation (Tan et al., 2021).

The central challenge of PFL is to balance generalization (by leveraging population-wide signal) and personalization (by capturing local peculiarities), under constraints of privacy and communication efficiency.
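To make the tension concrete, the following minimal numpy sketch runs FedAvg on two synthetic clients whose linear tasks diverge; the client tasks, dimensions, and learning rates are illustrative assumptions, not drawn from any cited paper:

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=20):
    """A few steps of local least-squares gradient descent on one client."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: each client trains locally, the server averages
    the resulting models weighted by local dataset size."""
    n_total = sum(len(y) for _, y in clients)
    updates = [local_sgd(w_global.copy(), X, y) * (len(y) / n_total)
               for X, y in clients]
    return np.sum(updates, axis=0)

rng = np.random.default_rng(0)
# Two clients with heterogeneous (non-IID) linear tasks.
clients = []
for true_w in ([1.0, -1.0], [3.0, 2.0]):
    X = rng.normal(size=(50, 2))
    y = X @ np.array(true_w)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(30):
    w = fedavg_round(w, clients)
# The single global model lands between the two client optima,
# fitting neither task well -- the motivation for personalization.
print(np.round(w, 2))
```

With clean local optima at $[1,-1]$ and $[3,2]$, the global model settles near their average, illustrating why a shared $w$ underperforms on each client individually.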

2. Methodological Taxonomy

A comprehensive taxonomy of PFL methodologies spans several overlapping axes (Tan et al., 2021, Zhang et al., 2023):

| Category | Example Approaches | Key Idea |
| --- | --- | --- |
| Model Regularization & Meta-Learning | pFedMe, Ditto, Per-FedAvg | Penalize deviation from a global anchor; meta-learn initializations |
| Model Splitting/Decoupling | FedPer, FedRep, LG-FedAvg, DBE | Divide into shared feature extractor and local head |
| Mixture/Ensemble Methods | APFL, FedAMP, PFL-MoE, FedBasis | Combine global and personalized outputs or model bases |
| Clustering/Population Modeling | PPFL, IFCA, Clustered FL | Group clients; assign/learn a model for each subgroup |
| Modular/Compositional Architectures | FedMN, pFedMB | Assemble models client-wise from global building blocks |
| Generative & Bayesian Methods | pFedGP, PAC-PFL, pFedFDA, pFedBreD | Bayesian inference, personalized priors, GPs, generative adaptation |
| Representation Disentanglement | FedDVA, FedPick, FedAFK | Decompose into shared vs. client-specific representations |
| Knowledge Distillation | FedPAC, FedKD, FedProto, FedPCL | Share knowledge via proxies or soft predictions |
| Hypernetwork-based Generation | pFedHN, ODPFL-HN, PeFLL | Parameterize client models as functions of client embeddings |

Each approach has canonical mathematical formulations and optimization procedures. For example, regularization-based methods may solve

$$\min_{w,\{w_i\}} \sum_{i=1}^{M} \left( f_i(w_i) + \lambda \|w_i - w\|^2 \right)$$

while mixture approaches interpolate between global and personalized outputs: $\hat{y}_i(x) = \alpha_i f_{\text{global}}(x) + (1-\alpha_i) f_{\text{local},i}(x)$ (Guo et al., 2020).
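Both formulations can be sketched in a few lines of numpy. Below, a Ditto/pFedMe-style proximal solve keeps the personal model near the global anchor, and an APFL-style mixture interpolates the two predictors; the linear model, data, and hyperparameter values are illustrative assumptions:

```python
import numpy as np

def prox_personalize(w_global, X, y, lam=1.0, lr=0.05, steps=200):
    """Regularization-based personalization: minimize
    f_i(w_i) + lam * ||w_i - w_global||^2 by gradient descent,
    pulling the personal model toward the global anchor."""
    w_i = w_global.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w_i - y) / len(y) + 2 * lam * (w_i - w_global)
        w_i = w_i - lr * grad
    return w_i

def mixture_predict(x, w_global, w_local, alpha):
    """Mixture-style prediction: interpolate global and personal outputs."""
    return alpha * (x @ w_global) + (1 - alpha) * (x @ w_local)

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = X @ np.array([3.0, 2.0])          # this client's true task
w_global = np.array([1.0, 0.0])       # stale global anchor

w_strong = prox_personalize(w_global, X, y, lam=10.0)  # stays near anchor
w_weak = prox_personalize(w_global, X, y, lam=0.01)    # fits local data
# Larger lam keeps the personal model closer to the global model.
print(np.linalg.norm(w_strong - w_global) < np.linalg.norm(w_weak - w_global))
```

The regularization weight $\lambda$ directly controls the personalization trade-off discussed in Section 4: large $\lambda$ recovers a near-global model, small $\lambda$ a near-local one.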

3. Advancements: Model Architectures, Meta-Learning, and Representation

Recent PFL advancements include:

  • Low-dimensional, interpretable personalization: Methods such as FedPick personalize masks in feature space rather than large neural parameters, yielding improved generalization bounds and interpretability (Zhu et al., 2023).
  • Compositional and modular networks: Modular approaches (FedMN, pFedMB) assign each client a combination of network blocks, with routing often handled by hypernetworks or distribution embeddings. This captures both architectural and data heterogeneity while reducing communication (Wang et al., 2022, Mori et al., 2022).
  • Mixture models and basis expansions: FedBasis and PPFL learn a set of population-level bases, with each client represented as a (sparse) convex combination, extending to new-client onboarding with only coefficient updates (Chen et al., 2023, Di et al., 2023).
  • Representation disentanglement: FedDVA and related approaches separate shared and client-specific latent codes, improving explainability and ensuring that aggregated updates do not override local specialization (Yan et al., 2023).
  • Hypernetwork and meta-learning frameworks: Learning-to-learn approaches (PeFLL, ODPFL-HN) use learned client embeddings to generate personalized models, generalizing directly to new or unlabeled clients (Scott et al., 2023, Amosy et al., 2021).
  • Probabilistic and Bayesian frameworks: pFedGP, PAC-PFL, and pFedFDA employ Bayesian posteriors or generative adaptation in feature space, achieving improved calibration, uncertainty quantification, and robustness in low-data regimes. PAC-PFL establishes generalization error bounds for both observed and new clients using PAC-Bayesian theory (Boroujeni et al., 16 Jan 2024, Mclaughlin et al., 1 Nov 2024, Achituve et al., 2021).
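The hypernetwork idea in particular is compact enough to sketch: a small shared network maps each client's learned embedding to that client's model weights, so serving a new client only requires a new embedding. The architecture and dimensions below are illustrative assumptions in the spirit of pFedHN, not its actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hypernetwork: a one-hidden-layer MLP maps a client embedding
# to the weight vector of that client's personal linear model.
emb_dim, hidden, model_dim = 4, 16, 3
W1 = rng.normal(scale=0.3, size=(emb_dim, hidden))
W2 = rng.normal(scale=0.3, size=(hidden, model_dim))

def hypernet(embedding):
    """Generate a personalized weight vector from a client embedding."""
    h = np.tanh(embedding @ W1)
    return h @ W2

# One learned embedding row per client; the server trains the
# hypernetwork weights, while embeddings index the clients.
client_embeddings = rng.normal(size=(5, emb_dim))
personal_models = np.stack([hypernet(e) for e in client_embeddings])
print(personal_models.shape)  # one 3-parameter model per client: (5, 3)
```

In training, gradients from each client's loss flow through the generated weights back into both the shared hypernetwork and that client's embedding.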

4. Personalization vs Generalization: Trade-offs and Optimization

A persistent tension in PFL is between over-fitting to a client's limited local data and benefiting from global knowledge (Boroujeni et al., 16 Jan 2024). Strategies to manage this trade-off include:

  • Explicit regularization toward global model weights [Ditto, pFedMe].
  • Mixture-of-experts or basis models, with data-adaptive mixing (Guo et al., 2020, Chen et al., 2023).
  • Adaptive feature sharing/aggregation: FedAFK learns coefficients $\mu_i$ to interpolate between local and global feature extractors, with knowledge-transfer terms (e.g., KL divergence between feature distributions) further improving generalization (Yin et al., 19 Oct 2024).
  • PAC-Bayesian and bias-variance calibrated interpolation (pFedFDA) between local and global generative statistics, using per-client cross-validation to optimize interpolation coefficients $\beta_i$ (Mclaughlin et al., 1 Nov 2024).
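The last strategy, choosing an interpolation coefficient per client from held-out local data, can be sketched as follows. This is a simplified illustration of the idea (interpolating model weights and scoring candidate coefficients over validation folds), not the pFedFDA estimator itself:

```python
import numpy as np

def cv_interpolation_coeff(X, y, w_local, w_global,
                           betas=np.linspace(0, 1, 11), n_folds=3):
    """Pick beta_i by per-client validation: evaluate the interpolated
    model beta * w_local + (1 - beta) * w_global on held-out folds and
    keep the beta with the lowest average validation loss."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    losses = []
    for beta in betas:
        w = beta * w_local + (1 - beta) * w_global
        fold_losses = [np.mean((X[idx] @ w - y[idx]) ** 2) for idx in folds]
        losses.append(np.mean(fold_losses))
    return betas[int(np.argmin(losses))]

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
y = X @ np.array([3.0, 2.0]) + rng.normal(scale=0.1, size=30)
w_global = np.array([1.0, 0.0])
w_local = np.array([2.9, 2.1])   # noisy local fit, close to the truth
beta = cv_interpolation_coeff(X, y, w_local, w_global)
print(beta)  # leans heavily toward the (better) local model here
```

When local data is scarce or noisy, the same procedure naturally shifts $\beta_i$ toward the global model, which is the bias-variance intuition behind calibrated interpolation.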

A selection of empirical findings:

| Dataset | Baseline | PFL Method | Test Acc. (%, baseline → PFL) | Non-IID Setting |
| --- | --- | --- | --- | --- |
| CIFAR-10 (α=0.3) | FedAvg | pFedPT | 61.9 → 80.8 | Dirichlet, CNN backbone |
| CIFAR-100 (dir) | FedRep | FedAFK | 51.0 → 57.9 | Practical (β=0.1) |
| Fashion-MNIST | pFedMe | pFedMB | 98.6 → 98.9 | 2 labels/client |
| Digits-Five | FedBN | FedPick | 86.2 → 91.3 | Multi-domain |
| CIFAR-100 | pFedHN | pFedGP | 52.3 → 61.3 | 100 clients |

These results demonstrate that properly tuned PFL methods significantly outperform global FL or isolated local models under strong heterogeneity (Li et al., 2023, Yin et al., 19 Oct 2024, Zhu et al., 2023, Mclaughlin et al., 1 Nov 2024, Achituve et al., 2021).

5. Client Heterogeneity: Privacy, Adaptation, and New-Comer Handling

Client heterogeneity arises in statistical distribution, data size, and sometimes even model architecture (Prasetia et al., 16 Oct 2025). Approaches to these challenges include:

  • On-demand personalization for unlabeled newcomers: Meta-hypernetwork approaches (ODPFL-HN, PeFLL) support new clients that present only unlabeled data after deployment, via on-the-fly model synthesis (Amosy et al., 2021, Scott et al., 2023).
  • Modular and progressive alignment: FedPPA aligns only shared layers across heterogeneous models and assigns client weights proportional to data diversity (entropy), with local alignment steps to minimize feature drift (Prasetia et al., 16 Oct 2025).
  • Differential privacy: Mechanisms such as per-client noise addition to embeddings in hypernetwork approaches ensure (ε,δ)-DP without raw data leakage, with empirical evidence of negligible accuracy loss for practical ε (Amosy et al., 2021).
  • Interpretability and fairness: Methods such as PPFL, stacking-based PFL, and modular architectures offer explicit decomposition of model contributions per client, allowing transparent contribution accounting using feature-importance or membership vectors (Di et al., 2023, Cantu-Cervini, 16 Apr 2024).
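The differential-privacy mechanism mentioned above can be illustrated with the standard Gaussian mechanism: clip the client embedding to bound its sensitivity, then add calibrated noise before it leaves the device. The noise calibration below is the textbook (ε, δ)-DP Gaussian mechanism; the embedding pipeline around it is an illustrative assumption:

```python
import numpy as np

def privatize_embedding(embedding, clip_norm=1.0, epsilon=1.0,
                        delta=1e-5, rng=None):
    """Clip a client embedding to norm <= clip_norm (bounding L2
    sensitivity), then add Gaussian noise calibrated for
    (epsilon, delta)-differential privacy before sending it upstream."""
    if rng is None:
        rng = np.random.default_rng(0)
    norm = np.linalg.norm(embedding)
    clipped = embedding * min(1.0, clip_norm / max(norm, 1e-12))
    # Standard Gaussian-mechanism noise scale (sensitivity = clip_norm).
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(scale=sigma, size=embedding.shape)

emb = np.array([3.0, 4.0])          # raw client embedding, norm 5
private = privatize_embedding(emb)  # clipped to norm 1, then noised
print(private.shape)
```

Smaller ε means larger σ and noisier embeddings, which is the privacy-utility trade-off the cited work reports as negligible for practical ε.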

6. Implementation, Benchmarking, and Empirical Practice

Comprehensive libraries (such as PFLlib) have been developed to benchmark PFL algorithms using diverse datasets, partitioning schemes (label skew, feature shift), and metrics (personalization gap, convergence rate, communication cost) (Zhang et al., 2023). Key benchmarking recommendations include:

  • Regularization/splitting strategies (e.g., Ditto, FedRep) excel under severe label skew.
  • For domain adaptation (covariate shift), methods focusing on feature adaptation or disentanglement (FedPick, FedDVA, pFedFDA) are emphasized.
  • For new-user or data-sparse scenarios, Bayesian methods (PAC-PFL, pFedGP) and basis/mixture models (FedBasis) offer superior calibration and adaptation.
  • Modular and mixture-of-experts approaches efficiently share knowledge and offer measured reductions in communication cost when only active modules are exchanged (Chen et al., 2023, Wang et al., 2022).

PFL approaches are evaluated in terms of average and per-client local accuracy, generalization to OOD/newcomer clients, efficiency, and privacy/robustness trade-offs.
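One such metric, the personalization gap, can be computed as the mean per-client improvement of the personalized model over the shared global model. Exact definitions vary across benchmarks; this is one common, simple variant:

```python
import numpy as np

def personalization_gap(personal_acc, global_acc):
    """Mean and spread of per-client accuracy improvement of
    personalized models over the single global model."""
    diffs = np.asarray(personal_acc) - np.asarray(global_acc)
    return diffs.mean(), diffs.std()

# Hypothetical per-client test accuracies (not from any cited paper).
global_acc = [0.61, 0.55, 0.70, 0.48]
personal_acc = [0.80, 0.74, 0.73, 0.69]
mean_gap, std_gap = personalization_gap(personal_acc, global_acc)
print(round(mean_gap, 3))  # average per-client gain: 0.155
```

Reporting the spread alongside the mean matters for fairness: a large mean gap can hide clients whose personalized model is worse than the global one.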

7. Open Challenges and Future Directions

Open problems and research frontiers include:

  • Automated sub-architecture and basis selection: Online/dynamic methods for determining which modules/layers should be shared, expanded, or pruned when encountering new clients or domains (Chen et al., 2023, Wang et al., 2022).
  • Theory for convergence and generalization under complex non-IID settings: Including dynamic participation, architectural heterogeneity, and adversarial robustness. While recent work provides PAC-Bayesian bounds for GP and probabilistic models (Boroujeni et al., 16 Jan 2024, Achituve et al., 2021), extending such guarantees to deep neural personalized frameworks remains foundational.
  • Formal privacy-utility trade-off analysis: Particularly in stacking and modular architectures where inter-client cooperation is mediated via privatized interfaces (Cantu-Cervini, 16 Apr 2024).
  • Robustness and fairness: Addressing distributional shifts, system-level attacks, and incentivizing equitable collaboration via interpretable contribution mechanisms or game-theoretic designs (Di et al., 2023, Cantu-Cervini, 16 Apr 2024).
  • Continual and adaptive learning: Techniques supporting newcomer adaptation, partial or asynchronous participation, and dynamic assignment of personalization scope, leveraging both data- and model-centric signals (Tan et al., 2021, Amosy et al., 2021).

These directions aim to enable PFL as a principled, efficient, and trustworthy tool for large-scale decentralized learning in privacy-sensitive, highly heterogeneous environments.
