
Personalized Federated Learning

Updated 20 April 2026
  • Personalized federated learning is a distributed approach that creates client-specific models by tailoring shared features to unique local data distributions.
  • It employs a hybrid architecture splitting models into global feature extractors and local personalization layers to effectively manage statistical heterogeneity.
  • Advanced methods integrate meta-learning, Bayesian inference, and graph-based aggregation to balance performance, communication efficiency, and privacy across diverse client scenarios.

Personalized federated learning (PFL) is a class of federated learning methodologies that aim to provide client-specific models, addressing the critical challenge of statistical heterogeneity across decentralized datasets. Unlike classic federated learning (FL), which trains a single global model, PFL explicitly tailors its output to each client’s local data distribution, thereby enhancing performance under non-identically distributed (non-IID) data, heterogeneous client tasks, and varying client resources.

1. Foundations of Personalized Federated Learning

Personalized federated learning formulates the learning objective as a set of client-specific empirical risk minimization problems, each benefiting from collaboration but ultimately resolved via models $\{w_i\}$ that are specialized for the $i$-th client:

$$\min_{w_1,\dots,w_n}\;\frac{1}{n}\sum_{i=1}^n f_i(w_i),\qquad f_i(w_i)=\mathbb{E}_{(x,y)\sim\mathcal{D}_i}\bigl[\ell(w_i;x,y)\bigr]$$

where $\mathcal{D}_i$ denotes the local data distribution at client $i$ and $\ell$ is a suitable loss function (Yin et al., 2024).

The necessity for personalization arises from statistical heterogeneity: label and feature distributions, class imbalances, and unbalanced sample sizes. In this regime, a single global model often underperforms by overfitting to dominant data distributions and underfitting clients with rare or skewed data (Yin et al., 2024, Mori et al., 2022).
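The effect described above can be illustrated with a minimal sketch (toy data, assumed values): two clients with opposing linear tasks, comparing the average risk of one pooled global least-squares model against per-client personalized models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clients with heterogeneous linear tasks y = w_i^T x + noise (non-IID by design).
def make_client(w_true, n=200):
    X = rng.normal(size=(n, 2))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

clients = [make_client(np.array([1.0, 0.0])),
           make_client(np.array([-1.0, 0.0]))]  # opposing optima

def fit(X, y):
    # Ordinary least squares: argmin_w ||Xw - y||^2
    return np.linalg.lstsq(X, y, rcond=None)[0]

def risk(w, X, y):
    return np.mean((X @ w - y) ** 2)

# One global model on pooled data vs. one personalized model per client.
X_pool = np.vstack([X for X, _ in clients])
y_pool = np.concatenate([y for _, y in clients])
w_global = fit(X_pool, y_pool)

global_risk = np.mean([risk(w_global, X, y) for X, y in clients])
pers_risk = np.mean([risk(fit(X, y), X, y) for X, y in clients])
print(global_risk, pers_risk)  # personalized risk is far lower under heterogeneity
```

With opposing client optima, the pooled model collapses toward a weight vector near zero and fits neither client, while the personalized models recover each local task almost exactly.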

2. Main Methodological Paradigms

2.1. Architecture and Parameter Splitting

A dominant paradigm in PFL decouples models into shared (global) and personalized (local) components, for example a shared feature extractor $\phi$ and a personalized head $\psi_i$, as in $w_i = (\phi, \psi_i)$. This enables transfer of invariant structure while allowing user-specific adaptation in task-relevant layers (Hu et al., 12 Mar 2026, Mclaughlin et al., 2024).

  • Partial personalization is realized by only sharing or aggregating selected model components (e.g., feature extractors), and personalizing the rest (e.g., classifier heads) (Mori et al., 2022, Hu et al., 12 Mar 2026).
  • More granular approaches employ per-parameter participation degrees, as in Learn2pFed, where a diagonal matrix $\Lambda_i$ parameterizes how much each local parameter participates in the federation (Lv et al., 2024).
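The split-aggregation pattern can be sketched in a few lines (hypothetical client states; `phi` and `psi` are illustrative names): the server averages only the shared component, and each client keeps its personal head untouched.

```python
import numpy as np

# Hypothetical client states: shared extractor weights 'phi', personal head 'psi'.
clients = [
    {"phi": np.array([1.0, 2.0]), "psi": np.array([0.5])},
    {"phi": np.array([3.0, 4.0]), "psi": np.array([-0.5])},
]

def aggregate_shared(states, shared_keys=("phi",)):
    """FedAvg over the shared component only; personal parts never leave the client."""
    n = len(states)
    avg = {k: sum(s[k] for s in states) / n for k in shared_keys}
    # Each client keeps its own psi but receives the averaged phi.
    return [{**s, **avg} for s in states]

new_states = aggregate_shared(clients)
print(new_states[0]["phi"])  # [2. 3.]  (averaged across clients)
print(new_states[0]["psi"])  # [0.5]    (untouched, stays local)
```

Restricting aggregation to `shared_keys` is all that distinguishes partial personalization from vanilla FedAvg in this sketch; real systems apply the same idea layer-wise to deep networks.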

2.2. Knowledge Transfer and Representation Fusion

Recent PFL frameworks leverage knowledge distillation or explicit regularization between local and global models, e.g., minimizing $f_i(w_i) + \frac{\lambda}{2}\|w_i - w\|^2$, where $\lambda$ controls the knowledge transfer from the global model $w$ to the local one (Yin et al., 2024). Other methods, e.g., FedAFK and pFedGM, include adaptive merging of global and local representations, either through explicit convex combination $\phi_i = \alpha_i\,\phi_i^{\text{local}} + (1-\alpha_i)\,\phi^{\text{global}}$ with $\alpha_i \in [0,1]$ adaptively learned (Yin et al., 2024), or through Bayesian posterior fusion (Hu et al., 12 Mar 2026). Feature fusion can also be performed by mixing local and global class prototypes, controlled by a personalization hyperparameter (Xing et al., 2024).
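A minimal sketch of the convex-combination fusion, assuming a scalar mixing coefficient adapted by gradient descent on a toy client loss (the target representation and learning rate are illustrative, not from any cited method):

```python
import numpy as np

def fuse(phi_local, phi_global, alpha):
    # Convex combination of representations; alpha in [0, 1] is learned per client.
    return alpha * phi_local + (1.0 - alpha) * phi_global

phi_l = np.array([1.0, 0.0])   # local representation
phi_g = np.array([0.0, 1.0])   # global representation
target = np.array([0.8, 0.2])  # toy client-optimal representation

# Adapt alpha by gradient descent on the squared error of the fused features.
alpha = 0.5
for _ in range(200):
    grad = 2 * (fuse(phi_l, phi_g, alpha) - target) @ (phi_l - phi_g)
    alpha = float(np.clip(alpha - 0.05 * grad, 0.0, 1.0))

print(round(alpha, 2))  # -> 0.8: the client learns how much local signal to keep
```

Because the loss here is quadratic in `alpha`, gradient descent contracts toward the unique optimum, which for this toy target is exactly the local mixing weight 0.8.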

2.3. Meta-Learning and Hypernetworks

Meta-learning-based personalization aims to learn an initial model or meta-parameters so that client adaptation is rapid and effective. For example, Per-FedAvg applies the model-agnostic meta-learning (MAML) principle to FL (Fallah et al., 2020): $$\min_{w}\;\frac{1}{n}\sum_{i=1}^n f_i\bigl(w - \alpha\,\nabla f_i(w)\bigr)$$ so that one (or a few) local gradient steps from the shared initialization $w$ yield a well-personalized model. Hypernetwork approaches (e.g., pFedHN, PeFLL) use a central hypernetwork to generate personalized parameters for each client, either directly from client embeddings or descriptors (Shamsian et al., 2021, Scott et al., 2023). Such methods decouple the communication cost from the size of the personalized model and are especially effective when generalizing to unseen clients.
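The MAML-style objective can be worked through on scalar quadratic client losses, where the outer gradient is available in closed form (client optima `cs` and the learning rates are assumed toy values, not from the cited paper):

```python
import numpy as np

# Toy client losses f_i(w) = 0.5 * (w - c_i)^2 with client optima c_i.
cs = [-1.0, 1.0, 2.0]
inner_lr, outer_lr = 0.3, 0.1

grad_f = lambda w, c: w - c  # f_i'(w)

def outer_grad(w, c):
    # Gradient of f_i(w - inner_lr * f_i'(w)) w.r.t. w; exact for quadratics
    # since f_i''(w) = 1, so the chain-rule factor is (1 - inner_lr).
    w_adapted = w - inner_lr * grad_f(w, c)       # one inner SGD step
    return (1 - inner_lr) * grad_f(w_adapted, c)

w = 0.0
for _ in range(1000):
    w -= outer_lr * np.mean([outer_grad(w, c) for c in cs])

# Personalization: each client takes one local step from the meta-initialization.
personalized = [w - inner_lr * grad_f(w, c) for c in cs]
print(round(w, 3))  # ≈ 0.667, the mean of the client optima
```

The meta-initialization converges to the centroid of the client optima, and each client's single adaptation step then moves it strictly closer to its own optimum, which is the whole point of the formulation.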

2.4. Probabilistic and Bayesian Formulations

Bayesian personalized FL models client parameters as latent variables subject to shared global priors. For instance, pFedBreD applies Bregman-divergence-regularized Bayesian inference, minimizing

$$\min_{w_i}\; f_i(w_i) + \lambda\, D_F\bigl(w_i,\, \mu(w)\bigr)$$

where $D_F$ is a Bregman divergence and $\mu(w)$ is a prior mean depending on the current global variable $w$ (Shi et al., 2022). More advanced frameworks such as FedABML (Liu et al., 2023) and PAC-PFL (Boroujeni et al., 2024) hierarchically infer global priors or hyper-posteriors, with formal PAC-Bayesian generalization guarantees.

3. Aggregation, Communication, and Heterogeneity Management

Several advanced aggregation schemes improve upon vanilla FedAvg to match statistical heterogeneity:

  • Adaptive feature aggregation as in FedAFK uses per-client scalars (e.g., an adaptively learned mixing coefficient $\alpha_i$) and stochastic updates to interpolate local and global features (Yin et al., 2024).
  • Multi-branch architectures (pFedMB) parametrize each layer as a mixture of B branches; each client learns a simplex-weighted blend, and server aggregation is branch-weighted (Mori et al., 2022).
  • Graph-based aggregation (pFedGAT) models clients and their interaction as a dynamic graph; personalized aggregation weights are computed using a Graph Attention Network to promote similarity-based collaboration (Zhou et al., 7 Mar 2025).
  • Feature-uploading: pFedPM introduces class-prototype uploading instead of gradient/model deltas, reducing communication and supporting heterogeneous local architectures (Xing et al., 2024).
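Similarity-based personalized aggregation, as in the graph-attention scheme above, can be sketched with softmax-normalized cosine similarities standing in for learned attention scores (a deliberate simplification of pFedGAT, not its actual network):

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def personalized_aggregate(params, temp=1.0):
    """Each client i receives a weighted average of all client models, with
    weights given by a softmax over cosine similarity to its own parameters."""
    n = len(params)
    out = []
    for i in range(n):
        scores = np.array([cosine(params[i], params[j]) / temp for j in range(n)])
        w = np.exp(scores - scores.max())  # numerically stable softmax
        w /= w.sum()
        out.append(sum(wj * pj for wj, pj in zip(w, params)))
    return out

# Clients 0 and 1 are similar; client 2 points the opposite way.
params = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])]
agg = personalized_aggregate(params)
```

Clients 0 and 1 end up aggregating mostly with each other, while client 2's personalized aggregate stays dominated by its own model, which is the similarity-promoting collaboration the graph-based view formalizes.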

These mechanisms address the practical challenges of balancing personalization, generalization, communication efficiency, and privacy (Yin et al., 2024, Mori et al., 2022, Xing et al., 2024).

4. Theoretical Analysis and Generalization Guarantees

PFL methods frequently provide theoretical analysis regarding convergence, stability, and generalization, which are crucial in non-IID and low-sample regimes.

  • Convergence properties for personalized variants of FedAvg and meta-learning approaches are established under smoothness and bounded-heterogeneity assumptions, with rates such as $\mathcal{O}(1/\sqrt{T})$ for nonconvex objectives (Yin et al., 2024, Fallah et al., 2020, Lv et al., 2024).
  • Generalization bounds: PAC-PFL and pFedGP supply PAC-Bayesian bounds on client or hyper-posterior risk, providing explicit non-asymptotic guarantees, even for new clients unseen during training (Boroujeni et al., 2024, Achituve et al., 2021).
  • Bias–variance personalization trade-off: Algorithms such as pFedFDA formalize the trade-off in mixing local and global statistics, showing that optimal interpolation minimizes expected error, balancing global bias and local variance (Mclaughlin et al., 2024).
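The bias–variance trade-off admits a simple worked instance. Assuming a scalar estimator that mixes a high-variance unbiased local mean with a biased but stable global mean, the closed-form optimal mixing weight can be checked against a grid search (the bias, variance, and sample-size values are illustrative):

```python
import numpy as np

# Estimator w(beta) = beta * local_mean + (1 - beta) * global_mean.
# MSE(beta) = (1 - beta)^2 * b2 + beta^2 * s2 / n, where b2 is the squared
# global bias and s2 / n the local-mean variance (toy assumed values).
b2, s2, n = 0.5 ** 2, 1.0, 20

mse = lambda beta: (1 - beta) ** 2 * b2 + beta ** 2 * s2 / n
beta_star = b2 / (b2 + s2 / n)  # closed-form minimizer of the quadratic MSE

grid = np.linspace(0.0, 1.0, 10001)
beta_grid = grid[np.argmin(mse(grid))]
print(round(beta_star, 3), round(beta_grid, 3))  # both ≈ 0.833
```

With a large global bias relative to local noise, the optimum leans heavily local ($\beta^* \approx 0.83$); shrinking the bias or the local sample size pushes it back toward the global model, which is exactly the interpolation behavior pFedFDA formalizes.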

5. Empirical Evaluation and Comparative Insights

Comprehensive empirical studies benchmark numerous PFL methods across vision, text, and time-series datasets with controlled data heterogeneity:

| Dataset | Most Effective Methods | Key Observations |
|---|---|---|
| CIFAR-10 | FedAFK, pFedMB, PAC-PFL, pFedGP, pFedGM | FedAFK and pFedMB outperform both non-personalized and earlier PFL baselines in extreme non-IID settings |
| FEMNIST | pFedBreD, PAC-PFL, PeFLL, kNN-Per | Bayesian, memorization-based, or embedding-based personalization yields the highest and most robust accuracy |
| Cross-silo | Fine-tuning + Mixture-of-Experts (combinational) | Simple FT + MoE combinations routinely outperform standalone personalization (Pye et al., 2021) |
| CIFAR-100 | pFedGM, FedAFK, pFedMB, GroupPerFL | Dual-objective and group-level adaptation is superior under class imbalance or label skew |

Fine-tuned global models (FedAvg + local adaptation) remain strong baselines, sometimes outperforming complex PFL methods in mild heterogeneity (Matsuda et al., 2022).

6. Open Problems and Emerging Directions

Key limitations and future research priorities articulated in recent literature include:

  • Dynamic client participation and robust personalization rate estimation, adapting to the practical willingness or ability of clients to personalize (Ma et al., 2022).
  • Scalability to large numbers of clients/classes: Prototype and feature-fusion approaches face scaling challenges in very high-dimensional or high-class-count regimes (Xing et al., 2024).
  • Non-Gaussian, mixture, or heavy-tailed modeling: Many Bayesian and generative approaches currently assume Gaussianity or exponential-family structure; extending to richer priors remains an open problem (Hu et al., 12 Mar 2026, Shi et al., 2022).
  • Theoretical gaps: For many adaptive feature aggregation and hypernetwork approaches, formal convergence or generalization guarantees are missing (Yin et al., 2024, Mori et al., 2022).
  • Client privacy and system heterogeneity: Achieving privacy (e.g., differential privacy for personalized parameters, secure aggregation of descriptors) and seamless support for local model heterogeneity are ongoing areas of exploration (Xing et al., 2024, Cantu-Cervini, 2024, Shamsian et al., 2021).

A plausible implication is that next-generation PFL will combine deep probabilistic modeling, graph-structured collaboration, and meta-learning, under explicit communication, privacy, and generalization constraints.

7. Practical Recommendations and Benchmarks

Empirical and benchmarking studies advise practitioners to start from strong fine-tuned global baselines, which can match or beat complex PFL methods under mild heterogeneity, and to select the personalization mechanism according to the observed degree and type of statistical heterogeneity (Matsuda et al., 2022, Pye et al., 2021).

Personalized federated learning is a mature, rapidly evolving subfield whose current state is defined by hybridization of model-based, probabilistic, and graph-structured approaches, with quantifiable improvement under heterogeneity, theoretical guarantees for generalization, and growing practical adoption in privacy-critical, distributed applications.
