Federated Personalized Learning (FPL)

Updated 3 March 2026
  • FPL is a decentralized paradigm that personalizes models per client by addressing non-IID data distributions using meta-learning, hypernetwork, and regularization methods.
  • It employs a range of techniques—from proximal regularization to low-rank prompt tuning—to balance global knowledge sharing with local adaptations.
  • FPL integrates privacy-preserving mechanisms like differential privacy and secure aggregation to ensure robust model performance with minimal communication overhead.

Federated Personalized Learning (FPL) is a paradigm within federated learning that systematically addresses client heterogeneity by enabling individualized model adaptation while preserving decentralized data privacy. In contrast to classical federated learning, which seeks a single global model, FPL explicitly targets the statistical and systems-level disparities among clients, such as data distribution shifts, quantity skew, varying feature spaces, or conflicting label conventions, by constructing per-client models or mechanisms for effective knowledge transfer. As an umbrella term, FPL encompasses a diverse set of architectural, algorithmic, and optimization frameworks, ranging from proximal and meta-learning methods to hypernetwork-based personalization, low-rank prompt tuning, and privacy-preserving cross-silo protocols.

1. Problem Formulation and Fundamental Objectives

In FPL, each client $i$ possesses a private dataset $D_i$ drawn from a local distribution $\mathcal{D}_i$, often non-IID relative to other clients. The overarching objective is not to train a single global model $w$ minimizing the average empirical loss

$$\min_{w} \frac{1}{N} \sum_{i=1}^N \mathbb{E}_{(x,y)\sim \mathcal{D}_i}\bigl[\ell(x, y; w)\bigr],$$

but to obtain a personalized collection $\{w_i^*\}$ optimizing

$$\min_{w_1,\dots,w_N} \frac{1}{N} \sum_{i=1}^N \frac{1}{|D_i|} \sum_{j=1}^{|D_i|} \ell\bigl(x_j^i, y_j^i; w_i\bigr) + R_i(w_i, w_\mathrm{shared}) + R_\mathrm{global}(w_\mathrm{shared}),$$

where $R_i$ encodes any coupling (e.g., regularization toward a global anchor, cluster prototype, or learned prior) (Tan et al., 2021, Shi et al., 2022, Scott et al., 2023). This bi-level or meta-learning structure allows simultaneous leveraging of commonalities and preservation of local specificity. FPL targets a range of settings including (a) per-client statistical heterogeneity, (b) partial feature overlap, (c) personalization cost constraints, and (d) privacy/utility balancing.
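
To make the formulation concrete, here is a minimal PyTorch sketch that evaluates this personalized objective for one simple choice of coupling, the proximal term $R_i(w_i, w_\mathrm{shared}) = \frac{\lambda}{2}\|w_i - w_\mathrm{shared}\|^2$ (one instantiation among many). The function name `fpl_objective`, the data representation, and all hyperparameters are illustrative assumptions, not drawn from any cited paper.

```python
import torch

def fpl_objective(client_models, shared_model, client_data, loss_fn, lam=0.1):
    """Personalized FL objective (sketch): mean of per-client empirical risk
    plus a proximal coupling (lam/2) * ||w_i - w_shared||^2.
    client_data[i] is assumed to be a list of (x, y) batches for client i."""
    total = 0.0
    for w_i, batches in zip(client_models, client_data):
        # Local empirical risk over client i's private data D_i
        risk = sum(loss_fn(w_i(x), y) for x, y in batches) / len(batches)
        # Coupling term pulling w_i toward the shared anchor w_shared
        prox = sum((p - q).pow(2).sum()
                   for p, q in zip(w_i.parameters(), shared_model.parameters()))
        total = total + risk + 0.5 * lam * prox
    return total / len(client_models)
```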

2. Algorithmic Frameworks and Personalization Methodologies

2.1 Proximal and Regularization Methods

These introduce explicit per-client regularization, for example via Moreau envelopes:

$$\min_{w, \{\theta_i\}} \frac{1}{N} \sum_{i=1}^N \left[ f_i(\theta_i) + \frac{\lambda}{2} \|\theta_i - w\|^2 \right],$$

with local client objectives $f_i$ and $\lambda$ tuning the personalization strength (Tan et al., 2021, Shi et al., 2022). Notable instantiations include pFedMe and Ditto, both of which empirically and theoretically interpolate between the global and purely local extremes.
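
A minimal sketch of the client-side step, assuming the Moreau-envelope subproblem is solved approximately by a few SGD iterations (as pFedMe-style methods do in practice); the function name and hyperparameter values are illustrative, not taken from any reference implementation.

```python
import copy
import torch

def proximal_local_update(global_model, local_batches, loss_fn,
                          lam=15.0, lr=0.01, steps=5):
    """One client's proximal personalization step (illustrative sketch):
    approximately solve min_theta f_i(theta) + (lam/2)||theta - w||^2
    by a few SGD steps starting from the current global weights w."""
    theta = copy.deepcopy(global_model)           # personalized parameters theta_i
    opt = torch.optim.SGD(theta.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in local_batches:
            opt.zero_grad()
            loss = loss_fn(theta(x), y)
            # Moreau-envelope term anchoring theta_i to the (fixed) global w
            for p, w in zip(theta.parameters(), global_model.parameters()):
                loss = loss + 0.5 * lam * (p - w.detach()).pow(2).sum()
            loss.backward()
            opt.step()
    return theta
```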

2.2 Meta-learning and Hypernetwork Approaches

Meta-learning-based FPL seeks initializations or learning-to-learn mechanisms that yield fast adaptation to client-specific distributions. Per-FedAvg adapts the MAML formulation:

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^N f_i\bigl(\theta - \alpha \nabla f_i(\theta)\bigr),$$

where the meta-objective minimizes the loss after one (or a few) local adaptation steps (Fallah et al., 2020).
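
A first-order sketch of this meta-update (FOMAML-style, dropping the Hessian term that arises when differentiating through $\theta - \alpha \nabla f_i(\theta)$) might look as follows; names, batch conventions, and step sizes are illustrative assumptions.

```python
import copy
import torch

def per_fedavg_meta_step(model, support_batch, query_batch, loss_fn,
                         alpha=0.01, beta=0.001):
    """First-order meta-step (sketch): one inner adaptation step on a support
    batch, then apply the post-adaptation gradient, evaluated on a query
    batch, to the meta-parameters. Second-order terms are ignored."""
    x_s, y_s = support_batch
    x_q, y_q = query_batch
    # Inner step: theta' = theta - alpha * grad f_i(theta)
    adapted = copy.deepcopy(model)
    inner_loss = loss_fn(adapted(x_s), y_s)
    grads = torch.autograd.grad(inner_loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= alpha * g
    # Outer step: update meta-parameters with the gradient at theta'
    outer_loss = loss_fn(adapted(x_q), y_q)
    outer_grads = torch.autograd.grad(outer_loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(model.parameters(), outer_grads):
            p -= beta * g
```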

Hypernetwork-based methods further generalize this by learning a mapping $h_\varphi$ (the hypernetwork) such that each client receives parameters $\theta_i = h_\varphi(v_i)$, with $v_i$ a learned descriptor, possibly reflecting client attributes or found via permutation-invariant set encoding (Shamsian et al., 2021, Scott et al., 2023). These approaches, including pFedHN and PeFLL, support strong parameter sharing and zero-shot adaptation to unseen clients.
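
A minimal hypernetwork sketch in PyTorch, assuming learned client embeddings serve as the descriptors $v_i$ and a single linear layer is the target model; all dimensions, layer widths, and names are illustrative, not taken from the pFedHN or PeFLL implementations.

```python
import torch
import torch.nn as nn

class ClientHyperNetwork(nn.Module):
    """Sketch: a hypernetwork h_phi maps a learned client descriptor v_i
    to the parameters of a small per-client target network."""
    def __init__(self, num_clients, embed_dim=32, target_in=784, target_out=10):
        super().__init__()
        self.client_embed = nn.Embedding(num_clients, embed_dim)  # descriptors v_i
        out_size = target_out * target_in + target_out            # W and b of target
        self.h = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(),
                               nn.Linear(128, out_size))
        self.target_in, self.target_out = target_in, target_out

    def forward(self, client_id, x):
        flat = self.h(self.client_embed(client_id))   # theta_i = h_phi(v_i)
        split = self.target_out * self.target_in
        W = flat[:split].view(self.target_out, self.target_in)
        b = flat[split:]
        return torch.nn.functional.linear(x, W, b)    # client-personalized prediction

# Usage (illustrative): logits = ClientHyperNetwork(100)(torch.tensor(3), x_batch)
```

Because gradients flow from each client's loss back into both $\varphi$ and $v_i$, only these shared quantities need to be exchanged; an unseen client can be served zero-shot once its descriptor is inferred.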

2.3 Model-Splitting and Aggregation Schemes

FPL architectures often decouple shared (global) and personalized (local) parameter blocks. Techniques such as FedPer and FedRep treat the model as a backbone plus a private head, communicating only the former (Tan et al., 2021, Zhang et al., 2023). Model-branching techniques like pFedMB assign each client a convex combination of $B$ subnetwork branches per layer, using learnable attention vectors $\alpha_i$ for aggregation:

$$W_{b,l}^{t+1} = \frac{\sum_{i=1}^N n_i \,\alpha_{i,b,l}^{t+1}\, W_{b,l}^{t+1,i}}{\sum_{i=1}^N n_i \,\alpha_{i,b,l}^{t+1}},$$

enhancing communication efficiency and enabling implicit client clustering (Mori et al., 2022).
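
The branch-aggregation rule above is straightforward to express on the server side; the following NumPy sketch applies it to one branch $b$ of one layer $l$, with all names and the toy example illustrative.

```python
import numpy as np

def aggregate_branch(client_branch_weights, client_sizes, client_alphas):
    """Server-side branch aggregation (sketch) for one branch b of layer l.
    Each client i contributes its branch weights W_{b,l}^{t+1,i}, its sample
    count n_i, and its learned attention weight alpha_{i,b,l}^{t+1}."""
    num = sum(n * a * W for W, n, a in
              zip(client_branch_weights, client_sizes, client_alphas))
    den = sum(n * a for n, a in zip(client_sizes, client_alphas))
    return num / den

# Toy example: three clients with 2x2 branch weight matrices
Ws = [np.ones((2, 2)), 2 * np.ones((2, 2)), 3 * np.ones((2, 2))]
W_new = aggregate_branch(Ws, client_sizes=[100, 50, 50],
                         client_alphas=[0.9, 0.3, 0.1])
```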

2.4 Prompt-based and Manifold Personalization

For large-scale frozen models (e.g., foundation models), FPL methods such as pFedPT, FedPGP, and DP-FPL personalize via low-dimensional prompts added to data inputs or supplied as context vectors to transformers:

$$p_i = p_G + U_i V_i + R_i,$$

with global prompt $p_G$, client-specific low-rank factors $U_i, V_i$, and residuals $R_i$. These methods enable fine-grained adaptation with minimal client communication and parameter overhead, supporting both vision-language tasks and strict privacy via local and global DP mechanisms (Li et al., 2023, Cui et al., 2024, Tran et al., 23 Jan 2025).
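
A sketch of this prompt decomposition as a trainable module, where only the global component (or the low-rank factors, depending on the scheme) would be communicated while the residual stays local; prompt length, embedding dimension, rank, and initialization scales are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PersonalizedPrompt(nn.Module):
    """Sketch of low-rank prompt personalization, p_i = p_G + U_i V_i + R_i,
    for a frozen backbone. L = prompt length, d = embedding dim, r = rank."""
    def __init__(self, L=8, d=512, r=4):
        super().__init__()
        self.p_global = nn.Parameter(torch.zeros(L, d))   # shared, aggregated on server
        self.U = nn.Parameter(torch.randn(L, r) * 0.02)   # client-specific low-rank factor
        self.V = nn.Parameter(torch.randn(r, d) * 0.02)   # client-specific low-rank factor
        self.R = nn.Parameter(torch.zeros(L, d))          # client-specific residual, kept local

    def forward(self):
        # Personalized prompt to prepend to the frozen model's input tokens
        return self.p_global + self.U @ self.V + self.R
```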

2.5 Adaptive and Interpretable Personalization

Advanced frameworks learn parameter-level participation degrees (e.g., via algorithm unrolling in Learn2pFed), client membership in canonical model mixtures (PPFL), or per-client hyperparameters for batch normalization and learning rate (FedL2P) (Lv et al., 2024, Di et al., 2023, Lee et al., 2023). These approaches provide mathematically grounded and interpretable adaptation, allowing each client to select the extent, mode, and region of model sharing.

3. Privacy-Preserving and Communication-Efficient FPL

FPL research incorporates privacy enhancements via differential privacy (DP), secure aggregation, and homomorphic encryption (HE). PPMLFPL offers APPLE+DP and APPLE+HE, with clients solving local regularized objectives and sending either DP-noised or encrypted updates, ensuring $(\epsilon,\delta)$-DP or cryptographic privacy at moderate computational cost:

$$\widetilde{\Delta w}_i^t = \bar{\Delta w}_i^t + \mathcal{N}\bigl(0, \sigma^2 C^2 I\bigr),$$

where $\sigma$ and $C$ control the DP budget and the update sensitivity (Hosain et al., 3 May 2025). Low-rank prompt personalization further minimizes the dimensionality of gradient releases, reducing noise-induced degradation under strict privacy (Tran et al., 23 Jan 2025).
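
The Gaussian mechanism in this update rule amounts to clipping each client update to sensitivity $C$ and then adding calibrated noise before release. A minimal sketch, assuming a flattened update vector and a noise multiplier $\sigma$ calibrated externally to the desired $(\epsilon,\delta)$ budget; the function name is hypothetical.

```python
import torch

def dp_sanitize_update(delta_w, clip_norm=1.0, sigma=1.0):
    """Gaussian-mechanism sketch for a client update: clip the update to
    L2 norm C = clip_norm, then add N(0, sigma^2 C^2 I) noise."""
    norm = delta_w.norm()
    clipped = delta_w * min(1.0, clip_norm / (norm.item() + 1e-12))
    noise = torch.randn_like(clipped) * sigma * clip_norm
    return clipped + noise
```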

Communication-efficient designs include multi-branch architectures (pFedMB), prompt- and backbone-splitting (pFedPT), and hypernetwork-driven parameter generation (PeFLL, pFedHN), each enabling compact, targeted updates tailored to the heterogeneity and resource profile of the deployment (Mori et al., 2022, Scott et al., 2023, Li et al., 2023).

4. Empirical Benchmarks, Performance Characterization, and Task Coverage

Comparative empirical studies and software platforms (e.g., PFLlib, FedBench) demonstrate that the efficacy of FPL methods is highly contingent on heterogeneity, data regime, and the nature of client partitioning (Matsuda et al., 2022, Zhang et al., 2023). Key empirical findings across diverse datasets (e.g., CIFAR-10/100, FEMNIST, DomainNet) include:

  • No single FPL method is universally best; straightforward FedAvg+fine-tuning often competes with or exceeds more sophisticated personalized methods in low-heterogeneity regimes.
  • Regularization- and meta-learning-based methods dominate under moderate heterogeneity, while model-splitting and low-rank/prompt methods excel under extreme non-IID splits or limited label overlap.
  • Hypernetwork and manifold approaches achieve superior generalization to unseen clients, especially in the low-data regime (Scott et al., 2023).
  • Privacy-preserving FPL with DP or HE can retain $>99\%$ accuracy (a drop of $\sim 0.1\%$) under strong privacy, provided personalized adaptation is not excessively constrained (Hosain et al., 3 May 2025, Tran et al., 23 Jan 2025).
  • Adaptive architectures and parameter-selective sharing (Learn2pFed), which learn a participation degree for each parameter, enable higher personalization fidelity and lower communication cost ($\sim 10\%$ of full parameter exchange) (Lv et al., 2024).

A summary of the main FPL algorithm classes and paradigm variants is provided below:

| Approach | Personalization Mechanism | Typical Regime |
|---|---|---|
| pFedMe/Ditto | Moreau envelope / regularization to global anchor | Moderate non-IID |
| Per-FedAvg | Meta-learned initialization | Few-shot / fast adaptation |
| pFedHN/PeFLL | Hypernetwork manifold; zero-shot generalization | High heterogeneity |
| pFedMB | Layer-wise multi-branching + attention | Unlabeled clusters |
| pFedPT/FedPGP | Prompt-based (visual/text/autoregressive) | Pretrained/frozen LM |
| APPLE+DP/+HE | DP/HE privacy on personalized updates | Cross-silo privacy |
| Learn2pFed | Per-parameter participation via unrolled ADMM | Communication-constrained |

5. Theoretical Guarantees and Open Challenges

FPL admits rigorous convergence and generalization guarantees under standard smoothness, bounded-variance, and heterogeneity assumptions. For example, PeFLL demonstrates $O(1/\sqrt{T})$ convergence of the global objective's gradient norm and proves a PAC-Bayesian bound for generalization to novel clients, with regularization strengths controlling meta- and client-level overfitting (Scott et al., 2023). Theoretical analyses across proximal and meta-learning methods clarify how convergence and accuracy depend on distances between client data distributions (e.g., total variation or Wasserstein) (Fallah et al., 2020, Tan et al., 2021).

Key open challenges include:

  • Scalability of nonconvex optimization in deeply personalized models;
  • Adaptive selection of personalization modes per task or client resource profile;
  • Communication, computation, and privacy trade-offs under real-world bandwidth and trust limitations;
  • Theoretical analysis of convergence and generalization for advanced architectures (e.g., low-rank/prompt, hypernetworks) under compositional or mechanism-induced DP.

6. Future Directions and Landscape Overview

Research in federated personalized learning is rapidly expanding to cover complex modalities (audio, vision-language, recommendation), integrate principled privacy-preserving mechanisms, and support robust adaptation to client churn, dropouts, and adversarial behaviors.

Over the longer term, FPL is establishing itself as a crucial enabler of privacy-aware, adaptable, and interpretable distributed learning in scenarios characterized by statistical and system diversity. The continuous synthesis of algorithmic innovation, theoretical insight, and empirical validation defines the state of the art and guides future work in this domain.
