Layer-wise Personalized Aggregation (pFedLA)

Updated 1 July 2026

The paper introduces an FL framework that learns trainable, layer-wise aggregation weights to overcome the limitations of uniform parameter averaging in non-IID settings.
It employs per-client hypernetworks on the server to mix parameters from all clients, layer by layer, into each client's personalized model; a sparsified variant additionally reduces communication.
Experimental results on benchmarks such as CIFAR-10 report higher average test accuracy than FedAvg, with the largest gains in the 100-client setting.

Layer-wise Personalized Aggregation (pFedLA) is a federated learning (FL) framework that enables client-level model personalization through explicit, trainable layer-wise aggregation. Conventional FL approaches such as FedAvg treat all clients' model parameters as equally aggregable at the whole-model level, which is suboptimal under non-IID (heterogeneous) data distributions. pFedLA addresses this limitation by learning, for each client and for each layer, how much to mix parameters from the entire client set. This paradigm advances the robustness and flexibility of federated personalization, especially under distributional shifts and user heterogeneity (Ma et al., 2022).

1. Problem Formulation and Conceptual Motivation

pFedLA considers a federated network of $N$ clients, each owning local data $\mathcal{D}_i$ sampled from distinct, potentially non-overlapping distributions. Each client $i$ maintains a model with parameters $\theta_i = [\theta_i^{(1)}, \dots, \theta_i^{(L)}]$ across $L$ layers.

The classical FL objective,

$\min_{\theta} \sum_{i=1}^N \frac{m_i}{M} \mathcal{L}_i(\theta),$

where $m_i = |\mathcal{D}_i|$ and $M = \sum_{i=1}^N m_i$, assumes a single global parameterization $\theta$. This is often inadequate for non-IID federated regimes. pFedLA instead pursues personalized model parameters $\bar\theta_i$ for each client, obtained by learning how to aggregate peers' parameters with different weights at each layer.
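For contrast with the personalized scheme, the single-model baseline can be sketched in a few lines. The snippet below is a minimal illustration, not code from the paper: it averages flat parameter vectors with the same $m_i / M$ weights that appear in the objective, and the function name `fedavg` and the toy numbers are assumptions made for this example.

```python
def fedavg(client_params, client_sizes):
    """Average flat parameter vectors, weighting client i by m_i / M."""
    total = sum(client_sizes)                      # M = sum_i m_i
    dim = len(client_params[0])
    return [
        sum(m / total * params[k] for params, m in zip(client_params, client_sizes))
        for k in range(dim)
    ]

# Two clients with 1-D "models" (hypothetical values); the larger client dominates.
global_theta = fedavg([[0.0], [4.0]], client_sizes=[30, 10])
print(global_theta)
```

Every client receives the same `global_theta` here, which is exactly the restriction that layer-wise personalized aggregation removes.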

The driving hypothesis is that both statistical drift (distribution mismatch) and functionality transferability vary across layers and across clients. Thus, individualized, layer-wise aggregation leads to improved generalization and adaptation for all participants.

2. Hypernetwork-based Layer-wise Aggregation

At the core of pFedLA is a per-client hypernetwork on the server. For each client $i$, the hypernetwork $\mathrm{HN}_i(\cdot\,; \varphi_i)$ receives an embedding $v_i$ and outputs an aggregation weight matrix $\alpha_i \in \mathbb{R}^{L \times N}$ such that: $\alpha_i = \mathrm{HN}_i(v_i; \varphi_i) = [\alpha_i^{(1)}, \dots, \alpha_i^{(L)}]$, with $\alpha_i^{(l)} = [\alpha_{i,1}^{(l)}, \dots, \alpha_{i,N}^{(l)}]$ and $\sum_{j=1}^N \alpha_{i,j}^{(l)} = 1$.

Each entry $\alpha_{i,j}^{(l)}$ quantifies the extent to which client $i$'s personalized parameter at layer $l$ should incorporate the corresponding parameter from client $j$.

Given all clients' latest model parameters $\{\theta_j\}_{j=1}^N$ at round $t$, the next personalized model for client $i$ is built, layer by layer: $\bar\theta_i^{(l)} = \sum_{j=1}^N \alpha_{i,j}^{(l)} \theta_j^{(l)}$ for $l = 1, \dots, L$.
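The layer-by-layer construction can be illustrated with a small sketch. It assumes the hypernetwork has already produced unnormalized scores (one row per layer, one column per peer) and that a softmax over each row yields weights summing to one; the helper names `softmax` and `personalize`, the data layout, and the toy values are all assumptions for this example rather than the paper's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def personalize(logits_i, client_layers):
    """Build one client's model layer by layer.

    logits_i[l][j]      : unnormalized weight of peer j at layer l (L x N)
    client_layers[j][l] : peer j's parameters at layer l, as a flat list
    """
    model = []
    for l, row in enumerate(logits_i):
        alpha = softmax(row)                       # weights for layer l sum to 1
        width = len(client_layers[0][l])
        mixed = [
            sum(a * client_layers[j][l][k] for j, a in enumerate(alpha))
            for k in range(width)
        ]
        model.append(mixed)
    return model

# Three clients, two layers (hypothetical values).
clients = [
    [[0.0, 0.0], [1.0]],
    [[3.0, 6.0], [5.0]],
    [[6.0, 3.0], [9.0]],
]
# Layer 0 is mixed evenly; layer 1 leans almost entirely on client 0 itself.
logits_0 = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
theta_bar_0 = personalize(logits_0, clients)
print(theta_bar_0)
```

In this toy run the first layer of client 0 ends up at the plain average of its peers, while the second layer stays close to client 0's own value, which is the kind of per-layer difference that whole-model averaging cannot express.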

This layer-centric, client-specific weighting mechanism stands in contrast to prior whole-model aggregation or static partial federated approaches.

3. Hierarchical Optimization and Update Mechanism

The hypernetwork's outputs (the per-layer, per-client aggregation weights) are meta-learned to minimize the client-specific local objectives after each FL communication round. The hypernetwork parameters $\varphi_i$, unique to each client $i$, are updated in response to the improvement achieved through local optimization.

After model assembly and local SGD on personalized $\bar\theta_i$, the client returns the parameter update $\Delta\theta_i$ to the server. The server employs chain-rule computations to backpropagate through the aggregation map: $\nabla_{\varphi_i} \mathcal{L}_i = (\nabla_{\varphi_i} \bar\theta_i)^\top \nabla_{\bar\theta_i} \mathcal{L}_i$, with $\nabla_{\bar\theta_i} \mathcal{L}_i$ approximated from the returned $\Delta\theta_i$, and performs gradient-based updates of $\varphi_i$. This aligns the aggregation policy with local progress, adaptively tuning the hypernetwork to exploit the observed inter-client similarities.
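The chain-rule step can be made concrete for a single layer. If the personalized layer is a weighted sum of peer parameters, it is linear in the weights, so the derivative of the loss with respect to the weight on peer $j$ is the inner product of peer $j$'s parameters with the gradient at the mixed parameters. The sketch below assumes the weights come from a softmax over trainable logits and stops at those logits; a real hypernetwork would continue backpropagating into its own parameters and the client embedding. The function names and the quadratic toy loss are assumptions for this example.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix(alpha, peer_params):
    """theta_bar = sum_j alpha_j * peer_params[j] for one layer."""
    width = len(peer_params[0])
    return [sum(a * p[k] for a, p in zip(alpha, peer_params)) for k in range(width)]

def logit_gradient(logits, peer_params, grad_mixed):
    """Backpropagate dL/d(theta_bar) to the softmax logits of one layer."""
    alpha = softmax(logits)
    # dL/d alpha_j = <theta_j, dL/d theta_bar>, because theta_bar is linear in alpha.
    s = [sum(p * g for p, g in zip(theta_j, grad_mixed)) for theta_j in peer_params]
    mean_s = sum(a * sj for a, sj in zip(alpha, s))
    # Softmax backward pass: dL/dz_j = alpha_j * (s_j - sum_k alpha_k * s_k).
    return [a * (sj - mean_s) for a, sj in zip(alpha, s)]

# Toy check on a quadratic loss (hypothetical values).
peers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [0.2, -0.1, 0.4]
target = [1.0, 0.0]

def toy_loss(z):
    mixed = mix(softmax(z), peers)
    return 0.5 * sum((m - t) ** 2 for m, t in zip(mixed, target))

mixed = mix(softmax(logits), peers)
grad_mixed = [m - t for m, t in zip(mixed, target)]   # exact gradient of toy_loss at theta_bar
grad_logits = logit_gradient(logits, peers, grad_mixed)
print(grad_logits)
```

In the federated setting the server does not see the exact gradient at the mixed parameters, so the client's returned update stands in for `grad_mixed`.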

4. Training Procedure

The high-level flow of pFedLA is as follows:

Initialization: Each client starts with shared (or random) model parameters, personalized hypernetwork weights $\varphi_i$, and embeddings $v_i$.
Layer-wise Personalization: For each client $i$, the server constructs $\bar\theta_i$ via the current $\alpha_i$ and broadcasts it.
Local Update: Each client performs local SGD using $\bar\theta_i$ as initialization, yielding updated parameters $\theta_i$.
Communication: Clients return their update $\Delta\theta_i = \theta_i - \bar\theta_i$.
Server update: The server updates "global" layer parameters (typically by averaging $\theta_i^{(l)}$ across $i = 1, \dots, N$) and refines $\varphi_i$ for each client via the gradients above.
Repeat for the designated number of rounds.

The last aggregation yields the final personalized model for each client: $\bar\theta_i = [\bar\theta_i^{(1)}, \dots, \bar\theta_i^{(L)}]$ after $T$ rounds.
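The round structure above can be traced end to end on a toy problem. The sketch below is a deliberately simplified stand-in, not the paper's algorithm: each layer is a single scalar, each client's loss is a quadratic pulling its layers toward client-specific targets, the hypernetwork is replaced by directly trainable softmax logits, and the server keeps each client's latest parameters rather than averaged layers. All names and constants are assumptions for this example.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def local_sgd(start, target, steps=5, lr=0.3):
    """Toy local training on 0.5 * (theta - target)^2, one scalar per layer."""
    theta = list(start)
    for _ in range(steps):
        theta = [t - lr * (t - goal) for t, goal in zip(theta, target)]
    return theta

def run_toy_pfedla(targets, rounds=20, server_lr=1.0):
    n, num_layers = len(targets), len(targets[0])
    theta = [[0.0] * num_layers for _ in range(n)]        # server copy of each client's model
    logits = [[[0.0] * n for _ in range(num_layers)] for _ in range(n)]  # logits[i][l][j]
    for _ in range(rounds):
        snapshot = [list(t) for t in theta]               # peer parameters mixed this round
        for i in range(n):
            alpha = [softmax(row) for row in logits[i]]
            # Layer-wise personalization: mix peers separately for every layer.
            theta_bar = [sum(alpha[l][j] * snapshot[j][l] for j in range(n))
                         for l in range(num_layers)]
            local = local_sgd(theta_bar, targets[i])      # local update
            delta = [new - old for new, old in zip(local, theta_bar)]
            for l in range(num_layers):
                grad = -delta[l]                          # surrogate for dL_i / d theta_bar
                s = [snapshot[j][l] * grad for j in range(n)]
                mean_s = sum(a * sj for a, sj in zip(alpha[l], s))
                for j in range(n):                        # softmax backward, then a gradient step
                    logits[i][l][j] -= server_lr * alpha[l][j] * (s[j] - mean_s)
            theta[i] = local
    return [[softmax(row) for row in logits[i]] for i in range(n)]

# Clients 0 and 1 agree on layer 0; clients 0 and 2 agree on layer 1 (hypothetical targets).
targets = [[1.0, 2.0], [1.0, -2.0], [-1.0, 2.0]]
alpha = run_toy_pfedla(targets)
print(alpha[0])
```

With these targets, client 0's learned weights shift toward client 1 at the first layer and toward client 2 at the second, so the useful peer differs by layer.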

5. Relation to Alternative Layer-wise and Partial Federation Approaches

The pFedLA approach is one instantiation of a growing class of layer-wise personalized FL strategies, with notable points of comparison:

Federation Sensitivity / Principled Partial FL: PLayer-FL uses first-order pruning-based federation sensitivity metrics to select a transition point between federated and local layers in an architecture- and task-adaptive manner (Elhussein et al., 12 Feb 2025).
Gradient Conflict-based Layer Aggregation: FedLAG dynamically personalizes or aggregates layers by detecting client gradient conflicts via cosine similarity at each layer, assigning conflicting layers to remain local (Nguyen et al., 2024).
Adaptive Layer-wise Update and Masking: FLAYER combines selective head aggregation, per-layer learning rates, and masking-adaptive uploads, but aggregates in fixed regions (e.g., base vs. head) with upload budgets (Chen et al., 2024).
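To make the gradient-conflict idea concrete, the following sketch flags a layer as conflicting when any pair of clients has gradients with negative cosine similarity at that layer. This is only an illustration of the general mechanism described above; the pairwise rule, the zero threshold, and the function names are assumptions and may differ from what FedLAG actually does.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def conflicting_layers(client_grads):
    """client_grads[i][l] is client i's gradient for layer l (flat list)."""
    n = len(client_grads)
    num_layers = len(client_grads[0])
    conflicts = []
    for l in range(num_layers):
        if any(
            cosine(client_grads[a][l], client_grads[b][l]) < 0.0
            for a in range(n)
            for b in range(a + 1, n)
        ):
            conflicts.append(l)                    # candidate layer to keep local
    return conflicts

# Two clients, two layers (hypothetical gradients): they agree on layer 0 only.
grads = [
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.5], [-1.0, -1.0]],
]
local_layers = conflicting_layers(grads)
print(local_layers)
```

Such a rule produces a hard per-layer decision, whereas pFedLA's learned weights place each layer somewhere on a continuum.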

The main innovation of pFedLA is the continuous, differentiable (softmax-weighted), and meta-learned layer-wise mixture—per client—across all layers and peers. This unifies and generalizes the binary layer-splitting, fixed partial, and data-driven heuristic approaches.

6. Experimental Evaluation and Communication Efficiency

pFedLA has been validated on canonical FL benchmarks (EMNIST, FashionMNIST, CIFAR-10, CIFAR-100) with heterogeneous data splits (non-IID class-skew, imbalanced class frequencies), using small CNN architectures. Baselines include the classical FedAvg, personalized FL methods (pFedMe, FedBN, FedRep, FedFomo), and the hypernetwork-based pFedHN.

Key findings:

On CIFAR-10, pFedLA raises average test accuracy over FedAvg from approximately 59% to 61.4% with 10 clients, and from roughly 58% to 73% with 100 clients.
Personalized hypernetworks yield consistently higher average accuracy than global FL, meta-learning, or fixed-partial models.
A communication-efficient variant (HeurpFedLA) uses sparsification over the learned $\alpha_i$ matrices to transmit only the least-personalized (“most shared”) layers, reducing bandwidth by 30–40% while sacrificing less than 1% accuracy.

These results demonstrate that fine-grained, layer-specific aggregation is critical for both accuracy and efficiency in personalized non-IID settings (Ma et al., 2022).
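One way to picture the layer selection behind the communication-efficient variant is to rank layers by how much a client relies on itself. The sketch below treats the client's self-weight at each layer as a personalization score and keeps the layers with the lowest scores as the ones to exchange. The scoring rule, the fixed budget `keep`, and the function name are assumptions made for illustration; the source text does not specify how HeurpFedLA ranks layers.

```python
def select_shared_layers(alpha_i, client_index, keep):
    """Return indices of the `keep` layers with the smallest self-weight.

    alpha_i[l][j] is the learned weight of peer j at layer l for this client.
    """
    self_weight = [row[client_index] for row in alpha_i]
    ranked = sorted(range(len(alpha_i)), key=lambda l: self_weight[l])
    return sorted(ranked[:keep])

# Hypothetical learned weights for client 0 over three layers and three peers.
alpha_0 = [
    [0.40, 0.35, 0.25],   # layer 0: heavily shared
    [0.50, 0.30, 0.20],   # layer 1: moderately shared
    [0.90, 0.05, 0.05],   # layer 2: almost fully local
]
shared = select_shared_layers(alpha_0, client_index=0, keep=2)
print(shared)
```

Layers left out of the selection would stay on the client, which is where the bandwidth saving would come from under this assumed rule.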

7. Theoretical Perspective and Empirical Implications

pFedLA's effectiveness is established empirically rather than through formal convergence guarantees. The underlying intuition is that client-specific, layer-wise aggregation can suppress statistical drift and speed up the alignment between local and federated objectives. Convergence and personalization analyses in related work support this view: for example, FedLAG's theoretical bounds give $O(1/(RE))$ rates with a personalization gain term (Nguyen et al., 2024).

A plausible implication is that hypernetwork-driven mixing coefficients emulate a data-driven spectrum between ‘global’ and ‘fully local’ layers, learning to exploit inter-client commonality where available, and to specialize otherwise. This suggests a unifying principle for future personalized federated learning frameworks: fine-grained, data-adaptive, and continuous personalization schemes can outperform rigid heuristics, especially under high heterogeneity.

References:

"Layer-wised Model Aggregation for Personalized Federated Learning" (Ma et al., 2022)
"Towards Layer-Wise Personalized Federated Learning: Adaptive Layer Disentanglement via Conflicting Gradients" (Nguyen et al., 2024)
"Optimizing Personalized Federated Learning through Adaptive Layer-Wise Learning" (Chen et al., 2024)
"PLayer-FL: A Principled Approach to Personalized Layer-wise Cross-Silo Federated Learning" (Elhussein et al., 12 Feb 2025)