Layer-wise Personalized Aggregation (pFedLA)

Updated 1 July 2026
  • The paper introduces a novel FL framework that learns trainable, layer-wise aggregation weights to overcome the limitations of uniform parameter averaging in non-IID settings.
  • It employs per-client hypernetworks to dynamically mix parameters from all clients, leading to enhanced generalization, accuracy, and communication efficiency.
  • Experimental results on benchmarks like CIFAR-10 demonstrate significant accuracy improvements, highlighting the framework's practical benefits in heterogeneous environments.

Layer-wise Personalized Aggregation (pFedLA) is a federated learning (FL) framework that enables client-level model personalization through explicit, trainable layer-wise aggregation. Conventional FL approaches such as FedAvg treat all clients' model parameters as equally aggregable at the whole-model level, which is suboptimal under non-IID (heterogeneous) data distributions. pFedLA addresses this limitation by learning, for each client and for each layer, how much to mix parameters from the entire client set. This paradigm advances the robustness and flexibility of federated personalization, especially under distributional shifts and user heterogeneity (Ma et al., 2022).

1. Problem Formulation and Conceptual Motivation

pFedLA considers a federated network of $N$ clients, each owning local data $\mathcal{D}_i$ sampled from distinct, potentially non-overlapping distributions. Each client $i$ maintains a model with parameters $\theta_i = [\theta_i^{(1)}, \dots, \theta_i^{(L)}]$ across $L$ layers.

The classical FL objective,

$$\min_{\theta} \sum_{i=1}^N \frac{m_i}{M} \mathcal{L}_i(\theta),$$

where $\mathcal{L}_i$ is client $i$'s local loss, $m_i = |\mathcal{D}_i|$, and $M = \sum_{i=1}^N m_i$, assumes a single global parameterization $\theta$. This is often inadequate for non-IID federated regimes. pFedLA instead pursues personalized model parameters $\bar\theta_i$ for each client, obtained by learning how to aggregate peers' parameters with different weights at each layer.
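For contrast with the personalized scheme, the following is a minimal sketch of the whole-model aggregation that FedAvg applies on the server: every parameter is averaged with the same data-size weights $m_i / M$ that appear in the classical objective. The function name `fedavg` and the toy inputs are illustrative assumptions, and models are represented as flat lists of floats.

```python
def fedavg(client_params, client_sizes):
    """Average flat parameter vectors with weights m_i / M (one weight per client)."""
    total = sum(client_sizes)  # M, the total number of samples
    dim = len(client_params[0])
    return [
        sum(m / total * theta[k] for theta, m in zip(client_params, client_sizes))
        for k in range(dim)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_theta = fedavg([[0.0, 4.0], [4.0, 0.0]], [1, 3])
```

Every client receives the same `global_theta` here, and the same weight is used for every layer; these are the two restrictions that layer-wise personalized aggregation removes.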

The driving hypothesis is that both statistical drift (distribution mismatch) and functionality transferability vary across layers and across clients. Thus, individualized, layer-wise aggregation leads to improved generalization and adaptation for all participants.

2. Hypernetwork-based Layer-wise Aggregation

At the core of pFedLA is a per-client hypernetwork on the server. For each client $i$, the hypernetwork $h_i$ (with parameters $\varphi_i$) receives an embedding $v_i$ and outputs an aggregation weight matrix $\alpha_i$ such that:

$$\alpha_i = h_i(v_i; \varphi_i) \in \mathbb{R}^{L \times N}, \qquad \alpha_i^{(\ell,j)} \ge 0, \qquad \sum_{j=1}^{N} \alpha_i^{(\ell,j)} = 1 \ \text{ for each layer } \ell.$$

Each entry $\alpha_i^{(\ell,j)}$ quantifies the extent to which client $i$'s personalized parameter at layer $\ell$ should incorporate the corresponding parameter from client $j$.

Given all clients' latest model parameters $\{\theta_j^{t}\}_{j=1}^{N}$ at round $t$, the next personalized model for client $i$ is built, layer by layer:

$$\bar\theta_i^{(\ell),\,t+1} = \sum_{j=1}^{N} \alpha_i^{(\ell,j)} \, \theta_j^{(\ell),\,t} \quad \text{for } \ell = 1, \dots, L.$$
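The layer-by-layer construction can be illustrated with a short sketch. It assumes that each row of the aggregation weight matrix is obtained by a softmax over unnormalized scores, so that the weights for a given layer are non-negative and sum to one; the hypernetwork that would produce those scores is left out, and the names `softmax`, `mix_layers`, `models`, and `logits` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_layers(client_models, logits_i):
    """Build one client's personalized model layer by layer.

    client_models[j][l] is the flat parameter list of layer l on client j.
    logits_i[l][j] is the unnormalized weight this client gives client j at layer l.
    """
    num_clients = len(client_models)
    num_layers = len(client_models[0])
    personalized = []
    for l in range(num_layers):
        alpha = softmax(logits_i[l])  # one row of the weight matrix, sums to 1
        layer_dim = len(client_models[0][l])
        personalized.append([
            sum(alpha[j] * client_models[j][l][k] for j in range(num_clients))
            for k in range(layer_dim)
        ])
    return personalized

# Hypothetical toy setup: 2 clients, 2 layers.
models = [
    [[1.0, 1.0], [10.0]],  # client 0
    [[3.0, 5.0], [20.0]],  # client 1
]
# Layer 0 mixes both clients equally; layer 1 relies almost entirely on client 0.
logits = [[0.0, 0.0], [50.0, 0.0]]
theta_bar = mix_layers(models, logits)
```

In this toy case the first layer of `theta_bar` is the midpoint of the two clients' first layers, while the second layer stays essentially equal to client 0's, which shows how one client can share some layers and keep others nearly local.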

This layer-centric, client-specific weighting mechanism stands in contrast to prior whole-model aggregation or static partial federated approaches.

3. Hierarchical Optimization and Update Mechanism

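The hypernetwork's outputs (the per-layer, per-client aggregation weights) are meta-learned to minimize the client-specific local objectives after each FL communication round. The hypernetwork parameters $\varphi_i$, unique to each client $i$, are updated in response to the improvement achieved through local optimization.

After model assembly and local SGD on the personalized $\bar\theta_i$, the client returns the parameter update $\Delta\theta_i = \theta_i - \bar\theta_i$ to the server, where $\theta_i$ denotes the locally trained parameters. The server employs chain-rule computations to backpropagate through the aggregation map from $\varphi_i$ to the aggregation weights $\alpha_i$ to $\bar\theta_i$, treating $-\Delta\theta_i$ as a surrogate (up to a step-size factor) for the gradient of the local loss with respect to $\bar\theta_i$:

$$\nabla_{\varphi_i} \mathcal{L}_i \approx -\left(\frac{\partial \alpha_i}{\partial \varphi_i}\right)^{\top} \left(\frac{\partial \bar\theta_i}{\partial \alpha_i}\right)^{\top} \Delta\theta_i,$$

and performs gradient-based updates of $\varphi_i$. This aligns the aggregation policy with local progress, adaptively tuning the hypernetwork to exploit the observed inter-client similarities.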
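The chain-rule step can be made concrete with a small sketch for a single layer. Several simplifications are assumptions made for illustration: the hypernetwork is reduced to a linear map `W @ v` followed by a softmax, the local loss is a quadratic toward a hypothetical `target`, and local training is a single gradient step with step size 1, so that the negated client update equals the exact gradient. None of these choices is claimed to match the original implementation.

```python
import math

def softmax(z):
    peak = max(z)
    e = [math.exp(x - peak) for x in z]
    s = sum(e)
    return [x / s for x in e]

def forward(W, v, peers):
    """Linear 'hypernetwork' W @ v -> softmax weights -> mixed layer parameters."""
    z = [sum(w_c * v_c for w_c, v_c in zip(row, v)) for row in W]
    alpha = softmax(z)
    dim = len(peers[0])
    theta_bar = [sum(alpha[j] * peers[j][k] for j in range(len(peers))) for k in range(dim)]
    return alpha, theta_bar

def hypernet_grad(W, v, peers, delta_theta):
    """Chain rule: -delta_theta -> d/d theta_bar -> d/d alpha -> d/d logits -> d/d W."""
    alpha, _ = forward(W, v, peers)
    g_bar = [-d for d in delta_theta]  # surrogate for dL/d theta_bar
    g_alpha = [sum(g * p for g, p in zip(g_bar, peer)) for peer in peers]
    mean = sum(a * g for a, g in zip(alpha, g_alpha))
    g_z = [a * (g - mean) for a, g in zip(alpha, g_alpha)]  # softmax backward pass
    return [[gz * v_c for v_c in v] for gz in g_z]  # dL/dW[j][c]

def loss(W, v, peers, target):
    """Toy local loss evaluated at the personalized parameters."""
    _, theta_bar = forward(W, v, peers)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(theta_bar, target))

peers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one layer from three clients
v = [1.0, -0.5]                                # client embedding
W = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]      # hypernetwork parameters
target = [0.2, 0.9]
_, theta_bar = forward(W, v, peers)
delta_theta = [t - p for t, p in zip(target, theta_bar)]  # one local step, lr = 1
grad = hypernet_grad(W, v, peers, delta_theta)
W_new = [[w - 0.1 * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]
```

Only `delta_theta` crosses the network in this sketch; the server never needs the client's data or loss function to update the hypernetwork. With multi-step local training the update is no longer an exact gradient, which is why it is described as a surrogate.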

4. Training Procedure

The high-level flow of pFedLA is as follows:

  1. Initialization: Each client starts with shared (or random) model parameters, personalized hypernetwork weights $\varphi_i$, and embeddings $v_i$.
  2. Layer-wise Personalization: For each client $i$, the server constructs $\bar\theta_i$ via the current aggregation weights $\alpha_i$ and broadcasts it.
  3. Local Update: Each client performs local SGD using $\bar\theta_i$ as initialization, yielding updated parameters $\theta_i$.
  4. Communication: Clients return their update $\Delta\theta_i = \theta_i - \bar\theta_i$.
  5. Server update: The server updates "global" layer parameters (typically by averaging $\theta_i^{(\ell)}$ across clients $i$) and refines $\varphi_i$ for each client via the gradients above.
  6. Repeat for the designated number of rounds.

The last aggregation yields the final personalized model for each client: $\bar\theta_i^{T}$ after $T$ rounds.
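The round structure above can be traced end to end in a toy simulation. This is a sketch under strong assumptions, not the authors' implementation: each layer is a single scalar, every client minimizes a quadratic loss toward hypothetical `targets`, the hypernetwork is replaced by directly learnable softmax logits per client and layer, and the server simply stores each client's latest parameters instead of averaging them.

```python
import math

def softmax(z):
    peak = max(z)
    e = [math.exp(x - peak) for x in z]
    s = sum(e)
    return [x / s for x in e]

def aggregate(thetas, logits_i):
    """Personalized model for one client: per-layer softmax mixture over all clients."""
    n, num_layers = len(thetas), len(thetas[0])
    alphas = [softmax(logits_i[l]) for l in range(num_layers)]
    theta_bar = [sum(alphas[l][j] * thetas[j][l] for j in range(n)) for l in range(num_layers)]
    return theta_bar, alphas

def local_sgd(theta, target, lr=0.5, steps=2):
    """Local training on the toy loss 0.5 * sum_l (theta_l - target_l)^2."""
    theta = list(theta)
    for _ in range(steps):
        theta = [p - lr * (p - t) for p, t in zip(theta, target)]
    return theta

def run(targets, rounds=50, server_lr=1.0):
    n, num_layers = len(targets), len(targets[0])
    thetas = [[0.0] * num_layers for _ in range(n)]                      # step 1: shared init
    logits = [[[0.0] * n for _ in range(num_layers)] for _ in range(n)]  # uniform mixing
    for _ in range(rounds):
        new_thetas = []
        for i in range(n):
            theta_bar, alphas = aggregate(thetas, logits[i])             # step 2
            theta_i = local_sgd(theta_bar, targets[i])                   # step 3
            delta = [a - b for a, b in zip(theta_i, theta_bar)]          # step 4
            for l in range(num_layers):                                  # step 5
                g_alpha = [-delta[l] * thetas[j][l] for j in range(n)]
                mean = sum(a * g for a, g in zip(alphas[l], g_alpha))
                for j in range(n):
                    logits[i][l][j] -= server_lr * alphas[l][j] * (g_alpha[j] - mean)
            new_thetas.append(theta_i)
        thetas = new_thetas  # server keeps each client's latest parameters
    # Final aggregation: personalized model and weights for every client.
    return [aggregate(thetas, logits[i]) for i in range(n)]

# Hypothetical data: layer 0 has the same optimum everywhere, layer 1 differs by group.
targets = [[2.0, 1.0], [2.0, 1.0], [2.0, -1.0], [2.0, -1.0]]
results = run(targets)
theta_bar_0, alphas_0 = results[0]
```

In this setup the weights for layer 0 have no reason to move away from uniform, because all clients agree on that layer, while the weights for layer 1 shift toward clients in the same group. That is the qualitative behavior the layer-wise scheme is designed to produce.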

5. Relation to Alternative Layer-wise and Partial Federation Approaches

The pFedLA approach is one instantiation of a growing class of layer-wise personalized FL strategies, with notable points of comparison:

  • Federation Sensitivity / Principled Partial FL: PLayer-FL uses first-order pruning-based federation sensitivity metrics to select a transition point between federated and local layers in an architecture- and task-adaptive manner (Elhussein et al., 2025).
  • Gradient Conflict-based Layer Aggregation: FedLAG dynamically personalizes or aggregates layers by detecting client gradient conflicts via cosine similarity at each layer, assigning conflicting layers to remain local (Nguyen et al., 2024).
  • Adaptive Layer-wise Update and Masking: FLAYER combines selective head aggregation, per-layer learning rates, and masking-adaptive uploads, but aggregates in fixed regions (e.g., base vs. head) with upload budgets (Chen et al., 2024).
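To make the gradient-conflict idea in the list above concrete, here is a minimal sketch of a per-layer conflict test based on cosine similarity. The decision rule (flag a layer when any pair of clients has similarity below a threshold of 0) and the names `cosine` and `conflicting_layers` are assumptions chosen for illustration; the exact criterion used by FedLAG may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def conflicting_layers(client_grads, threshold=0.0):
    """Return indices of layers where some pair of clients disagrees in direction.

    client_grads[i][l] is client i's flat gradient for layer l.
    """
    n, num_layers = len(client_grads), len(client_grads[0])
    flagged = []
    for l in range(num_layers):
        pairs = (
            (client_grads[a][l], client_grads[b][l])
            for a in range(n) for b in range(a + 1, n)
        )
        if any(cosine(u, v) < threshold for u, v in pairs):
            flagged.append(l)
    return flagged

# Two hypothetical clients: they agree on layer 0 and pull in opposite directions on layer 1.
grads = [
    [[1.0, 0.0], [1.0, 1.0]],
    [[0.9, 0.1], [-1.0, -0.5]],
]
local_layers = conflicting_layers(grads)
```

A rule of this kind makes a hard per-layer choice between sharing and keeping local, whereas a learned mixture assigns each layer a continuous weight over all peers.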

The main innovation of pFedLA is a continuous, differentiable (softmax-weighted), meta-learned layer-wise mixture that is learned separately for each client across all layers and peers. This unifies and generalizes the binary layer-splitting, fixed partial, and data-driven heuristic approaches.

6. Experimental Evaluation and Communication Efficiency

pFedLA has been validated on canonical FL benchmarks (EMNIST, FashionMNIST, CIFAR-10, CIFAR-100) with heterogeneous data splits (non-IID class skew, imbalanced class frequencies), using small CNN architectures. Baselines include the global method FedAvg, personalized FL methods (pFedMe, FedBN, FedRep, FedFomo), and the hypernetwork-based pFedHN.

Key findings:

  • On CIFAR-10, pFedLA increases average test accuracy from approximately 59% (FedAvg) to 61.4% (pFedLA) for 10 clients, and from roughly 58% (FedAvg) to 73% (pFedLA) for 100 clients.
  • Personalized hypernetworks yield consistently higher average accuracy than global FL, meta-learning, or fixed-partial models.
  • A communication-efficient variant (HeurpFedLA) uses sparsification over the learned aggregation weight matrices $\alpha_i$ to transmit only the least-personalized (“most shared”) layers, reducing bandwidth by 30–40% while sacrificing less than 1% accuracy.

These results indicate that fine-grained, layer-specific aggregation improves both accuracy and communication efficiency in personalized non-IID settings (Ma et al., 2022).

7. Theoretical Perspective and Empirical Implications

pFedLA's effectiveness is supported empirically rather than by formal convergence guarantees. The foundational insight is that client-specific, layer-wise aggregation can suppress statistical drift and accelerate the alignment between local and federated objectives. Related work corroborates this with convergence and personalization analyses; for example, FedLAG's theoretical bounds give $O(1/(RE))$ rates with a personalization gain term (Nguyen et al., 2024).

A plausible implication is that hypernetwork-driven mixing coefficients emulate a data-driven spectrum between ‘global’ and ‘fully local’ layers, learning to exploit inter-client commonality where available, and to specialize otherwise. This suggests a unifying principle for future personalized federated learning frameworks: fine-grained, data-adaptive, and continuous personalization schemes can outperform rigid heuristics, especially under high heterogeneity.


References:

  • "Layer-wised Model Aggregation for Personalized Federated Learning" (Ma et al., 2022)
  • "Towards Layer-Wise Personalized Federated Learning: Adaptive Layer Disentanglement via Conflicting Gradients" (Nguyen et al., 2024)
  • "Optimizing Personalized Federated Learning through Adaptive Layer-Wise Learning" (Chen et al., 2024)
  • "PLayer-FL: A Principled Approach to Personalized Layer-wise Cross-Silo Federated Learning" (Elhussein et al., 2025)
