
Partially-Personalized Aggregation

Updated 22 November 2025
  • Partially-personalized aggregation is a method that customizes collaborative models by mixing client-specific updates with shared global parameters.
  • It employs diverse strategies, such as similarity metrics and adaptive gradient-based weights, to balance local data specialization and global learning.
  • This approach enhances convergence speed, improves performance on non-IID data, and reduces communication overhead in decentralized environments.

Partially-personalized aggregation refers to a broad class of learning protocols in which collaborative models are constructed so that each participant (client, device, user, or peer) receives a distinct, client-specific aggregate, determined by an algorithmic mixture of client updates and global (shared) parameters. This concept arises prominently in federated learning (FL), recommendation systems, social choice theory, and decentralized online learning. The central objective is to realize an operational trade-off between fully personalized (local) models and fully global (pooled) models, providing customization to local statistics or preferences while leveraging statistical strength from relevant peers. Formally, the aggregation often takes the form $w_i = \sum_j a_{ij} w_j$, where $\mathbf{a}_i$ is a client-specific weight vector computed via explicit metrics, data-driven optimization, learned gradients, or higher-level mechanisms.
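As a minimal sketch of this basic operation (using hypothetical function names and flattened parameter vectors, not any particular paper's implementation), each client-specific aggregate can be formed as a convex combination of peer updates:

```python
import numpy as np

def personalized_aggregate(client_params, mix_weights):
    """Form one client-specific aggregate w_i = sum_j a_ij * w_j.

    client_params: list of flattened parameter vectors, one per client.
    mix_weights:   client i's weight vector a_i (nonnegative, summing to 1).
    """
    a = np.asarray(mix_weights, dtype=float)
    assert np.all(a >= 0) and np.isclose(a.sum(), 1.0), "a_i must lie on the simplex"
    stacked = np.stack(client_params)   # shape: (n_clients, dim)
    return a @ stacked                  # weighted mixture for client i

# Example: client 0 keeps most of its own update and borrows a little from peers.
rng = np.random.default_rng(0)
params = [rng.standard_normal(8) for _ in range(3)]
w_0 = personalized_aggregate(params, [0.7, 0.2, 0.1])
```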

1. Theoretical Formulation and Operator Splitting

The foundational framework for partially-personalized aggregation splits global optimization objectives into components governing collaborative and local adaptation. In the partially personalized FL paradigm, the joint objective is often

$$\min_{\theta,\,\{w_i\}} \; \sum_{i=1}^n f_i(\theta, w_i)$$

where $\theta$ is global/shared and $w_i$ is private/local to client $i$ (Mishchenko et al., 2023). The solution concept seeks a $\theta^*$ and per-client minimizers $w_i^*(\theta^*)$ such that $F_i(\theta^*) = \nabla_1 f_i(\theta^*, w_i^*(\theta^*)) = 0$ for all $i$. This setup enables clients to fit local data arbitrarily well (overpersonalization), while constraining $\theta^*$ to encode transferable representations.

Aggregation then enters as the mechanism to synchronize global variables, with instantiations ranging from simple averaging (FedAvg) to gradient-based, adaptive, or graph-based personalized weights. The double-loop “Fine-tuning Followed by Global Gradient” (FFGG) algorithm alternates local minimization in $w_i$ and global updates in $\theta$, with robust convergence guarantees under standard cocoercivity assumptions (Mishchenko et al., 2023).
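The toy sketch below illustrates this alternation on hypothetical quadratic client objectives; it mirrors the structure of the double loop (local fine-tuning of each $w_i$, then a global gradient step in $\theta$) rather than FFGG's exact update rule or step-size schedule.

```python
import numpy as np

class ToyClient:
    """Hypothetical client with quadratic loss f_i(theta, w_i) = 0.5*||theta + w_i - y_i||^2."""
    def __init__(self, y):
        self.y = y
    def init_local(self):
        return np.zeros_like(self.y)
    def grad_w(self, theta, w):      # partial gradient w.r.t. the private variable
        return theta + w - self.y
    def grad_theta(self, theta, w):  # partial gradient w.r.t. the shared variable
        return theta + w - self.y

def ffgg_like(clients, theta, rounds=50, local_steps=10, lr_local=0.1, lr_global=0.1):
    """Double loop in the spirit of FFGG: fine-tune each w_i, then step theta."""
    w = [c.init_local() for c in clients]
    for _ in range(rounds):
        # Inner loop: each client approximately minimizes f_i(theta, w_i) over w_i.
        for i, c in enumerate(clients):
            for _ in range(local_steps):
                w[i] = w[i] - lr_local * c.grad_w(theta, w[i])
        # Outer step: average the partial gradients in theta at the fine-tuned w_i.
        g = np.mean([c.grad_theta(theta, w[i]) for i, c in enumerate(clients)], axis=0)
        theta = theta - lr_global * g
    return theta, w

clients = [ToyClient(np.array([1.0, 0.0])), ToyClient(np.array([0.0, 2.0]))]
theta_star, w_star = ffgg_like(clients, theta=np.zeros(2))
```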

2. Aggregation Strategies and Weight Construction

A wide array of aggregation rule parameterizations has emerged to instantiate partial personalization:

  • Weighted Model Mixing: Each client receives $w_i = \sum_j a_{ij} w_j$, with $a_{ij}$ derived from data-size, similarity, or gradient metrics (Mestoukirdi et al., 2021, Sun et al., 11 Feb 2025). Pilot rounds may estimate pairwise data similarities (e.g., gradient distances) to initialize $a_{ij}$; subsequent adaptation can proceed via clustering, learning, or gradient descent.
  • Adaptive Gradient-based Weights: The server updates $a_{ij}$ using the gradient of the local loss with respect to the aggregation weights; FedAPA directly traces $\nabla_{a_{ij}} L_i(w_i)$ using the parameter drift $\Delta w_i$ and backpropagation through the aggregation step (Sun et al., 11 Feb 2025).
  • Cluster-wise or Broadcast-Limited Aggregation: User-centric protocols organize clients into a reduced number $S \ll n$ of broadcast groups by clustering personalized aggregation vectors, limiting downlink streams while enabling strong personalization (Mestoukirdi et al., 2021).
  • Layer-wise/Module-wise Aggregation: pFedLA generalizes the rule to per-layer weighting, with a learned matrix $\alpha_i^{\ell,j}$ controlling, for each client $i$ and layer $\ell$, how much to borrow from each peer client $j$'s update (Ma et al., 2022).
  • Graph/Attention-driven Aggregation: FedAGHN constructs, for each client and layer, a star-shaped graph and assigns attention weights via trainable hypernetworks, adapting both the “peakedness” and “self-weight” for every client-layer pair (Song et al., 24 Jan 2025).
  • Similarity-based Partial Aggregation: pFedSim computes feature-extractor aggregates using a classifier-similarity kernel; only “similar” clients’ features are included per client (Tan et al., 2023).

These schemes commonly enforce $\sum_j a_{ij} = 1$ and $a_{ij} \geq 0$, and regularize or adapt the mixing weights to maximize local or joint objectives.
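To make the gradient-based construction concrete, the sketch below (a generic illustration under a softmax parameterization of the simplex constraints, not FedAPA's exact server-side rule; all function names are hypothetical) takes one descent step on client $i$'s mixing weights:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def update_mix_logits(logits_i, client_params, grad_local_loss, lr=0.1):
    """One descent step on client i's mixing weights a_i = softmax(logits_i).

    With w_i = sum_j a_ij * w_j, the chain rule gives
    dL_i/da_ij = <grad_w L_i(w_i), w_j>; backpropagating through the softmax
    keeps a_i nonnegative and summing to one.
    """
    P = np.stack(client_params)           # (n_clients, dim)
    a = softmax(logits_i)
    w_i = a @ P                           # personalized aggregate for client i
    g_w = grad_local_loss(w_i)            # gradient of client i's loss at w_i
    dL_da = P @ g_w                       # inner products <g_w, w_j>
    dL_dlogits = a * (dL_da - a @ dL_da)  # softmax Jacobian applied to dL/da
    return logits_i - lr * dL_dlogits
```

The softmax parameterization is only one convenient way to respect the simplex constraints; explicit projection would serve equally well, and methods such as FedAPA additionally exploit the parameter drift $\Delta w_i$ reported by each client.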

3. Model Decoupling and Split Architectures

Partial personalization is operationally realized by decomposing network architectures:

  • Decoupled Feature Extractor and Head: Clients train a global backbone and personalize a decision head; only head updates are kept private, promoting feature sharing and decision specialization (Tan et al., 2023, Yin et al., 19 Oct 2024).
  • Autoencoder Partitioning: PersonalFR fixes a client-local encoder while aggregating only the output (decoder) layer globally; updates for items not rated by a client are never transmitted, reducing leakage (Le et al., 2022).
  • Prompt or Adapter Partitioning: Federated medical imaging systems assign global encoders but blend decoder parameters using parameter-wise, prompt-driven aggregation, with blending weights trained to minimize local task losses (Lin et al., 27 Feb 2024).
  • Embedding Table Partitioning in Recommender Systems: Composite aggregation (FedCA) and elastic merging (FedEM) blend local and global item embeddings at a fine granularity, using learned client- and even item-specific weights (Zhang et al., 6 Jun 2024, Chen et al., 17 Aug 2025).

The design of aggregation/splitting is highly task-specific, depending on the localization properties of different network components and the statistical structure of heterogeneity.
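As an illustration of such a decomposition (with hypothetical parameter-name prefixes standing in for a concrete architecture), the sketch below aggregates only the shared backbone keys of each client's parameter dictionary and leaves the personal head untouched:

```python
SHARED_PREFIXES = ("backbone.",)   # hypothetical names for the globally aggregated part
PERSONAL_PREFIXES = ("head.",)     # hypothetical names for the private, personalized part

def split_state(state):
    """Partition a parameter dictionary into shared and personal pieces by name."""
    shared = {k: v for k, v in state.items() if k.startswith(SHARED_PREFIXES)}
    personal = {k: v for k, v in state.items() if k.startswith(PERSONAL_PREFIXES)}
    return shared, personal

def aggregate_shared(client_states, weights):
    """FedAvg-style aggregation restricted to the shared (backbone) keys."""
    shared_parts = [split_state(s)[0] for s in client_states]
    keys = shared_parts[0].keys()
    return {k: sum(w * part[k] for w, part in zip(weights, shared_parts)) for k in keys}

# Each client then loads the aggregated backbone while keeping its own head:
#   new_state = {**client_state, **aggregate_shared(all_states, uniform_weights)}
```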

4. Optimization and Learning of Client-specific Aggregation Weights

Partially-personalized aggregation weight computation arises from varied mechanisms:

  • Heuristic or Static Metrics: Pilot-based estimation (e.g., data-size, proxy gradients, class statistics) followed by normalization (Mestoukirdi et al., 2021).
  • Optimization of Local Loss with Respect to Weights: Directly optimize the local empirical loss as a function of aggregation weights, using SGD or mini-batch approximations (dPFed, DA-PFL, decentralized online FL) (Wu et al., 2023, Yang et al., 14 Mar 2024).
  • Bayesian Optimization for Local-Global Weighting: In the DAMe framework, each client learns a scalar $\lambda_k$ for local-global interpolation via Bayesian optimization over held-out performance metrics, enabling rapid, data-driven adjustment (Yu et al., 1 Sep 2024).
  • Gradient-based Server-side Updates: Compute the derivative of local loss with respect to the aggregation vector on the server, leveraging the change in model parameters following local training (Sun et al., 11 Feb 2025).
  • Hypernetwork Generation of Weights: Use compact neural networks (hypernetworks) to generate per-client (or per-layer, per-client) weight matrices based on learnable client embeddings (Ma et al., 2022, Song et al., 24 Jan 2025).
  • Similarity, Complementarity, and Task Vector Approaches: Combine model similarity and data distribution complementarity (FedCA) (Zhang et al., 6 Jun 2024), or use client fine-tuning task vectors (FedBip) (Yang et al., 16 Sep 2025) as the basis for client-to-client weightings.

The optimization can be solved via quadratic programming or first-order methods, and admits closed-form updates in linear loss settings.
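For the local-global interpolation case, a very simple stand-in for the Bayesian optimization used in DAMe is a held-out grid search over the scalar $\lambda_k$; the sketch below assumes a caller-supplied validation-loss callable and flattened parameter vectors (both hypothetical).

```python
import numpy as np

def select_lambda(w_local, w_global, val_loss, grid=np.linspace(0.0, 1.0, 11)):
    """Choose the coefficient lambda minimizing the held-out loss of
    w = lambda * w_local + (1 - lambda) * w_global.

    `val_loss` is any callable evaluating a parameter vector on held-out data.
    """
    best_lam, best_loss = None, np.inf
    for lam in grid:
        loss = val_loss(lam * w_local + (1.0 - lam) * w_global)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam, best_loss
```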

5. Empirical Trade-offs: Personalization, Efficiency, and Convergence

Extensive empirical studies highlight the effectiveness and trade-offs of partially-personalized aggregation:

  • Statistical Performance: Partially-personalized models consistently outperform both global (FedAvg) and purely local models on highly non-IID data, narrowing the generalization gap to centralized upper bounds (Mishchenko et al., 2023, Yin et al., 19 Oct 2024, Le et al., 2022).
  • Efficiency: Communication and computation costs are minimized by compressing aggregation (e.g., sending only active parameter slices), learning only a small number of aggregation weights, or adaptive parameter scheduling (multi-modal scenarios) (Le et al., 2022, Yin et al., 12 Jun 2024).
  • Convergence: Personalized aggregation accelerates convergence, mitigating client-drift and mode-mismatch instabilities observed in standard FL, and supports robust operation in asynchronous and Byzantine-robust settings (Mishchenko et al., 2023, Sun et al., 11 Feb 2025).
  • Robustness to Class/Data Imbalance: Complementary affinity-based weighting in DA-PFL counteracts local majority class imbalance, assigning higher weights to clients with complementary (rather than similar) class distributions (Yang et al., 14 Mar 2024).
  • Theoretical Guarantees: Structural conditions (cocoercivity, smoothness, strong convexity) underpin formal convergence proofs, establishing $\mathcal{O}(1/R)$ or even linear rates under appropriate assumptions (Mishchenko et al., 2023, Sun et al., 11 Feb 2025).
  • Privacy: Several methods (e.g., PersonalFR, pFedSim) restrict transmission to subnetworks, parameter slices, or summary statistics, increasing resilience to privacy leakage (Le et al., 2022, Tan et al., 2023).
  • Recommender Ecosystems & Social Choice: In d'Hondt-based personalized recommendation, hybrid user/global vote-weighting preserves majority robustness while empowering minority-voice personalization, leading to increased click-through rates (Balcar et al., 11 Jun 2024).

6. Contextual Extensions: Multi-Modal, Online, and Societal Aggregation

Partially-personalized aggregation strategies have been extended beyond standard FL scenarios:

  • Multi-Modal Federated Learning: Aggregation weights are learned separately for each modality per client, optimizing both statistical and resource efficiency under communication constraints and modality-availability mismatch (Yin et al., 12 Jun 2024).
  • Decentralized & Online Settings: Clients in peer-to-peer or edge networks dynamically learn optimal peer weights via online optimization, supporting asynchronous, drifting, and non-stationary environments (Wu et al., 2023).
  • Social Choice and Electoral Theory: The aggregation of personalized signals—where agent-specific bias and attention generate unique posteriors—modulates collective decision quality, with non-trivial equilibrium and comparative-statics consequences (Li et al., 2020).

These extensions reinforce the generality of partial personalization as an organizing algorithmic principle across distributed, heterogeneous, and dynamically evolving learning environments.

7. Open Problems and Future Directions

While substantial progress has been made, several open directions remain:

  • Scalability of Personalization: Learning, storing, and communicating $\mathcal{O}(n)$ distinct models or weights is challenging as $n$ increases. Stream/broadcast compression and clustering heuristics partially address this but leave open the limits of ultra-large-scale deployments (Mestoukirdi et al., 2021).
  • Adaptive Model Partitioning: Selecting the optimal (and possibly adaptive) split between local and global modules in deep architectures remains an area of active research; AutoML-based approaches appear promising.
  • Formal Generalization Guarantees: While convergence and worst-case risk bounds exist under various regularity conditions, precise generalization and fairness properties—especially in highly non-IID, dynamic-social, or privacy-constrained regimes—require further formalization.
  • Privacy Leakage Quantification: Quantitative analysis of cross-client information leakage, especially for complex aggregation logic (adaptive weights, hypernetworks, diffusion-based aggregates), is still nascent.

A plausible implication is that future research will further unify personalization mechanisms with adaptive communication, resource-aware parameter scheduling, adversarial robustness, and context-sensitive aggregation logic. The concept of partially-personalized aggregation is likely to remain central as federated, decentralized, and ecosystem-scale systems continue to proliferate across technical domains.
