Client-Aware Aggregation in Federated Learning

Updated 9 February 2026
  • A client-aware aggregation strategy is a federated-learning approach that weights client updates according to data heterogeneity and class imbalance.
  • It employs a dynamic adaptive focal loss with client- and class-level imbalance coefficients, achieving up to 87.19% accuracy and improved minority-class recall on medical imaging benchmarks.
  • By adapting to non-IID data distributions, the method speeds convergence and enhances generalization, promoting fairness in sensitive applications.

A client-aware aggregation strategy is an approach within federated learning (FL) that dynamically assigns aggregation weights or adapts the global model update based on the statistical and distributional properties of individual client datasets. Such methods explicitly account for non-IID data distributions, class-imbalance, and heterogeneity across clients, moving beyond naive averaging to maximize generalization and fairness of the final model.

1. Motivation and Context for Client-Aware Aggregation

Federated learning orchestrates decentralized optimization across multiple clients (institutions, devices, or hospitals) without sharing raw data, often under significant class-imbalance and inter-client heterogeneity. Conventional schemes such as FedAvg aggregate client model updates via simple volume-weighted averaging, implicitly assuming homogeneity of data distributions and consistent representation of all classes. However, in many real-world applications—particularly medical imaging and healthcare—client datasets can differ both in size and in class distribution, with some entities holding predominantly minority or rare-category samples.

Under these settings, straightforward aggregation is often suboptimal. Frequent classes and clients with large data volume dominate the update, overwhelming rare events or minority-class gradients and degrading the global model’s generalizability. The issue is especially acute for sensitive applications (e.g., rare disease detection in federated medical imaging), motivating the need for client-aware aggregation schemes that adapt dynamically to both the composition and the contribution potentials of each client (Zhao et al., 2 Feb 2026).

2. Mathematical Formulation of Client-Aware Strategies

In the client-aware approach outlined in "Federated Vision Transformer with Adaptive Focal Loss for Medical Image Classification" (Zhao et al., 2 Feb 2026), clients train with a dynamic adaptive focal loss (DAFL) that encodes data "hardness" (misclassification likelihood) together with class rarity at both the client and global levels. The aggregation step then incorporates this heterogeneity into the global update through explicit weighting:

For $K$ clients, each client $k$ possesses a dataset $\mathcal{D}_k$ with class counts $\{n_{k,i}\}_{i=1}^C$ for $C$ classes. Two key imbalance coefficients are used:

a) Client-Level Imbalance Coefficient ($c_k$)

For client $k$:

$$c_k = \frac{1}{C} \sum_{i=1}^{C} \frac{N_k - n_{k,i}}{n_{k,i} + \epsilon}$$

where $N_k$ is the client's total sample count and $\epsilon$ is a smoothing constant. This quantifies intra-client class skew.
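As an illustration (not the authors' code), the client-level coefficient can be computed directly from a client's local class counts:

```python
# Sketch: client-level imbalance coefficient c_k from per-class counts,
# following c_k = (1/C) * sum_i (N_k - n_{k,i}) / (n_{k,i} + eps).

def client_imbalance(class_counts, eps=1e-6):
    """class_counts: list of per-class sample counts n_{k,i} for one client."""
    n_total = sum(class_counts)        # N_k
    num_classes = len(class_counts)    # C
    return sum((n_total - n) / (n + eps) for n in class_counts) / num_classes

# A heavily skewed client yields a much larger coefficient than a balanced one.
balanced = client_imbalance([100, 100, 100, 100])
skewed = client_imbalance([970, 10, 10, 10])
```

Note that for a perfectly balanced client each term equals $(N_k - N_k/C)/(N_k/C) = C - 1$, so the coefficient grows with skew relative to that baseline.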

b) Global, Class-Level Imbalance Coefficient ($c_{f,i}$)

On the server, for each class $i$:

$$c_{f,i} = \frac{\sum_{k=1}^{K} N_k - \sum_{k=1}^{K} n_{k,i}}{\sum_{k=1}^{K} n_{k,i} + \epsilon}$$

This measures global class scarcity.
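A minimal server-side sketch (assumed interface, not the paper's code) computes these coefficients from the per-client class counts the server already collects:

```python
# Sketch: global class-imbalance coefficients c_{f,i} computed on the server
# from the class-count vectors reported by all clients.

def global_class_imbalance(all_counts, eps=1e-6):
    """all_counts: list over clients of per-class count lists [n_{k,1}, ..., n_{k,C}]."""
    num_classes = len(all_counts[0])
    total = sum(sum(counts) for counts in all_counts)        # sum_k N_k
    per_class = [sum(counts[i] for counts in all_counts)     # sum_k n_{k,i}
                 for i in range(num_classes)]
    return [(total - n_i) / (n_i + eps) for n_i in per_class]

# Two clients, two classes; class 1 is globally rare and gets a large coefficient.
coeffs = global_class_imbalance([[90, 10], [80, 20]])
```

These coefficients would then be broadcast back to clients each round, as described in the workflow below.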

The per-sample client-aware weight for the true class $t$ is formed as a convex combination of the local and global coefficients:

$$c_{f,t} = \lambda\, c_k + (1 - \lambda)\, c_{f,t}$$

where the $c_{f,t}$ on the right-hand side denotes the global class-level coefficient ($c_{f,i}$ evaluated at $i = t$), the left-hand side is the combined weight used below, and $\lambda \in [0,1]$ is tuned empirically.

During federated aggregation, this coefficient is leveraged so that clients with data skew or containing higher rarity classes receive larger effective weights, amplifying their minority-class gradients in the eventual model merge:

  • If $c_{f,t}$ is large (rare class or highly imbalanced client), its contribution is upweighted.
  • If $c_{f,t} \approx 0$ (well-represented class and balanced client), aggregation approximates classical averaging.

This mechanism is integrated directly into the federated training objective:

$$\mathcal{L}_{\text{DAFL}}(p_t;\, c_{f,t},\, \gamma) = -(1 + c_{f,t})\,(1 - p_t)^{\gamma} \log(p_t)$$

where $p_t$ is the predicted probability of the true class and $\gamma$ is the focal focusing parameter.
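The loss itself is a one-liner; the sketch below (assuming $p_t$ is the predicted probability of the true class) shows how the combined coefficient scales the standard focal loss:

```python
import math

# Sketch of the DAFL per-sample loss:
#   L = -(1 + c_ft) * (1 - p_t)^gamma * log(p_t)

def dafl_loss(p_t, c_ft, gamma=2.0):
    return -(1.0 + c_ft) * (1.0 - p_t) ** gamma * math.log(p_t)

# With c_ft = 0 this reduces to the standard focal loss; a large c_ft
# (rare class and/or imbalanced client) scales the penalty up proportionally.
```

In a real training loop this would be applied per example with the class-$t$ coefficient looked up from the current round's statistics.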

3. Implementation Workflow and Dynamic Adaptation

Client-aware aggregation operates iteratively within the FL communication loop:

  1. Client-side: Each client recomputes $c_k$ from its local dataset per round.
  2. Server-side: The server aggregates local class counts and broadcasts updated global class-imbalance coefficients $c_{f,i}$.
  3. Adaptive loss calculation: Each client uses the current $c_k$ and received $c_{f,i}$ to construct $c_{f,t}$ for per-example loss weighting.
  4. Aggregation: Client models are aggregated at the server, with implicit (or explicit) weighting that reflects the dynamic client-aware coefficients.

The adaptation is round-wise, so the method remains responsive to variable client participation and evolving data distributions, sustaining fairness and minority-class attention throughout training.
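The aggregation step above can be sketched end to end. All names here are hypothetical, model updates are stand-in float vectors, and the weight $N_k (1 + c_k)$ is one plausible way to make the client-aware weighting explicit; the paper leaves it implicit or explicit:

```python
# Sketch of one communication round's server-side merge with
# imbalance-adjusted weights (illustrative, not the authors' implementation).

def imbalance(counts, eps=1e-6):
    n = sum(counts)
    return sum((n - c) / (c + eps) for c in counts) / len(counts)

def aggregate(updates, class_counts, eps=1e-6):
    # Weight each client by data volume scaled up by its imbalance coefficient,
    # then form the weighted average of the client update vectors.
    weights = [sum(counts) * (1.0 + imbalance(counts, eps)) for counts in class_counts]
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) / total for i in range(dim)]

global_model = aggregate(
    updates=[[1.0, 0.0], [0.0, 1.0]],
    class_counts=[[50, 50], [95, 5]],  # client 2 is skewed, so it gets extra weight
)
```

Under plain volume weighting the two equal-sized clients would contribute equally; here the skewed client dominates the merge.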

4. Empirical Evaluation and Impact

The client-aware aggregation strategy, coupled with DAFL, has been extensively validated on medical classification benchmarks (ISIC, Ocular Disease, RSNA-ICH) (Zhao et al., 2 Feb 2026). Key results include:

| Dataset | Aggregation & Loss | Accuracy | F1 score | Minority recall |
|---|---|---|---|---|
| ISIC | Cross-Entropy (CE) | 74.31% | 0.73 | Lower; minority classes overwhelmed |
| ISIC | Standard focal loss | 83.17% | 0.82 | Improved |
| ISIC | DAFL + client-aware | 87.19% | 0.83 | Best; high minority recall |
| RSNA-ICH | DAFL + client-aware | 83.45% | — | Top; stable convergence |
| Ocular Dis. | DAFL + client-aware | 96.63% | — | Large improvement |

Additionally, DAFL and client-aware aggregation together:

  • Outperform both traditional FL and competitive architectures (DenseNet121, ResNet50, ViT variants, MixNet, etc.).
  • Achieve faster convergence (fewer rounds to peak accuracy) and improved AUC (by 1–3 points).
  • Demonstrate greater stability in non-IID and severely imbalanced regimes.
  • Ensure minority class performance is sustained without sacrificing overall accuracy.

Ablation studies confirm that removing either the adaptive loss or the client-aware weighting degrades minority class performance and global generalization.

5. Distinctions from Non-Client-Aware Federated Schemes

Classical federated aggregation methods assign aggregation weights by the number of local samples, ignoring local imbalance and class dependencies. In contrast, client-aware methods, as formalized here, explicitly target adaptation to heterogeneous data by tracking intra- and inter-client imbalance and dynamically conveying these statistics both in local training and in aggregation.

This approach is orthogonal to many aggregation improvements in FL that focus on robustness to poisoning, communication reduction, or variance minimization. The client-aware mechanism directly alters the loss and gradient scaling, providing a fundamentally different axis of adaptation—especially relevant for medical and long-tailed real-world applications.

6. Hyperparameterization and Practical Guidelines

Critical hyperparameters in client-aware aggregation include the trade-off weight $\lambda$ (default 0.5), the focal focusing parameter $\gamma$ (default 2, grid-searched in $[1, 4]$), and the smoothing constant $\epsilon$ (e.g., $10^{-6}$). Empirical optimization of $\lambda$ should align with the relative reliability of global versus local imbalance statistics:

  • Higher $\lambda$: favor local (client-specific) corrections when clients are highly non-IID.
  • Lower $\lambda$: leverage federation-wide signals for global rarity adjustment.
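An illustrative (hypothetical) example of how $\lambda$ shifts the combined coefficient between the local and global signals:

```python
# Sketch: the convex combination c_ft = lambda * c_k + (1 - lambda) * c_global_t,
# using made-up coefficient values for a skewed client and a globally common class.

def combined_coeff(c_k, c_global_t, lam):
    return lam * c_k + (1.0 - lam) * c_global_t

c_k, c_global = 8.0, 0.5  # highly imbalanced client; class is common federation-wide

local_leaning = combined_coeff(c_k, c_global, lam=0.9)   # trusts local statistics
global_leaning = combined_coeff(c_k, c_global, lam=0.1)  # trusts federation-wide signal
```

With a high $\lambda$ the skewed client's local correction dominates; with a low $\lambda$ the globally common class pulls the weight back toward ordinary averaging.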

Dynamic computation and broadcast of imbalance coefficients incur negligible overhead relative to FL communication cost and can be integrated into diverse backbone architectures without altering network structure (Zhao et al., 2 Feb 2026).

7. Generalization, Limitations, and Future Perspectives

The client-aware aggregation paradigm is a drop-in extension to federated pipelines facing class-imbalance, non-IID partitions, or minority protection requirements. While thoroughly demonstrated in image classification with Vision Transformer backbones, these principles are readily applicable to segmentation, detection, and other tasks encountering similar cross-client heterogeneity.

A plausible implication is that extending client-aware aggregation to multi-modal, multi-task, or semi-supervised federated learning architectures could further enhance robustness and fairness, particularly when minority or rare-event detection is paramount. Its effectiveness, however, depends on the reliability of the client-side class statistics reported to the server, and sharing those statistics requires a careful privacy-preserving implementation.

For comprehensive comparisons and ablation analyses of the client-aware and dynamic adaptive focal loss framework, see (Zhao et al., 2 Feb 2026).
