
Heterogeneous Federated Learning

Updated 4 February 2026
  • Heterogeneous Federated Learning is a framework that addresses diverse data, model, task, device, and communication challenges in collaborative, privacy-preserving training.
  • It uses split architectures, knowledge distillation, and adaptive aggregation to efficiently handle statistical and system heterogeneity.
  • HFL is applied in healthcare, autonomous driving, and edge computing, improving efficiency and personalization while ensuring data privacy.

Heterogeneous Federated Learning (HFL) encompasses a broad class of federated optimization frameworks that enable collaborative, privacy-preserving model training across clients exhibiting diversity in data distributions, feature spaces, model architectures, computational/communication resources, and learning tasks. While classical horizontal federated learning assumes homogeneous clients (i.e., shared feature and label spaces and identical model architectures), real-world deployments routinely encounter forms of heterogeneity that break these assumptions. The core challenge of HFL is to accommodate and exploit this diversity without breaching data privacy or incurring prohibitive communication or computation costs.

1. Forms of Heterogeneity in Federated Learning

Heterogeneity in federated settings can be formally categorized along five orthogonal axes (Ye et al., 2023, Chen et al., 2024, Gao et al., 2022):

  1. Data heterogeneity (statistical non-IID): The data distribution $P_k(x,y)$ varies across clients $k$, manifesting as label skew, feature skew, quality skew (varying label noise levels $\eta_k$), or quantity skew ($N_i \gg N_j$). Quantitative measures include KL divergence, earth mover's distance $W(P_i,P_j)$, and the variance of client gradients.
  2. Model heterogeneity: Clients may employ models with different structures, parameterizations, and optimizers: parameter sets $\mathcal W_k$ with $\dim W_k^{(\ell)} \neq \dim W_j^{(\ell)}$. This arises in settings ranging from mobile AI to cross-organization silos.
  3. Task heterogeneity: Each client $i$ may have a different objective (e.g., different output spaces $\mathcal Y_i$ or task types, such as regression vs. classification), with task relatedness quantified by measures such as $\|\theta^*_i - \theta^*_j\|$.
  4. Device heterogeneity: Differing compute, memory, and energy budgets ($C_i$, $M_i$, $E_i$) and varying local epoch limits lead to irregular local progress and stragglers.
  5. Communication heterogeneity: Uplink/downlink bandwidth $b_i$, latency $\tau_i$, and connectivity are highly non-uniform; the per-client communication cost per round is $T^{(\mathrm{comm})}_i = \tau_i + |\theta|/b_i$.
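As a concrete illustration of axis 1, the label-skew component of data heterogeneity can be estimated by comparing clients' empirical label distributions. The sketch below (illustrative only; the helper names are our own, not from the cited papers) computes the KL divergence between two toy clients:

```python
import numpy as np

def label_distribution(labels, num_classes):
    """Empirical label distribution of one client, smoothed to avoid log(0)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float) + 1e-8
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Two toy clients with skewed labels over 3 classes.
client_a = label_distribution(np.array([0, 0, 0, 1]), num_classes=3)
client_b = label_distribution(np.array([1, 2, 2, 2]), num_classes=3)
skew = kl_divergence(client_a, client_b)  # large value -> strong label skew
```

A symmetric alternative (Jensen-Shannon divergence or the earth mover's distance mentioned above) is preferable when neither client should be treated as the reference distribution.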

HFL is thus defined as the collaborative minimization problem where constraints and loss functions are allowed to differ per client: $$\min_{\{\theta_k\},\, \theta_G}\; \sum_{k=1}^K p_k F_k(\theta_k) + \sum_{k=1}^K \frac{\lambda_k}{2}\|\theta_k - \theta_G\|^2$$ subject to aggregation, system, and communication constraints (Ye et al., 2023, Gao et al., 2022).
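A minimal numerical sketch of this objective, assuming toy quadratic local losses $F_k(\theta) = \frac{1}{2}\|\theta - c_k\|^2$, uniform weights $p_k$, and a shared $\lambda_k = \lambda$ (all choices illustrative), alternates proximal local steps with server averaging:

```python
import numpy as np

def fedprox_local_step(theta_k, theta_global, grad_fk, lr, lam):
    """One local gradient step on F_k(theta_k) + (lam/2)*||theta_k - theta_global||^2."""
    g = grad_fk(theta_k) + lam * (theta_k - theta_global)
    return theta_k - lr * g

def aggregate(thetas, weights):
    """Server update: weighted average of the clients' personal models."""
    return sum(w * t for w, t in zip(weights, thetas))

# Toy example: two clients whose quadratic losses pull toward different optima c_k.
c = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
theta_G = np.zeros(2)
thetas = [np.zeros(2), np.zeros(2)]
for _ in range(200):
    thetas = [fedprox_local_step(t, theta_G, lambda th, ck=ck: th - ck, lr=0.1, lam=0.5)
              for t, ck in zip(thetas, c)]
    theta_G = aggregate(thetas, [0.5, 0.5])
```

At the fixed point each personal model $\theta_k = (c_k + \lambda\theta_G)/(1+\lambda)$ sits between its local optimum and the global model, which is exactly the personalization effect the proximal term is designed to produce.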

2. Architectural and Algorithmic Frameworks

2.1 Horizontal HFL with Feature and Model Heterogeneity

Classic horizontal federated learning (FedAvg) requires a shared feature space ($\mathcal X_k = \mathcal X$ for all $k$); in practice, only a subset of features may be shared (Mori et al., 2022). HFL adapts via split architectures:

  • Two-column splits (CHFL): Separate network columns for common features ($F_{\theta_c}$) and client-unique features ($G_{\theta_{u,k}}$), with lateral connections to transfer knowledge. FedAvg applies only to the common column; unique columns are updated locally, linked by trainable matrices $U_k^{(i)}$ without cross-client updates (Mori et al., 2022).
  • Layer/subnetwork partitioning: Partial parameter sharing enables devices with different compute capabilities to train submodels of a global “supernet” (e.g., FedHeN’s coupling constraint $w_s = [w_c]_A$) (Acar et al., 2022).
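The submodel idea can be sketched as width-sliced weight matrices with overlap-aware averaging on the server. This is a simplified illustration in the spirit of partial parameter sharing, not the exact FedHeN procedure; all function names are our own:

```python
import numpy as np

def extract_submodel(w_global, width_fraction):
    """Slice the leading rows/cols of a global weight matrix for a low-capacity client."""
    rows = max(1, int(w_global.shape[0] * width_fraction))
    cols = max(1, int(w_global.shape[1] * width_fraction))
    return w_global[:rows, :cols].copy()

def aggregate_submodels(global_shape, client_weights):
    """Average each global coordinate over only the clients whose submodel covers it."""
    acc = np.zeros(global_shape)
    cov = np.zeros(global_shape)
    for w in client_weights:
        r, c = w.shape
        acc[:r, :c] += w
        cov[:r, :c] += 1
    covered = cov > 0
    acc[covered] /= cov[covered]
    return acc
```

The coverage-count normalization is what lets clients of very different capacities contribute to a single supernet: narrow clients update only the shared core, while wide clients also update the periphery.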

2.2 Knowledge Distillation and Prototype-Based Methods

  • Mutual and global knowledge distillation: When model architectures differ, direct parameter averaging is impossible. Solutions (FedMD, HierarchyFL, FedH2L, FedProtoKD) exchange “soft” output distributions (probabilities/logits) or class prototypes, using KL-divergence or MSE objectives to perform model-agnostic aggregation (Xia et al., 2022, Li et al., 2021, Hossen et al., 26 Aug 2025).
  • Prototype sharing: Instead of full model weights, clients share low-dimensional class prototypes (average feature embeddings per class), with server-side adaptation (e.g., adaptive class-wise margin maximization, contrastive prototype loss) (Hossen et al., 26 Aug 2025, Hangdong et al., 2023).
  • Hierarchical self-distillation: Server builds a hierarchy of submodels (with different widths or blocks) and performs ensemble meta-distillation on a small held-out, possibly synthetic, public dataset (Xia et al., 2022).
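Prototype sharing, the most lightweight of the strategies above, can be illustrated with a count-weighted aggregation sketch. This is a minimal assumption-laden example (helper names are hypothetical); the cited methods add adaptive margins or contrastive losses on top of this basic averaging:

```python
import numpy as np

def local_prototypes(features, labels, num_classes):
    """Per-class mean embedding on one client; locally absent classes keep count 0."""
    dim = features.shape[1]
    protos = np.zeros((num_classes, dim))
    counts = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
            counts[c] = mask.sum()
    return protos, counts

def aggregate_prototypes(client_protos, client_counts):
    """Count-weighted average of client prototypes, skipping classes no client holds."""
    total = np.sum(client_counts, axis=0)                       # (num_classes,)
    weighted = np.sum([p * c[:, None] for p, c in zip(client_protos, client_counts)],
                      axis=0)                                   # (num_classes, dim)
    nonzero = total > 0
    weighted[nonzero] /= total[nonzero, None]
    return weighted
```

Because only per-class mean embeddings cross the network, this scheme is agnostic to each client's backbone architecture and transmits orders of magnitude fewer bytes than full model weights.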

2.3 Communication and Scheduling Under Resource Heterogeneity

  • Adaptive client selection and load balancing: Based on periodic profiling (CPU/GPU, bandwidth, latency, success rates), rounds select a subset of clients using history-weighted score functions to maximize efficiency and fairness (Ghimire et al., 22 Nov 2025, Hussain et al., 4 Jun 2025).
  • Partial-model aggregation and quantization: Clients may upload only sparse or quantized gradients/send only subnetworks (federated dropout), dramatically reducing bandwidth and enabling low-end device participation (Ghimire et al., 22 Nov 2025, Hussain et al., 4 Jun 2025).
  • Decentralized and asynchronous updating: Schemes such as HADFL remove central coordination, allowing asynchronous aggregation (partial rings), probabilistic device selection, and per-device local step adaptation to sidestep stragglers (Cao et al., 2021).
  • Hierarchical FL (cloud-fog-edge): Multi-tier aggregation with heterogeneous edge servers and vehicles, with aggregation weights set by statistical similarity (e.g., Bhattacharyya distance between data distributions) for accurate, fast convergence (Kou et al., 2024).
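Using the per-client round-time model $T^{(\mathrm{comm})}_i = \tau_i + |\theta|/b_i$ from Section 1, a communication-aware client selector can be sketched greedily. This is a simplified illustration (the toy client profiles are invented); the cited schedulers additionally weight fairness, success history, and compute profiles:

```python
def round_time(latency_s, bandwidth_bps, model_bits):
    """Per-client communication cost: T_i = tau_i + |theta| / b_i."""
    return latency_s + model_bits / bandwidth_bps

def select_clients(clients, model_bits, k):
    """Pick the k clients with the smallest estimated round time."""
    scored = sorted(clients, key=lambda c: round_time(c["tau"], c["bw"], model_bits))
    return [c["id"] for c in scored[:k]]

clients = [
    {"id": "phone", "tau": 0.20, "bw": 1e6},   # slow uplink
    {"id": "edge",  "tau": 0.05, "bw": 1e8},   # fast edge node
    {"id": "iot",   "tau": 0.50, "bw": 5e5},   # constrained sensor
]
chosen = select_clients(clients, model_bits=8e6, k=2)  # ["edge", "phone"]
```

A purely greedy rule like this starves slow clients and biases the model toward fast devices' data, which is precisely why the history-weighted score functions above fold fairness terms into the ranking.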

3. Personalized and Privacy-Preserving HFL

  • Personalized federated optimization: Each client optimizes a local loss with a regularization/proximal term to the global model, e.g., $F_i(\theta) + \frac{\mu}{2}\|\theta - \theta_{\mathrm{global}}\|^2$ (FedProx) (Chen et al., 2024, Ye et al., 2023).
  • Privacy enhancements: Embedding strategies include local differential privacy (adding noise to prototypes, embedding vectors, or model updates), functional/multi-party encryption (Paillier, secret sharing for secure aggregation), and digital watermarking or blockchain-based model tracing (Chen et al., 2024, Hangdong et al., 2023).
  • Data- and task-level privacy: Application of different privacy budgets and noise levels depending on data quality or device resources, ensuring uniform privacy guarantees across the heterogeneous federation (Hussain et al., 4 Jun 2025).
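Local differential privacy on updates or prototypes is commonly realized with the Gaussian mechanism: clip each vector to a norm bound, then add noise calibrated to that bound. A minimal sketch (the clip norm and noise multiplier are illustrative, not taken from the cited papers):

```python
import numpy as np

def gaussian_mechanism(update, clip_norm, sigma, rng):
    """Clip an update to clip_norm, then add Gaussian noise scaled to the clip bound,
    as in DP-FedAvg-style local privacy."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, sigma * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
update = np.array([3.0, 4.0])            # norm 5 -> clipped down to norm 1
private = gaussian_mechanism(update, clip_norm=1.0, sigma=0.1, rng=rng)
```

Varying `sigma` per client, as the data- and task-level strategies above suggest, trades noise against the privacy budget each heterogeneous participant can afford.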

4. Specialized Applications and Real-World Impact

  • Healthcare: HFL enables cross-hospital model collaboration despite feature-space and data sparsity heterogeneity, using modular network heads and asynchronous knowledge transfer to achieve up to 94.8% MSE reduction in clinical prediction (Syu et al., 21 Jan 2025).
  • Autonomous driving and mobile computing: Hierarchical, quality-aware HFL frameworks (FedGau, QA-HFL) manage extreme device heterogeneity and non-IID regional data, leveraging statistical aggregation and adaptive communication to improve mIoU and reduce communication by ~30% (Kou et al., 2024, Hussain et al., 4 Jun 2025).
  • Network traffic classification and LLMs: Custom federated adaptation of LLMs (e.g., HFL-FlowLLM) combines LoRA adapters and customized aggregation for high-throughput, low-cost traffic flow classification, achieving +13% F1 over strong HFL baselines and reducing training costs by 87% (Tian et al., 18 Nov 2025).
  • Edge-AI/AIoT: Decentralized HFL achieves both high end-to-end speed (up to 4.68× wall-clock speedup) and minimal accuracy drop in multi-device, resource-heterogeneous clusters (Cao et al., 2021).

5. Theoretical Guarantees and Open Problems

  • Convergence analysis: HFL frameworks provide convergence guarantees under non-IID, heterogeneous, and bilevel constraints, with nonasymptotic $O(1/\sqrt{R})$ rates and robustness to straggling/partial participation (e.g., ZO-HFL relaxes bounded-gradient-dissimilarity assumptions) (Qiu et al., 2 Apr 2025).
  • Prototype margin theory: Adaptive prototype margin maximization provably prevents collapse under extreme non-IID/heterogeneous architectures (Hossen et al., 26 Aug 2025).
  • Limitations and research challenges:
    • Difficulty of reliably discovering common subspaces for mixed-feature-space HFL.
    • Performance/communication tradeoffs under escalating heterogeneity (statistical, system, task, and privacy).
    • Increased risk of backdoor attacks when synthetic public data (generated by large foundation models) is used for distillation, which undermines existing FL defense strategies (Li et al., 2023).
    • Need for integrated benchmarks and metrics jointly evaluating accuracy, communication efficiency, robustness, fairness, and privacy (Ye et al., 2023, Gao et al., 2022).

6. Taxonomy and Future Directions

| Heterogeneity Axis | Methods/Toolkits | Key Challenges |
| --- | --- | --- |
| Data/statistical | FedProx, FedMask, CHAFL, FedGau | Distribution, label/feature skew |
| Model | FedH2L, FedMD, HierarchyFL, FedProtoKD | Parameter mismatch, personalization |
| Task | Multi-task FL, MOCHA, meta-FL | Cross-task transfer, loss fusion |
| Device | Adaptive scheduling, QA-HFL | Stragglers, resource scaling |
| Communication | Quantization, dropout, partial/fault tolerance | Bandwidth, latency heterogeneity |

A unified HFL solution requires tight co-design of aggregation, personalization, knowledge transfer, privacy, and adaptive communication protocols. Promising directions include adaptive aggregation rules that reflect multi-dimensional resource/quality/fairness constraints, robust and FM-aware privacy/attack defenses, and continual learning strategies that support streaming features and newly arriving clients.


References:

(Mori et al., 2022, Ye et al., 2023, Gao et al., 2022, Ghimire et al., 22 Nov 2025, Chen et al., 2024, Cao et al., 2021, Xia et al., 2022, Hossen et al., 26 Aug 2025, Hangdong et al., 2023, Li et al., 2021, Acar et al., 2022, Kou et al., 2024, Syu et al., 21 Jan 2025, Hussain et al., 4 Jun 2025, Tian et al., 18 Nov 2025, Qiu et al., 2 Apr 2025, Yu et al., 2024, Litany et al., 2022, Nguyen et al., 2022, Li et al., 2023)
