
HypeMeFed: Heterogeneous Federated Learning

Updated 26 April 2026
  • HypeMeFed is a federated learning framework that allows each client to train a neural network to the deepest level its resources support while synchronizing with a unified global model.
  • It integrates a multi-exit network architecture with hypernetwork-based, low-rank weight generation to effectively aggregate cross-client parameters under diverse system constraints.
  • Empirical evaluations show significant gains in accuracy, memory efficiency, and computational speed, making it robust for non-IID data and heterogeneous device environments.

HypeMeFed is a federated learning framework designed to address heterogeneity in client capabilities by enabling each client to train a neural network of the largest depth it can afford, while maintaining the coherence of a single global model. The framework fuses a multi-exit (early-exit) network architecture with hypernetwork-based, low-rank model weight generation to achieve effective cross-client parameter aggregation under diverse system constraints (Shin et al., 2024).

1. Multi-Exit Network Architecture

HypeMeFed uses a global neural network of $L$ layers with $M$ intermediate exit classifiers at depths $d_1 < d_2 < \cdots < d_M = L$. The global parameter set is denoted as $\Theta^G = \{W_1, W_2, \ldots, W_L\}$. For client $c$ with capability $m \in \{1, \ldots, M\}$, the server transmits only the first $d_m$ layers along with their associated exits, i.e., $\Theta^c = \{W_1, \ldots, W_{d_m}\} \cup \{\text{exit heads } 1, \ldots, m\}$.
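The slicing rule above can be illustrated with a minimal sketch. The function name `client_submodel`, the string layer labels, and the 6-layer model with exits after layers 2, 4, and 6 are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Dict, List


def client_submodel(global_layers: List[str],
                    exit_depths: List[int],
                    capability: int) -> Dict[str, List[str]]:
    """Return the layers and exit heads sent to a client of capability m (1-indexed)."""
    if not 1 <= capability <= len(exit_depths):
        raise ValueError("capability must be in 1..M")
    depth = exit_depths[capability - 1]           # d_m
    return {
        "layers": global_layers[:depth],          # W_1 .. W_{d_m}
        "exits": [f"exit_{j}" for j in range(1, capability + 1)],
    }


# Hypothetical 6-layer model with exits after layers 2, 4 and 6 (so d_M = L).
layers = [f"W{i}" for i in range(1, 7)]
depths = [2, 4, 6]
small = client_submodel(layers, depths, capability=1)
full = client_submodel(layers, depths, capability=3)
```

Here `small` holds only the first two layers and one exit head, while `full` holds the whole model and all three exit heads.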

During local training, each client computes outputs for all assigned exits and minimizes the joint loss $\mathcal{L}^c(\Theta^c) = \sum_{j=1}^{m} \lambda_j\, \ell(f_j(x; W_{1:d_j}), y)$, where $\ell$ is a loss function such as cross-entropy, $f_j$ is the output of exit $j$, and $\lambda_j$ weights that exit's contribution. Exit classifiers at multiple depths ensure that every subnetwork learns globally meaningful features, aligning representation spaces across clients of differing depths. Clients return their updated weights $W_l^c$ for all layers $l \le d_m$.
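As a rough sketch of the joint loss, the following computes a weighted sum of per-exit cross-entropy terms for a single sample. The logits and exit weights are made-up illustration values, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def joint_exit_loss(exit_logits: List[Sequence[float]],
                    label: int,
                    weights: Sequence[float]) -> float:
    """Weighted sum of per-exit losses: sum_j lambda_j * loss(f_j(x), y)."""
    if len(exit_logits) != len(weights):
        raise ValueError("one weight per exit is required")
    return sum(w * cross_entropy(z, label) for w, z in zip(weights, exit_logits))


# A client with m = 2 exits; the first exit is uninformative, the second
# favours the correct class 0.
logits_per_exit = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
loss = joint_exit_loss(logits_per_exit, label=0, weights=[0.5, 1.0])
```

In a real training loop the same sum would be backpropagated through the shared layers, so earlier layers receive gradient from every exit that sits above them.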

2. Hypernetwork-Based Weight Generation

To resolve "information disparity" (the uneven training of deeper layers because only a subset of clients updates them), HypeMeFed introduces a server-side hypernetwork $H_\phi$. This hypernetwork predicts missing weight blocks for layers underrepresented in client updates. The mapping is learned from earlier-layer weights to later-layer weights: $\mathrm{vec}(\hat{W}_{l+1}) = H_\phi(\mathrm{vec}(W_l))$, where $\phi$ parameterizes the hypernetwork and $\mathrm{vec}(\cdot)$ denotes vectorization. For a client $c$ whose last trained layer is $d_m$, its full model weights are $\{W_1^c, \ldots, W_{d_m}^c, \hat{W}_{d_m+1}^c, \ldots, \hat{W}_L^c\}$, where each generated layer is predicted from the layer before it. This approach allows even shallow clients to be mapped into deeper feature spaces, providing a common representational basis and preventing "starvation" of information in later layers.
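To make the idea of mapping earlier-layer weights to later-layer weights concrete, here is a toy sketch. It swaps the neural hypernetwork for a plain linear map fitted by least squares, which is an assumption made for brevity and not the paper's model. The dimensions and random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: 8 "deep" clients trained both layer l and layer l+1.
n_clients, in_dim, out_dim = 8, 12, 6
prev_layer = rng.normal(size=(n_clients, in_dim))    # rows are vec(W_l^c)
next_layer = rng.normal(size=(n_clients, out_dim))   # rows are vec(W_{l+1}^c)

# Fit a linear map vec(W_l) -> vec(W_{l+1}) by least squares.
# This stands in for training the hypernetwork parameters.
phi, *_ = np.linalg.lstsq(prev_layer, next_layer, rcond=None)


def generate_next_layer(prev_vec: np.ndarray) -> np.ndarray:
    """Predict the flattened weights of layer l+1 from those of layer l."""
    return prev_vec @ phi


# A shallow client that only trained layer l receives a generated layer l+1.
shallow_prev = rng.normal(size=in_dim)
generated = generate_next_layer(shallow_prev)
```

With fewer training pairs than input dimensions the linear map reproduces the deep clients' weights exactly, which says nothing about how well it generalizes; a real hypernetwork would be trained and validated with more care.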

3. Low-Rank Factorization for Hypernetworks

Directly training hypernetworks to map flattened full-layer weights is computationally intractable because of the parameter count. HypeMeFed applies a low-rank factorization (LRF) to each convolutional layer matrix $W \in \mathbb{R}^{p \times q}$: $W \approx U \Sigma V^\top$ with $U \in \mathbb{R}^{p \times k}$, $\Sigma \in \mathbb{R}^{k \times k}$, $V \in \mathbb{R}^{q \times k}$. Only the top $k$ singular components are retained. The hypernetwork then predicts the retained low-rank factors in lieu of $W$ directly. Empirically, for the rank $k$ used in the reported experiments, the hypernetwork parameter count is reduced by more than 99.3%, memory usage by 98.7%, and per-epoch server training time achieves a $\sim 2.5\times$ speedup, with accuracy loss under 1.6% (Shin et al., 2024).
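The effect of low-rank factorization on the number of values to be predicted can be sketched with a truncated SVD in NumPy. The kernel shape (64 output channels, 32 input channels, 3x3) and the rank of 4 are hypothetical, and absorbing the singular values into the left factor is one convention among several.

```python
import numpy as np


def truncated_svd(weight: np.ndarray, rank: int):
    """Return factors (a, b) such that a @ b is the best rank-`rank`
    approximation of `weight` in the Frobenius norm."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b


rng = np.random.default_rng(0)
# Hypothetical conv kernel reshaped to a 64 x 288 matrix.
kernel = rng.normal(size=(64, 32, 3, 3))
matrix = kernel.reshape(64, -1)

a, b = truncated_svd(matrix, rank=4)
approx = a @ b
full_params = matrix.size                 # 64 * 288 = 18432
low_rank_params = a.size + b.size         # 4 * (64 + 288) = 1408
rel_error = np.linalg.norm(matrix - approx) / np.linalg.norm(matrix)
```

A random Gaussian matrix is far from low rank, so `rel_error` is large here; the example only illustrates the bookkeeping, not the approximation quality on trained weights.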

4. Federated Optimization Protocol

Each federated round $t$ operates as follows:

  • The server maintains $\Theta^G_t$ and dispatches $\Theta^c_t$ to client $c$.
  • Each client $c$ performs $E$ epochs of SGD on local data to obtain updated weights, returning $\{W_l^c\}_{l \le d_m}$.
  • For each layer $l$, let $C_l$ be the set of clients that returned an update for layer $l$. If $|C_l|$ is sufficiently large, standard averaging is performed:

$W_l \leftarrow \frac{1}{|C_l|} \sum_{c \in C_l} W_l^c$

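If too few clients trained layer $l$, the hypernetwork $H_\phi$ regenerates the update from the preceding layer:

$\mathrm{vec}(W_l) \leftarrow H_\phi(\mathrm{vec}(W_{l-1}))$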

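  • The server updates hypernetwork parameters $\phi$ by minimizing the error between generated weights and the weights trained by the clients that hold layer $l$:

$\min_{\phi} \sum_{c} \left\| H_\phi(\mathrm{vec}(W_{l-1}^c)) - \mathrm{vec}(W_l^c) \right\|_2^2$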

This protocol maintains the standard communication pattern and mitigates the under-aggregation of deep layers typical in heterogeneous federated settings.
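The layer-wise aggregation step can be sketched as follows. This is a simplified reading of the protocol: the threshold `min_clients`, the unweighted mean, and the choice to mix generated weights with the few real updates are assumptions, and `stand_in_generator` is a hypothetical placeholder for the hypernetwork.

```python
from typing import Callable, Dict, List

Weights = List[float]                 # a flattened layer
ClientUpdate = Dict[int, Weights]     # layer index -> updated weights


def mean(vectors: List[Weights]) -> Weights:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def aggregate_layer(layer: int,
                    updates: List[ClientUpdate],
                    min_clients: int,
                    generate: Callable[[Weights], Weights]) -> Weights:
    """Average a layer over clients, generating it for clients that lack it
    when fewer than `min_clients` trained it."""
    trained = [u[layer] for u in updates if layer in u]
    if len(trained) >= min_clients:
        return mean(trained)                       # plain averaging
    # Too few real updates: add weights generated from the previous layer.
    generated = [generate(u[layer - 1]) for u in updates
                 if layer not in u and (layer - 1) in u]
    return mean(trained + generated)


def stand_in_generator(prev: Weights) -> Weights:
    """Hypothetical placeholder for the hypernetwork: doubles each weight."""
    return [2.0 * x for x in prev]


# Two deep clients (layers 1 and 2) and two shallow clients (layer 1 only).
updates = [
    {1: [1.0, 1.0], 2: [4.0, 4.0]},
    {1: [3.0, 3.0], 2: [6.0, 6.0]},
    {1: [2.0, 2.0]},
    {1: [2.0, 2.0]},
]
layer1 = aggregate_layer(1, updates, min_clients=3, generate=stand_in_generator)
layer2 = aggregate_layer(2, updates, min_clients=3, generate=stand_in_generator)
```

Layer 1 is averaged over all four clients, while layer 2 falls below the threshold and is averaged over two real and two generated updates. A client missing more than one layer would need generated layers chained one after another, which this sketch does not handle.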

5. Empirical Performance and Evaluation

HypeMeFed has been evaluated on SVHN, STL-10, and UniMiB-SHAR using a VGG-style CNN with exits after convolutional blocks 1, 2, and 4. Fifty clients are partitioned into 17 small, 17 medium, and 16 full-capability groups. Data is distributed non-IID via a Dirichlet($\alpha$) split, with 20% client sampling per round.
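A Dirichlet-based non-IID split of this kind can be sketched with the standard library alone. The concentration value `alpha=0.5`, the synthetic 10-class label list, and the function name are assumptions for illustration; the article does not state the value used in the experiments.

```python
import random
from typing import Dict, List


def dirichlet_partition(labels: List[int], n_clients: int,
                        alpha: float, seed: int = 0) -> Dict[int, List[int]]:
    """Assign sample indices to clients using per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    shards: Dict[int, List[int]] = {c: [] for c in range(n_clients)}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Normalised Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for c, g in enumerate(draws):
            cumulative += g / total
            # The last client takes the remainder so no index is dropped.
            end = len(idx) if c == n_clients - 1 else round(cumulative * len(idx))
            shards[c].extend(idx[start:end])
            start = end
    return shards


# Synthetic labels: 1000 samples over 10 classes, split across 50 clients.
labels = [i % 10 for i in range(1000)]
shards = dirichlet_partition(labels, n_clients=50, alpha=0.5)

# 20% client sampling per round: pick 10 of the 50 clients.
selected = random.Random(1).sample(range(50), k=10)
```

Smaller values of `alpha` concentrate each class on fewer clients, giving a more skewed split; larger values approach a uniform split.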

Key findings:

  • Accuracy: Improves over FedAvg-Small by 5.12% on a real device testbed and approaches the upper bound of FedAvg-Large.
  • Hypernetwork memory: Reduced by 98.22% (455 MB to 5.8 MB on UniMiB) at the selected low rank.
  • Speed: Hypernetwork training time reduced from $\sim$226 ms/epoch to $\sim$92 ms/epoch (2.46$\times$ faster), with server overhead per round falling from $\sim$3.9 s to $\sim$2.1 s (1.86$\times$).
  • Testbed: Deployment on 12 heterogeneous devices (Raspberry Pi 4, Jetson Nano, TX2) and an RTX 3090 server demonstrated per-round latency balancing (14 s for Pi full model vs. 4 s for Pi small), with a 5.13% accuracy boost compared to small-only FedAvg while keeping rounds short (Shin et al., 2024).

6. Robustness, Deployment, and Limitations

HypeMeFed maintains performance across a range of non-IID strengths and remains stable even under unbalanced client type distributions. Hypernetwork computation remains server-side and is rendered lightweight by LRF. Batch normalization and classifier layers are aggregated directly rather than regenerated, preserving client normalization and facilitating personalization.

Identified limitations include:

  • Current support is focused on CNNs; extension to RNNs, Transformers, or deeper multi-exit splits is unaddressed.
  • Full heterogeneity support could benefit from seamless integration with pruning, quantization, or alternative personalization strategies.
  • Further hypernetwork architectural specialization and efficient training algorithms remain open directions.

7. Context and Prospective Directions

HypeMeFed offers a mathematically principled and practically efficient methodology for federated learning under heterogeneous resource conditions. It combines (a) model slicing via early exits, (b) weight generation with LRF-compressed hypernetworks, and (c) FedAvg-compatible aggregation. Empirically, the approach demonstrates nontrivial accuracy gains, memory and computational savings, and robust operation across device and data distributions.

Anticipated future avenues include layer-wise hypernetwork specialization, support for alternative network families, deeper early-exit splits, and coupling with advanced personalization techniques. These developments aim to extend the applicability and efficiency of federated learning in increasingly heterogeneous environments (Shin et al., 2024).
