
SuperSFL: Heterogeneous Federated Split Learning

Updated 12 January 2026
  • SuperSFL is a federated split learning framework that uses weight-sharing super-networks and resource-aware subnetwork generation to handle device heterogeneity.
  • It introduces a Three-Phase Gradient Fusion mechanism that optimizes model updates by combining client and server contributions under varying network conditions.
  • Fault-tolerant aggregation and performance-weighted updates ensure robust training and efficiency in dynamic, resource-diverse edge environments.

SuperSFL refers to “Resource-Heterogeneous Federated Split Learning with Weight-Sharing Super-Networks,” a framework that addresses the challenge of efficient, robust training in federated edge environments composed of clients with highly diverse computational and network capabilities. Building on existing SplitFed Learning paradigms, SuperSFL introduces a weight-sharing super-network architecture, a Three-Phase Gradient Fusion (TPGF) optimization mechanism, and fault-tolerant aggregation for resilient, high-throughput distributed learning (Asif et al., 5 Jan 2026).

1. Background: SplitFed Learning and Device Heterogeneity

SplitFed Learning (SFL) integrates the architectural principles of Federated Learning (FL) and Split Learning (SL). In SFL, the global model is partitioned at a designated "cut" point: each client operates a shallow encoder (the initial model layers), while a centralized server hosts the deeper layers (the decoder). Each client executes the forward pass up to the cut point and transmits the resulting "smashed data" activations to the server, which completes the forward and backward computation and returns the cut-layer gradients. This structure reduces per-client computation and bandwidth demands compared to FL and SL individually.
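The smashed-data exchange described above can be illustrated with a toy scalar model; the function names, learning rate, and values here are illustrative assumptions, not from the paper:

```python
# Toy SplitFed round: the client encoder is y = w_c * x (layers up to
# the cut), the server head is p = w_s * y with a squared-error loss.

def client_forward(w_c, x):
    return w_c * x  # the "smashed data" activation sent to the server

def server_step(w_s, y, target, lr=0.05):
    p = w_s * y
    g_ws = 2 * (p - target) * y   # gradient for the server's own weight
    g_y = 2 * (p - target) * w_s  # cut-layer gradient returned to the client
    return w_s - lr * g_ws, g_y

def client_backward(w_c, x, g_y, lr=0.05):
    return w_c - lr * g_y * x     # chain rule through the client encoder

w_c, w_s, x, target = 0.5, 0.5, 2.0, 3.0
for _ in range(50):
    y = client_forward(w_c, x)              # client forward to the cut
    w_s, g_y = server_step(w_s, y, target)  # server forward/backward
    w_c = client_backward(w_c, x, g_y)      # client applies cut-layer gradient

print(round(w_c * x * w_s, 3))  # composed prediction approaches the target
```

Only the activation `y` and the gradient `g_y` cross the client–server boundary, which is the communication pattern SFL exploits.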

A major limitation in practical deployments is device heterogeneity, where client resources (CPU/GPU, memory, and network latency) vary significantly. Prior SFL approaches assume a uniform client split depth, which is either computationally excessive for weak devices or suboptimal for strong ones, and further lack fault tolerance, halting training during client or server disruption (Asif et al., 5 Jan 2026).

2. Weight-Sharing Super-Network Architecture and Resource-Aware Subnetwork Generation

SuperSFL centralizes the model as a "super-network," $\Theta = \{\theta_1, \theta_2, \ldots, \theta_L\}$, where $L$ is the full model depth. Each client $i$ is dynamically allocated a structurally compatible subnetwork defined by a contiguous prefix of layers, $\theta_i = \{\theta_1, \ldots, \theta_{d_i}\}$, where the depth $d_i$ is determined by the client's resource profile.

At initialization, each client reports its memory $m_i$ and communication latency $lat_i$. The server computes the subnetwork depth via

$$d_i = \min\left( \left\lfloor \alpha m_i \right\rfloor + \left\lfloor \beta \, \frac{lat_{max} - lat_i}{lat_{max} - lat_{min} + \epsilon} \right\rfloor,\; L-1 \right), \quad d_i \ge 1$$

with $\alpha = 0.5$ layers/GB, $\beta = 4$, and $\epsilon = 10^{-6}$. This direct mapping produces resource-aware splits that optimize per-client throughput. Each client receives the model slice $\theta_i$ and participates in training with minimal redundancy or underutilization (Asif et al., 5 Jan 2026).
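A minimal sketch of this depth rule, using the constants from the text (the helper name and example client profiles are hypothetical):

```python
import math

def subnetwork_depth(m_i, lat_i, lat_min, lat_max, L,
                     alpha=0.5, beta=4, eps=1e-6):
    """Resource-aware split depth: a memory term plus a normalized-latency
    term, capped at L-1 and floored at 1 (constants from the text)."""
    mem_term = math.floor(alpha * m_i)            # alpha = 0.5 layers/GB
    lat_term = math.floor(beta * (lat_max - lat_i)
                          / (lat_max - lat_min + eps))
    return max(1, min(mem_term + lat_term, L - 1))

# A 16 GB, 20 ms client receives a deep slice; a 2 GB, 200 ms client
# is clamped to the minimum depth of one layer.
print(subnetwork_depth(16, 20, lat_min=20, lat_max=200, L=12))
print(subnetwork_depth(2, 200, lat_min=20, lat_max=200, L=12))
```

Note that $\epsilon$ in the denominator keeps the latency term finite when all clients report identical latency.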

3. Three-Phase Gradient Fusion (TPGF): Optimization under Heterogeneous Splits

TPGF mediates the update of each client's model by combining loss signals from two distinct supervision pathways across the client–server boundary:

  1. Phase 1: Local Supervision
    • The client forward-passes $z_i^c = f_{\theta_i}(x_i)$.
    • A local classifier $h_{\phi_i}$ predicts $\hat{y}_i$ and computes the loss $\mathcal{L}_{client}$.
    • The client updates its classifier parameters and computes and clips the encoder gradient $g_{client}$.
  2. Phase 2: Server Supervision
    • The client sends $z_i^c$ to the server.
    • The server computes $\mathcal{L}_{server}$, updates its parameters, and returns the cut-layer gradient $g_z$.
    • The client back-propagates $g_z$ to obtain $g_{server}$.
  3. Phase 3: Loss-Weighted Gradient Fusion
    • Weights are assigned based on both subnetwork depth and loss:

    $$w_{client} = \frac{d_i}{d_i + d_s} \cdot \frac{(\mathcal{L}_{client}+\epsilon)^{-1}}{(\mathcal{L}_{client}+\epsilon)^{-1} + (\mathcal{L}_{server}+\epsilon)^{-1}}$$

    • The fused gradient is $\nabla_{\theta_i} = w_{client}\, g_{client} + (1 - w_{client})\, g_{server}$.
    • The parameters $\theta_i$ are updated with the fused gradient.

Each iteration executes these phases, permitting fully parallel client operation and preserving communication efficiency. No added protocol overhead is introduced beyond smashed data and its gradient (Asif et al., 5 Jan 2026).
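The Phase-3 fusion rule can be sketched as follows; the function names and example values are illustrative, not from the paper:

```python
def fusion_weight(d_i, d_s, loss_client, loss_server, eps=1e-6):
    """Loss-weighted fusion coefficient w_client from Phase 3:
    depth ratio times the client's inverse-loss share."""
    depth_ratio = d_i / (d_i + d_s)
    inv_c = 1.0 / (loss_client + eps)
    inv_s = 1.0 / (loss_server + eps)
    return depth_ratio * inv_c / (inv_c + inv_s)

def fuse(g_client, g_server, w_client):
    """Elementwise fused gradient: w * g_client + (1 - w) * g_server."""
    return [w_client * gc + (1 - w_client) * gs
            for gc, gs in zip(g_client, g_server)]

# A client holding 4 of 12 layers whose local loss (0.5) is half the
# server loss (1.0) gets two thirds of its depth ratio: w ≈ 2/9.
w = fusion_weight(d_i=4, d_s=8, loss_client=0.5, loss_server=1.0)
fused = fuse([1.0, -1.0], [0.0, 0.0], w)
print(round(w, 4))
```

Lower client loss pushes $w_{client}$ up, so the better-performing pathway dominates the fused update.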

4. Fault Tolerance and Collaborative Aggregation

To maintain progress during network outages or server unavailability, each client possesses a lightweight local classifier $h_{\phi_i}$. If the server does not respond to $z_i^c$ within 5 seconds, the client enters a fallback mode and continues with local-only updates as in Phase 1. Upon server re-availability, the client resynchronizes with the global state.
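The fallback decision can be sketched with a timeout on the server's reply channel; the callback names and queue-based transport are illustrative assumptions:

```python
import queue

def train_step(server_replies, local_update, fused_update, timeout=5.0):
    """Wait up to `timeout` seconds for the server's cut-layer gradient;
    if it does not arrive, fall back to the Phase-1 local-only update.
    `local_update` and `fused_update` are illustrative callbacks."""
    try:
        g_z = server_replies.get(timeout=timeout)
    except queue.Empty:
        return local_update()   # fallback mode: continue with local supervision
    return fused_update(g_z)    # normal mode: fuse with the server gradient

replies = queue.Queue()
# Server silent -> the client stays in fallback mode:
mode = train_step(replies, lambda: "local", lambda g: "fused", timeout=0.01)
# Once a gradient arrives, the fused update resumes:
replies.put([0.1, -0.2])
mode2 = train_step(replies, lambda: "local", lambda g: "fused", timeout=0.01)
print(mode, mode2)
```

Because the fallback path reuses Phase 1 unchanged, training never stalls on a missing server reply.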

Model aggregation leverages a composite weighting over client depth and performance:

$$w_i = \frac{d_i}{\sum_j d_j} \cdot \frac{(\mathcal{L}_{client}^i + \epsilon)^{-1}}{\sum_j (\mathcal{L}_{client}^j + \epsilon)^{-1}}$$

Layer-wise averaging with a server-consistency term is performed via the convex combination:

$$\bar{\theta}^\ell = \frac{\sum_i w_i \theta_i^\ell + \lambda \theta_s^\ell}{\sum_i w_i + \lambda}, \quad \lambda = 0.01$$

Classifier parameters $\phi_i$ remain local and are never subjected to global averaging (Asif et al., 5 Jan 2026).
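The composite weighting and layer-wise averaging above can be sketched as follows (function names and example values are illustrative):

```python
def aggregation_weights(depths, client_losses, eps=1e-6):
    """Composite per-client weights: depth share times inverse-loss share,
    i.e. w_i = (d_i / sum_j d_j) * (1/(L_i+eps)) / sum_j (1/(L_j+eps))."""
    d_sum = sum(depths)
    inv = [1.0 / (l + eps) for l in client_losses]
    inv_sum = sum(inv)
    return [(d / d_sum) * (v / inv_sum) for d, v in zip(depths, inv)]

def aggregate_layer(client_params, weights, server_param, lam=0.01):
    """Layer-wise average anchored by the server copy:
    (sum_i w_i * theta_i + lam * theta_s) / (sum_i w_i + lam)."""
    num = sum(w * p for w, p in zip(weights, client_params)) + lam * server_param
    return num / (sum(weights) + lam)

# Two clients: depths 4 and 8, losses 0.5 and 1.0. Depth and inverse-loss
# shares exactly offset here, so both weights come out to 2/9.
w = aggregation_weights([4, 8], [0.5, 1.0])
theta = aggregate_layer([1.0, 2.0], w, server_param=1.5)
print([round(v, 4) for v in w], round(theta, 3))
```

The small $\lambda$ keeps the aggregate tethered to the server's copy of each layer without letting it dominate the client contributions.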

5. Empirical Evaluation: Convergence, Efficiency, and Robustness

Experiments utilize a Vision Transformer (ViT-16) backbone on CIFAR-10 and CIFAR-100, under heterogeneous client memory ($[2, 16]$ GB) and latency ($[20, 200]$ ms) distributions, with non-IID Dirichlet partitioning. Key results are summarized:

| Dataset | #Clients | Target Acc. | Rounds (SFL/DFL/SSFL) | Comm. (MB) (SFL/DFL/SSFL) | Time (s) (SFL/DFL/SSFL) |
|---|---|---|---|---|---|
| CIFAR-10 | 50 | 70% | 11 / 9 / 5 | 9075 / 2305 / 466 | 6127 / 2650 / 595 |
| CIFAR-10 | 100 | 75% | 19 / 16 / 12 | 21463 / 15472 / 939 | 12168 / 14368 / 1010 |
| CIFAR-100 | 50 | 75% | 35 / 27 / 15 | 28938 / 7909 / 7194 | 21284 / 9796 / 8766 |
| CIFAR-100 | 100 | 80% | 100 / 34 / 22 | 165358 / 13638 / 9719 | 114955 / 15328 / 8926 |

SuperSFL achieves 2–5× faster convergence, up to 20× reduced communication cost, and up to 13× shorter training time compared to baseline SFL (Asif et al., 5 Jan 2026).

Energy and carbon footprint metrics indicate higher efficiency at superior accuracy:

| Dataset | Clients | Method | Acc. (%) | Avg. Power (W) | Power/Acc (W/%) | CO₂ (g) |
|---|---|---|---|---|---|---|
| CIFAR-10 | 50 | SFL | 78.84 | 1165 | 14.78 | 466.19 |
| CIFAR-10 | 50 | DFL | 70.15 | 362 | 5.17 | 144.88 |
| CIFAR-10 | 50 | SSFL | 96.93 | 493 | 5.09 | 197.17 |
| CIFAR-100 | 100 | SSFL | 87.48 | 1539 | 17.60 | 615.52 |

Ablation studies demonstrate that full TPGF is necessary for optimal convergence (96.93% accuracy vs. 85.89% with equal fusion). Server-availability ablations show that SuperSFL retains >89% accuracy with only 10% server-supervised rounds and converges (86.36%) even with no server involvement (Asif et al., 5 Jan 2026).

6. Relation to Prior Methods

SuperSFL extends previous SplitFed and federated optimization methods by embedding explicit architectural support for resource heterogeneity and communication disruption. The weight-sharing super-network and TPGF mechanisms generalize model personalization while preserving strict structural compatibility across clients. The composite aggregation scheme integrates both structure- and performance-awareness, which is absent in classical FedAvg and SplitFed paradigms.

A plausible implication is that SuperSFL reduces barriers to real-world federated deployments in IoT and mobile environments, particularly for computer vision tasks on resource-diverse clients. Its fallback mode and performance-weighted aggregation offer robustness against network volatility and client-side interruptions, supporting prolonged, stable training under non-ideal real-world conditions (Asif et al., 5 Jan 2026).

7. Summary and Outlook

SuperSFL provides a principled approach to federated split learning that systematically addresses client heterogeneity, network unreliability, and efficiency constraints. Its adoption of a weight-sharing super-network with resource-dependent subnetwork allocation, coupled with Three-Phase Gradient Fusion and client-server collaborative aggregation, yields state-of-the-art performance and robustness benchmarks. These advances position SuperSFL as a scalable and practical foundation for federated learning applications in highly variable edge computing ecosystems (Asif et al., 5 Jan 2026).
