SuperSFL: Heterogeneous Federated Split Learning
- SuperSFL is a federated split learning framework that uses weight-sharing super-networks and resource-aware subnetwork generation to handle device heterogeneity.
- It introduces a Three-Phase Gradient Fusion mechanism that optimizes model updates by combining client and server contributions under varying network conditions.
- Fault-tolerant aggregation and performance-weighted updates ensure robust training and efficiency in dynamic, resource-diverse edge environments.
SuperSFL refers to “Resource-Heterogeneous Federated Split Learning with Weight-Sharing Super-Networks,” a framework that addresses the challenge of efficient, robust training in federated edge environments composed of clients with highly diverse computational and network capabilities. Building on existing SplitFed Learning paradigms, SuperSFL introduces a weight-sharing super-network architecture, a Three-Phase Gradient Fusion (TPGF) optimization mechanism, and fault-tolerant aggregation for resilient, high-throughput distributed learning (Asif et al., 5 Jan 2026).
1. Background: SplitFed Learning and Device Heterogeneity
SplitFed Learning (SFL) integrates the architectural principles of Federated Learning (FL) and Split Learning (SL). In SFL, the global model is partitioned at a designated "cut" point such that each client operates a shallow encoder (hosting the initial model layers), while a centralized server hosts the deeper layers (decoder). Clients execute forward passes up to the cut point, transmit "smashed data" activations to the server, which completes the forward and backward computation, and returns the cut-layer gradients. This structure reduces per-client computation and bandwidth demands compared to FL and SL individually.
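The smashed-data exchange can be sketched numerically. This is a minimal toy sketch, not the paper's ViT setup: the dimensions, the single linear layer per side, and the softmax cross-entropy head are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_client = rng.normal(size=(8, 4))   # client-side encoder (layers before the cut)
W_server = rng.normal(size=(4, 2))   # server-side head (layers after the cut)

def client_forward(x):
    # Client runs layers up to the cut point and emits "smashed data".
    return np.maximum(x @ W_client, 0.0)

def server_step(smashed, y):
    # Server completes the forward pass and computes softmax cross-entropy,
    # then returns the gradient of the loss w.r.t. the smashed data.
    logits = smashed @ W_server
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    dlogits = (probs - np.eye(2)[y]) / len(y)
    grad_cut = dlogits @ W_server.T   # cut-layer gradient sent back to client
    return loss, grad_cut

x = rng.normal(size=(16, 8))
y = rng.integers(0, 2, size=16)
smashed = client_forward(x)               # client -> server: activations only
loss, grad_cut = server_step(smashed, y)  # server -> client: cut gradients
```

The client would then back-propagate `grad_cut` through its own layers; note that neither raw data nor full model weights cross the boundary, only activations and their gradients.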
A major limitation in practical deployments is device heterogeneity, where client resources (CPU/GPU, memory, and network latency) vary significantly. Prior SFL approaches assume a uniform client split depth, which is either computationally excessive for weak devices or suboptimal for strong ones, and further lack fault tolerance, halting training during client or server disruption (Asif et al., 5 Jan 2026).
2. Weight-Sharing Super-Network Architecture and Resource-Aware Subnetwork Generation
SuperSFL maintains the model centrally as a "super-network" W = (W_1, ..., W_L), where L is the full model depth. Each client i is dynamically allocated a structurally compatible subnetwork defined by a contiguous prefix of layers (W_1, ..., W_{d_i}), where the depth d_i is determined by the client's resource profile.
At initialization, each client reports its available memory m_i and communication latency ℓ_i. The server computes the subnetwork depth d_i as an affine function of these quantities: a layers-per-GB coefficient scales m_i, a per-ms penalty discounts ℓ_i, and the result is clipped to the valid range [1, L]. This direct mapping produces resource-aware splits that optimize per-client throughput. Each client receives the model slice W_{1:d_i} and participates in training with minimal redundancy or underutilization (Asif et al., 5 Jan 2026).
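A minimal sketch of such a resource-to-depth mapping follows. The coefficients (layers per GB, latency penalty per ms) and the 12-layer full depth are illustrative placeholders; the paper's actual constants are not reproduced here.

```python
def subnetwork_depth(mem_gb, latency_ms, full_depth=12,
                     layers_per_gb=2.0, latency_penalty=0.01, min_depth=1):
    # Affine resource-to-depth rule: more memory buys layers, higher
    # latency removes them; the result is clipped to a valid prefix length.
    raw = layers_per_gb * mem_gb - latency_penalty * latency_ms
    return max(min_depth, min(full_depth, int(raw)))

# Stronger clients receive deeper contiguous prefixes of the super-network.
strong = subnetwork_depth(mem_gb=8.0, latency_ms=20)    # clipped to full_depth
weak = subnetwork_depth(mem_gb=1.0, latency_ms=200)     # clipped to min_depth
```

Because every subnetwork is a prefix of the same super-network, weight sharing is automatic: clients of different depths still train the same underlying parameters for the layers they hold.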
3. Three-Phase Gradient Fusion (TPGF): Optimization under Heterogeneous Splits
TPGF mediates the update of each client's encoder by fusing two supervision signals, a local classifier loss on the client and the server-side loss, across the client–server boundary:
- Phase 1: Local Supervision
- The client forward-passes its batch through the encoder slice, producing the cut-layer activations (smashed data).
- A local classifier predicts labels from these activations and computes the local loss.
- The client updates the classifier parameters and computes and clips the local encoder gradient g_local.
- Phase 2: Server Supervision
- The client sends the smashed data to the server.
- The server completes the forward pass, computes the server-side loss, updates its own parameters, and returns the cut-layer gradient.
- The client back-propagates this gradient through its encoder to obtain g_server.
- Phase 3: Loss-Weighted Gradient Fusion
- Fusion weights λ_local and λ_server are assigned based on both subnetwork depth and the two pathway losses.
- Fused gradient: g_fused = λ_local · g_local + λ_server · g_server, where g_local and g_server are the encoder gradients from Phases 1 and 2.
- Encoder parameters are updated via the fused gradient.
Each iteration executes these phases, permitting fully parallel client operation and preserving communication efficiency. No added protocol overhead is introduced beyond smashed data and its gradient (Asif et al., 5 Jan 2026).
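The fusion step can be sketched as a convex combination of the two pathway gradients. The inverse-loss weighting below is a simplified assumption (the paper's weights also factor in subnetwork depth):

```python
import numpy as np

def fuse_gradients(g_local, g_server, loss_local, loss_server, eps=1e-8):
    # Weight each supervision pathway by the inverse of its loss, so the
    # better-performing signal dominates; weights are normalized to sum to 1.
    w_local = 1.0 / (loss_local + eps)
    w_server = 1.0 / (loss_server + eps)
    total = w_local + w_server
    return (w_local * g_local + w_server * g_server) / total

g_local = np.array([1.0, 0.0])
g_server = np.array([0.0, 1.0])
# Server loss is lower here, so the fused gradient leans toward g_server.
fused = fuse_gradients(g_local, g_server, loss_local=1.0, loss_server=0.25)
```

Under this form, equal losses reduce to an equal-fusion average, which is exactly the ablation baseline the paper reports as inferior to full TPGF.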
4. Fault Tolerance and Collaborative Aggregation
To maintain progress during network outages or server unavailability, each client maintains a lightweight local classifier. If the server does not respond to a transmitted batch of smashed data within 5 seconds, the client enters fallback mode and continues with local-only updates as in Phase 1. Upon server re-availability, the client resynchronizes with the global state.
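The fallback behavior can be sketched as follows. The class, counters, and TimeoutError-based transport stubs are illustrative scaffolding, not the paper's implementation:

```python
class FallbackClient:
    def __init__(self):
        self.mode = "collaborative"
        self.local_steps = 0
        self.server_steps = 0

    def step(self, batch, send_to_server, timeout_s=5.0):
        # Phase 1 (local supervision) always runs, so progress never stalls.
        self.local_steps += 1
        try:
            _grad = send_to_server(batch, timeout=timeout_s)
        except TimeoutError:
            self.mode = "fallback"       # continue local-only training
            return
        self.mode = "collaborative"      # server reachable: resync and fuse
        self.server_steps += 1

def dead_server(batch, timeout):
    raise TimeoutError                   # simulated outage

def live_server(batch, timeout):
    return 0.0                           # stands in for the cut-layer gradient

c = FallbackClient()
c.step("batch-1", dead_server)           # outage: local-only update
c.step("batch-2", live_server)           # recovery: collaborative update
```

The key property is that local progress accumulates in both branches; only the server-supervised portion of the update is skipped during an outage.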
Model aggregation weights each client's contribution by a composite of its subnetwork depth and its recent performance. Layer-wise averaging of the shared prefix layers is combined with a server-consistency term in a convex objective.
Classifier parameters remain local, never subjected to global averaging (Asif et al., 5 Jan 2026).
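A sketch of the composite weighting for one shared layer follows. The equal-alpha mix of normalized depth and accuracy is an assumed form standing in for the paper's exact weighting, and only shared prefix layers would be averaged this way (classifiers stay local, per the text above):

```python
import numpy as np

def aggregate_layer(updates, depths, accuracies, alpha=0.5):
    # Composite client weights: alpha mixes structure-awareness (depth)
    # with performance-awareness (accuracy); weights sum to 1.
    depths = np.asarray(depths, dtype=float)
    accs = np.asarray(accuracies, dtype=float)
    w = alpha * depths / depths.sum() + (1.0 - alpha) * accs / accs.sum()
    # Layer-wise weighted average over client copies of this shared layer.
    return np.tensordot(w, np.stack(updates), axes=1)

updates = [np.full(3, v) for v in (1.0, 2.0, 3.0)]
uniform = aggregate_layer(updates, depths=[4, 4, 4], accuracies=[0.8, 0.8, 0.8])
skewed = aggregate_layer(updates, depths=[1, 1, 10], accuracies=[0.1, 0.1, 0.9])
```

With uniform resources and performance this collapses to plain FedAvg-style averaging; skewed depths and accuracies pull the aggregate toward the deeper, better-performing clients.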
5. Empirical Evaluation: Convergence, Efficiency, and Robustness
Experiments use a Vision Transformer (ViT-16) backbone on CIFAR-10 and CIFAR-100, under heterogeneous distributions of client memory (in GB) and network latency (in ms), with non-IID Dirichlet partitioning. In the tables below, SSFL denotes SuperSFL. Key results are summarized:
| Dataset | #Clients | Target Acc. | Rounds (SFL/DFL/SSFL) | Comm. (MB) (SFL/DFL/SSFL) | Time (s) (SFL/DFL/SSFL) |
|---|---|---|---|---|---|
| CIFAR-10 | 50 | 70% | 11 / 9 / 5 | 9075 / 2305 / 466 | 6127 / 2650 / 595 |
| CIFAR-10 | 100 | 75% | 19 / 16 / 12 | 21463 / 15472 / 939 | 12168 / 14368 / 1010 |
| CIFAR-100 | 50 | 75% | 35 / 27 / 15 | 28938 / 7909 / 7194 | 21284 / 9796 / 8766 |
| CIFAR-100 | 100 | 80% | 100 / 34 / 22 | 165358 / 13638 / 9719 | 114955 / 15328 / 8926 |
SuperSFL achieves 2–5× faster convergence, up to 20× reduced communication cost, and up to 13× shorter training time compared to baseline SFL (Asif et al., 5 Jan 2026).
Energy and carbon-footprint metrics show that SuperSFL reaches higher accuracy with lower power per accuracy point:
| Dataset | Clients | Method | Acc.(%) | Avg. Power (W) | Power/Acc (W/%) | CO₂ (g) |
|---|---|---|---|---|---|---|
| CIFAR-10 | 50 | SFL | 78.84 | 1165 | 14.78 | 466.19 |
| CIFAR-10 | 50 | DFL | 70.15 | 362 | 5.17 | 144.88 |
| CIFAR-10 | 50 | SSFL | 96.93 | 493 | 5.09 | 197.17 |
| CIFAR-100 | 100 | SSFL | 87.48 | 1539 | 17.60 | 615.52 |
Ablation studies demonstrate that full TPGF is necessary for optimal convergence (96.93% accuracy vs. 85.89% with equal fusion). Server-availability ablations show that SuperSFL retains >89% accuracy with only 10% server-supervised rounds and converges (86.36%) even with no server involvement (Asif et al., 5 Jan 2026).
6. Related Methodologies and Significance
SuperSFL extends previous SplitFed and federated optimization methods by embedding explicit architectural support for resource heterogeneity and communication disruption. The weight-sharing super-network and TPGF mechanisms generalize model personalization while preserving strict structural compatibility across clients. The composite aggregation scheme integrates both structure- and performance-awareness, which is absent in classical FedAvg and SplitFed paradigms.
A plausible implication is that SuperSFL reduces barriers to real-world federated deployments in IoT and mobile environments, particularly for computer vision tasks on resource-diverse clients. Its fallback mode and performance-weighted aggregation offer robustness against network volatility and client-side interruptions, supporting prolonged, stable training under non-ideal real-world conditions (Asif et al., 5 Jan 2026).
7. Summary and Outlook
SuperSFL provides a principled approach to federated split learning that systematically addresses client heterogeneity, network unreliability, and efficiency constraints. Its adoption of a weight-sharing super-network with resource-dependent subnetwork allocation, coupled with Three-Phase Gradient Fusion and client-server collaborative aggregation, yields state-of-the-art performance and robustness benchmarks. These advances position SuperSFL as a scalable and practical foundation for federated learning applications in highly variable edge computing ecosystems (Asif et al., 5 Jan 2026).