
Federated Split Learning: Concepts and Advances

Updated 18 March 2026
  • Federated Split Learning is a distributed machine learning paradigm that divides a global model into client-side and server-side subnetworks to boost privacy and efficiency.
  • It unifies federated and split learning methodologies by using a predefined cut layer, enabling dynamic resource allocation and robust differential privacy across edge devices.
  • Empirical studies demonstrate that FSL can reduce communication load by up to 47% and significantly improve convergence rates in diverse real-world applications.

Federated Split Learning (FSL) is a distributed machine learning paradigm that unifies the client-parallelism and weight aggregation of Federated Learning (FL) with the privacy-preserving, client-side computation offloading of Split Learning (SL). FSL partitions deep models at a predefined “cut layer” into a client-side subnetwork and a server-side subnetwork, enabling collaborative training over edge networks without sharing raw data. It has been shown to improve round-wise communication efficiency, device scalability, and convergence rates in settings ranging from human activity recognition to large-scale image and time-series tasks. Recent advances further integrate rigorous differential privacy mechanisms, dynamic resource-aware partitioning, auxiliary local losses, model compression, and blockchain-based orchestration.

1. Federated Split Learning Principles and Workflow

In the canonical FSL architecture, the global neural network is separated at a “cut layer” into two functional blocks:

  • Client-side subnetwork: parameters $\theta_c$ of dimension $u$, executed on each edge device holding local data $D_n = \{(x_{n,k}, y_{n,k})\}$.
  • Server-side subnetwork: parameters $\theta_s$ of dimension $r$, maintained on the central server.

The protocol operates in synchronized rounds:

  • Each client samples a mini-batch $B_n$ and computes intermediate activations $S_n = f_c(X_n; \theta_{c,n})$ up to the split point.
  • Gaussian noise may be added for local differential privacy: $\tilde{S}_n = S_n + \mathcal{N}(0, \sigma_n^2 I)$.
  • The server collects $\tilde{S}_n$ and $y_n$ from all clients, concatenates the activations, and completes the forward pass via $f_s(S; \theta_s)$.
  • Backpropagation is split: the server updates $\theta_s$ with server-side gradients and sends cut-layer gradients back to each client, which uses them to update $\theta_{c,n}$.
  • Federated aggregation (e.g., FedAvg) is applied to the client-side weights, which are then broadcast to all clients for the next round.
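The round structure above can be sketched end-to-end with a toy linear model. This is a minimal numpy simulation, not an implementation from any cited paper: the dimensions, learning rate, squared-error loss, and single-batch-per-round schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, N, b = 8, 4, 3, 5   # features, cut-layer width, clients, batch size (assumed)
lr = 0.05

W_c = [rng.normal(size=(d, q)) * 0.1 for _ in range(N)]  # client-side weights
W_s = rng.normal(size=(q, 1)) * 0.1                      # server-side weights
data = [(rng.normal(size=(b, d)), rng.normal(size=(b, 1))) for _ in range(N)]

def mse(W_c, W_s):
    S = np.vstack([X @ W for (X, _), W in zip(data, W_c)])
    y = np.vstack([y for _, y in data])
    return float(np.mean((S @ W_s - y) ** 2))

def fsl_round(W_c, W_s):
    # 1) each client forwards its mini-batch up to the cut layer ("smashed data")
    S = np.vstack([X @ W for (X, _), W in zip(data, W_c)])
    y = np.vstack([y for _, y in data])
    # 2) server completes the forward pass and the loss gradient
    err = (S @ W_s - y) / len(y)
    gW_s = S.T @ err                  # server-side gradient
    gS = err @ W_s.T                  # cut-layer gradients, split back per client
    W_s = W_s - lr * gW_s
    new_W_c, off = [], 0
    for (X, _), W in zip(data, W_c):
        new_W_c.append(W - lr * X.T @ gS[off:off + len(X)])
        off += len(X)
    # 3) FedAvg over client-side weights, broadcast for the next round
    W_avg = np.mean(new_W_c, axis=0)
    return [W_avg.copy() for _ in range(N)], W_s

loss0 = mse(W_c, W_s)
for _ in range(50):
    W_c, W_s = fsl_round(W_c, W_s)
loss1 = mse(W_c, W_s)
```

Because FedAvg runs after every round, all clients start each round from identical client-side weights, so the sketch reduces to full-batch gradient descent on the pooled data.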

This split-and-aggregate approach generalizes to settings with group-based sequential splits (Zhang et al., 2023), non-IID/heterogeneous clients (Asif et al., 5 Jan 2026), auxiliary local heads (Mu et al., 2023), or hierarchical device-edge-cloud deployments (Ni et al., 7 Oct 2025).

2. Privacy Mechanisms and Attack Surfaces

FSL enhances data privacy by ensuring that raw samples never leave the device, but cut-layer activations—so-called “smashed data”—could still leak information. To quantify and mitigate privacy risk:

  • Differential Privacy (DP): Local Rényi DP is achieved by adding calibrated Gaussian noise to client activations, guaranteeing that for any two neighboring datasets $X, X'$, $D_\alpha(\mathcal{G}(X) \,\|\, \mathcal{G}(X')) \leq \epsilon_n$ per round (Ndeko et al., 2024).
  • Noise Scaling: The standard deviation $\sigma_n$ is set relative to the $\ell_2$-sensitivity $H$ of the activations and the target privacy budget $\epsilon_n$, with full privacy-budget accounting over $T$ rounds via moment accountants or Rényi DP advanced composition.
  • Adversarial Reconstruction: Attack resilience is typically measured by the reconstructability of the input $x$ from the smashed data $z$ via autoencoders or structural similarity (SSIM); this risk falls as the split layer is placed deeper, but at increased client energy cost (Lee et al., 2023).
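For the noise-scaling step, the standard Gaussian-mechanism calibration can be sketched as follows. The per-sample clipping scheme and the fixed RDP order `alpha` are assumptions for illustration, not the cited papers' exact accounting (for the Gaussian mechanism, the Rényi divergence at order $\alpha$ is $\alpha H^2 / (2\sigma^2)$, which is solved for $\sigma$ below).

```python
import numpy as np

def gaussian_sigma(H, eps_round, alpha=2.0):
    """Noise scale so the Gaussian mechanism with l2-sensitivity H
    satisfies (alpha, eps_round)-Renyi DP: alpha*H^2/(2*sigma^2) <= eps."""
    return H * np.sqrt(alpha / (2.0 * eps_round))

def privatize(S, H, eps_round, alpha=2.0, rng=None):
    """Clip each activation row to l2-norm H, then add calibrated noise."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    S = S * np.minimum(1.0, H / np.maximum(norms, 1e-12))
    sigma = gaussian_sigma(H, eps_round, alpha)
    return S + rng.normal(0.0, sigma, size=S.shape)

# RDP composes additively at a fixed order: over T rounds the
# accumulated budget at order alpha is T * eps_round.
```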

Notably, FSL can outperform FL in privacy–utility trade-offs: for a fixed DP budget $\epsilon$, FSL can yield 30–40% absolute improvements in model accuracy compared to federated learning with matching noise levels (Ndeko et al., 2024).

3. Communication and Computation Efficiency

FSL achieves considerable communication savings by restricting round-wise client uploads to low-dimensional smashed activations ($O(bq)$) rather than full model weights ($O(u + r)$). Empirical results for LSTM-HAR models on UCI HAR with batch size $b = 32$ and $q \approx 100$ features demonstrated:

  • FL round time: $123\,\mathrm{s}$
  • FSL round time: $65\,\mathrm{s}$, implying approximately a $2\times$ speedup and a $\sim 47\%$ reduction in communication load at scale.
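As a back-of-the-envelope check of the $O(bq)$ vs. $O(u+r)$ scaling, the per-upload sizes can be compared directly. The parameter counts `u` and `r` below are assumed for illustration, not taken from the cited papers, and a real round may upload several mini-batches' worth of activations.

```python
# Per-upload sizes at float32 (4 bytes); b, q mirror the LSTM-HAR example,
# while the parameter counts u and r are purely illustrative assumptions.
b, q = 32, 100            # batch size, cut-layer activation width
u, r = 200_000, 800_000   # client-side / server-side parameter counts (assumed)

bytes_fsl = b * q * 4     # smashed activations per mini-batch, O(bq)
bytes_fl = (u + r) * 4    # full model weights per round, O(u + r)
ratio = bytes_fl / bytes_fsl
print(f"FSL upload: {bytes_fsl} B, FL upload: {bytes_fl} B, ratio: {ratio:.0f}x")
```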

Group-based FSL variants partition clients into parallel groups, each performing intra-group split learning with local aggregation, further accelerating convergence in resource-limited wireless environments (Zhang et al., 2023). Newer approaches incorporate model compression (structured/unstructured pruning), gradient quantization (e.g., $8$-bit stochastic quantizers), and activation dropout, jointly reducing bandwidth and client computation costs with theoretically bounded impact on convergence and generalization (Zhang et al., 2024, Ni et al., 7 Oct 2025).
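A generic $8$-bit stochastic quantizer of the kind mentioned above might look as follows. This is a sketch of unbiased stochastic rounding over a min–max range; the cited papers' exact quantization schemes are not reproduced here.

```python
import numpy as np

def stochastic_quantize(x, bits=8, rng=None):
    """Unbiased stochastic rounding of x onto 2**bits uniform levels
    over [x.min(), x.max()]. Assumes bits <= 8 (uint8 codes)."""
    rng = rng or np.random.default_rng()
    lo, hi = float(x.min()), float(x.max())
    levels = 2**bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    t = (x - lo) / scale                    # position in level units
    floor = np.floor(t)
    # round up with probability equal to the fractional part -> unbiased
    q = floor + (rng.random(x.shape) < (t - floor))
    return q.astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    """Reconstruct approximate floats; error is at most one level (scale)."""
    return lo + q.astype(np.float64) * scale
```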

4. Robustness to Heterogeneity and Data Skew

A key motivation for split architectures is to accommodate device heterogeneity and non-IID label distributions. Advances include:

  • Label-Skew Correction: SCALA concatenates all client activations server-side and applies logit-adjusted cross-entropy, balancing class updates even under extreme distribution skew (e.g., each client observes only $2$ of $10$ classes), resulting in $8$–$20$ percentage-point accuracy gains over prior FL/SFL baselines (Yang et al., 2024).
  • Resource-Aware Partitioning: Clients choose individual cut layers according to memory, CPU, or link bandwidth constraints; optimal per-device split selection and wireless bandwidth allocation minimizes system-wide latency and maximizes overall training efficiency (Xu et al., 2023, Asif et al., 5 Jan 2026).
  • Personalized and Fair Learning: Multi-block splits and supplementary local heads enable transfer learning and personalized updates, ensuring high accuracy even for “thin” clients, with fairness guarantees on computation/workload allocation (Wadhwa et al., 2023, Yuan et al., 14 Aug 2025).
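The logit-adjustment idea behind SCALA's server-side loss can be illustrated generically: shift each logit by the log of its class prior so that rare classes are not drowned out. The pooled-count prior estimate and the temperature `tau` below are assumptions for illustration, not SCALA's exact formulation.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior).

    Frequent classes receive a larger additive offset, so the model must
    produce correspondingly larger margins for them during training."""
    prior = class_counts / class_counts.sum()
    adj = logits + tau * np.log(prior + 1e-12)
    adj = adj - adj.max(axis=1, keepdims=True)          # numerically stable
    logp = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
    return float(-np.mean(logp[np.arange(len(labels)), labels]))
```

With uniform class counts the adjustment is a constant shift and the loss reduces to plain cross-entropy, which makes the behavior easy to sanity-check.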

Token-fusion strategies for multimodal and collaborative factory robots further exploit split points and resource-aware aggregation for robust, scalable, low-latency training in industrial IoT (Ni et al., 7 Oct 2025).

5. Extensions: Auxiliary Losses, Decentralization, and Beyond

Communication/storage-efficient FSL variants employ auxiliary networks (“heads”) at client cut layers to approximate the server loss, allowing less frequent activation uploads (e.g., every $h$ mini-batches) and a single server model that eliminates $O(N)$ server memory growth (Mu et al., 2023, Mu et al., 21 Jul 2025). Formal convergence guarantees hold under mild assumptions (e.g., $O(1/\sqrt{T})$ for non-convex losses).
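The auxiliary-head idea can be sketched as a hypothetical numpy client that trains its cut-layer weights against a local softmax head between uploads; all dimensions, the learning rate, and the upload period `h` are illustrative assumptions, not any paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, C, h = 8, 4, 3, 5   # features, cut width, classes, upload period (assumed)
lr = 0.1

W_c = rng.normal(size=(d, q)) * 0.1    # client-side subnetwork
W_aux = rng.normal(size=(q, C)) * 0.1  # auxiliary head standing in for the server loss

def softmax_ce_grads(S, y, W):
    """Gradients of softmax cross-entropy wrt the head W and its input S."""
    logits = S @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0     # dL/dlogits = softmax - one_hot
    p /= len(y)
    return S.T @ p, p @ W.T

for t in range(20):
    X = rng.normal(size=(16, d))
    y = rng.integers(0, C, size=16)
    S = X @ W_c
    # Only every h-th mini-batch (t % h == 0) would upload S to the server
    # (omitted here); in between, the client trains against the local head.
    gW_aux, gS = softmax_ce_grads(S, y, W_aux)
    W_aux -= lr * gW_aux
    W_c -= lr * (X.T @ gS)
```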

Super-network strategies (SuperSFL) sample client-specific subnetworks from a global weight-sharing backbone, dynamically fitted to device-specific memory/latency profiles, and fuse local and server gradients with depth- and loss-weighted aggregation, achieving up to $20\times$ communication reduction and $2$–$5\times$ round acceleration over baseline SFL (Asif et al., 5 Jan 2026).

Decentralized FSL on permissioned blockchains orchestrates split training and FedAvg via transient fields and private data collections, fully removing central coordinators while maintaining near-centralized accuracy and scalable throughput (e.g., $94.1\%$ on CIFAR-10 at $50\times$ client scale) (Penedo et al., 10 Jul 2025).

FSL has also been successfully adapted for distributed sequential (RNN) learning over partitioned data, multimodal fusion, and privacy attacks/defenses (gradient inversion mitigation via zeroth-order optimization (Shi et al., 2024), local/cut-layer DP, and PixelDP techniques).

6. Experimental Performance and Practical Guidance

Empirical studies across real-world domains and simulated networks report:

  • UCI HAR FSL: $92\%$ peak accuracy ($81\%$ with DP at $\epsilon = 80$), $7\%$ higher than FL, and $50\%$ round-time reduction (Ndeko et al., 2024).
  • SCALA: Robust to extreme label skew ($\alpha = 2$ classes/client), e.g., $81.3\%$ (CIFAR-10), $60.7\%$ (CINIC-10), $43.1\%$ (CIFAR-100), outperforming FL baselines (FedAvg/FedProx/Dyn) by up to $20$ points (Yang et al., 2024).
  • GSFL: $31.5\%$ end-to-end latency reduction at matched accuracy versus vanilla FL (Zhang et al., 2023).
  • Storage/comms scaling: $5$–$10\times$ bandwidth and $50$–$80\%$ server storage reduction (CSE-FSL, $h = 5$–$25$) with $1$–$2\%$ accuracy degradation (Mu et al., 2023, Mu et al., 21 Jul 2025).
  • Decentralized FSL: Near-parity accuracy with centralized FSL, compressed epoch times (e.g., $30$ min vs. $85$ min for Ethereum-SL on CIFAR-10), and scalable ledger/network performance with stable latency up to $25$ clients (Penedo et al., 10 Jul 2025).

Table: Representative Communication Savings (CSE-FSL, CIFAR-10)

| Method        | Accuracy | Comm (GB) | Server Storage |
|---------------|----------|-----------|----------------|
| FSL_MC        | 80.6%    | 172.5     | 5.3M params    |
| CSE-FSL (h=5) | 76.5%    | 18.1      | 1.6M params    |

7. Open Directions and Limitations

While open challenges remain, FSL represents a convergent paradigm that delivers substantial efficiency, scalability, and privacy gains over both FL and SL, with rich ongoing development spanning architecture, privacy, resource-aware deployment, and theoretical guarantees (Ndeko et al., 2024, Yang et al., 2024, Asif et al., 5 Jan 2026, Penedo et al., 10 Jul 2025).
