Split Federated Learning: A Distributed ML Paradigm
- Split Federated Learning (SFL) is a distributed ML paradigm that partitions neural networks between clients and servers, reducing on-device computation while preserving data privacy.
- SFL employs dual-stage training with lightweight client-side processing and robust server-side aggregation, supporting variants like SFL-V1 and SFL-V2 for diverse deployment needs.
- Integration of differential privacy, gradient perturbation, and adaptive communication strategies in SFL ensures efficient training, strong security, and scalability across heterogeneous devices.
Split Federated Learning (SFL) is a distributed machine learning paradigm that combines the architectural partitioning of split learning (SL) with the parallel, aggregation-based coordination of federated learning (FL). SFL is designed to strengthen model and data privacy, support resource-constrained clients, and reduce computation bottlenecks, while providing strong scalability and communication efficiency. SFL has rapidly evolved, with multiple architectural variants and theoretical, empirical, and security analyses contributing to the formalization and optimization of this approach.
1. Core Principles and System Design
At its foundation, SFL partitions a deep neural network at a designated “cut layer” $L_c$ into a client-side submodel $W^C$ and a server-side submodel $W^S$, so that the client processes raw data only up to $L_c$ and then sends the resulting intermediate output (smashed data) to a server, which completes the forward and backward passes. SFL thereby reduces the client-side computational/energy burden and diminishes local exposure risk of model weights or raw data.
Relative to FL, in which each client downloads, trains, and uploads the full model, SFL’s architectural split allows resource-constrained devices to participate by running only a lightweight subnetwork. The parallelism of FL is preserved via client-side aggregation (e.g., using FedAvg), and privacy is further strengthened because neither the clients nor the server is ever exposed to the entire model.
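To make the client-side aggregation step concrete, here is a minimal FedAvg sketch in NumPy; the function and variable names (`fedavg`, `client_weights`, `num_samples`) are illustrative, not from any SFL library:

```python
import numpy as np

def fedavg(client_weights, num_samples):
    """Weighted average of per-client parameter vectors (FedAvg).

    client_weights: list of 1-D np.ndarray, one per client
    num_samples:    list of int, local dataset sizes used as weights
    """
    total = sum(num_samples)
    stacked = np.stack(client_weights)                    # shape (K, d)
    coeffs = np.array(num_samples, dtype=float) / total   # shape (K,)
    return coeffs @ stacked                               # shape (d,)

# Two clients with different data volumes; the larger client dominates.
w_global = fedavg([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [3, 1])
# w_global == array([0.75, 0.25])
```

In SFL-V1 the same averaging is applied to both the client-side and the per-client server-side submodels; in SFL-V2 it is applied to the client-side submodels only.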
SFL is typically instantiated in two major variants:
- SFL-V1: Each client has a dedicated server-side submodel (per-client instance); after local training and server-side updates, all submodels (client- and server-side) are aggregated (often via FedAvg or similar methods).
- SFL-V2: All clients share a single server-side submodel, which processes smashed data sequentially or in mini-batches. Client-side models are aggregated, but the server-side submodel is maintained in a unified manner.
This division underpins SFL’s ability to mitigate the computational bottleneck of FL and the under-utilization of parallelism in classical SL (Thapa et al., 2020, Han et al., 23 Feb 2024).
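The split training step described above can be sketched end-to-end for a single client. The following toy example uses plain NumPy linear layers and MSE loss; all shapes, names, and the learning rate are illustrative assumptions, not any paper's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network split at the cut layer:
# the client holds W_c, the server holds W_s.
W_c = rng.normal(size=(4, 3))   # client submodel: input dim 4 -> cut dim 3
W_s = rng.normal(size=(3, 1))   # server submodel: cut dim 3 -> output dim 1
x, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
lr = 0.1

# --- client: forward up to the cut layer, transmit "smashed data" ---
smashed = x @ W_c                       # intermediate activations sent to server

# --- server: complete forward pass, compute loss, backprop to the cut ---
pred = smashed @ W_s
grad_pred = 2 * (pred - y) / len(x)     # d(MSE)/d(pred)
grad_W_s = smashed.T @ grad_pred        # server-side parameter gradient
grad_smashed = grad_pred @ W_s.T        # gradient returned to the client
W_s -= lr * grad_W_s

# --- client: finish the backward pass with the returned gradient ---
grad_W_c = x.T @ grad_smashed
W_c -= lr * grad_W_c
```

Note that raw inputs `x` never leave the client and the server-side weights `W_s` never reach it; only `smashed` and `grad_smashed` cross the network.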
2. Mathematical Formulation and Privacy Mechanisms
SFL supports integration of rigorous privacy protection through differential privacy (DP) at several locations:
- Client-Side Gradient Perturbation: Individual gradients are clipped and additive Gaussian noise is applied to enforce $(\epsilon, \delta)$-DP:

$$\tilde{g} = \frac{g}{\max\left(1, \|g\|_2 / C\right)} + \mathcal{N}\!\left(0, \sigma^2 C^2 I\right),$$

where $C$ is the clipping norm and $\sigma$ the noise multiplier.
- Noise on Intermediate Activations ("PixelDP"): Output activations $a$ at the cut layer are perturbed via Laplacian noise:

$$\tilde{a} = a + \mathrm{Lap}\!\left(\Delta / \epsilon\right),$$

where $\Delta$ is the sensitivity of the cut-layer mapping.
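Both perturbation points can be sketched in a few lines of NumPy. This is a schematic of the two mechanisms only; the hyperparameters (`clip_norm`, `sigma`, `sensitivity`, `epsilon`) are illustrative and carry no calibrated privacy guarantee on their own:

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_and_noise_gradient(g, clip_norm, sigma):
    """DP-SGD-style gradient perturbation: clip the gradient to L2 norm
    clip_norm, then add Gaussian noise with std sigma * clip_norm."""
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)
    return g + rng.normal(0.0, sigma * clip_norm, size=g.shape)

def laplace_activations(a, sensitivity, epsilon):
    """PixelDP-style perturbation of cut-layer activations with
    Laplacian noise of scale sensitivity / epsilon."""
    return a + rng.laplace(0.0, sensitivity / epsilon, size=a.shape)

g_tilde = clip_and_noise_gradient(np.array([3.0, 4.0]), clip_norm=1.0, sigma=0.5)
a_tilde = laplace_activations(np.zeros(10), sensitivity=1.0, epsilon=2.0)
```

In a full system the noise scale would be calibrated to the target $(\epsilon, \delta)$ budget via a privacy accountant; that bookkeeping is omitted here.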
Strict privacy can be traded against slower convergence; in the presence of strong DP constraints, learning may be less efficient, but privacy and robustness to adversarial attacks are substantially improved (Thapa et al., 2020). Advanced strategies such as probabilistic masking and layer-wise knowledge compensation also provide structured, gradient-based privacy with theoretically quantified amplification (Wang et al., 18 Sep 2025).
3. Computational and Communication Efficiency
Computation: Clients only run a fraction of the total network (the layers up to the cut layer $L_c$), so computation and memory requirements at the edge are significantly decreased. Early cut layers lead to lighter client loads, essential for resource-constrained devices.
Communication: SFL improves communication efficiency over FL by transmitting only smashed data (size much smaller than the full parameter set) and synchronizing only the client-side model. In parallelized SFL, communication cost per epoch is largely independent of the client count (due to asynchronous/multi-threaded aggregation), whereas sequential relay versions scale suboptimally (Thapa et al., 2020). More advanced frameworks propose gradient aggregation and broadcasting schemes, combining dynamic model splitting with joint communication/computation resource allocation to further reduce overhead under privacy constraints (Liang et al., 2 Jan 2025).
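A back-of-envelope comparison makes the trade-off tangible. The sizes below are illustrative assumptions (not figures from any cited paper); whether SFL's per-round upload beats FL's depends on the cut-layer width and the number of batches per synchronization round:

```python
# Per-round upload, float32 throughout. All sizes are illustrative.
full_model_params = 11_000_000      # e.g. a ResNet-18-scale model
client_params     = 150_000         # client submodel up to an early cut layer
batch, cut_dim    = 64, 512         # smashed-data tensor per mini-batch
batches_per_epoch = 100
bytes_per_value   = 4               # float32

# FL: each client uploads the full model once per round.
fl_upload = full_model_params * bytes_per_value

# SFL: client submodel sync plus smashed data for every batch.
sfl_upload = (client_params * bytes_per_value
              + batch * cut_dim * batches_per_epoch * bytes_per_value)

print(f"FL  upload/round: {fl_upload  / 1e6:.1f} MB")   # 44.0 MB
print(f"SFL upload/round: {sfl_upload / 1e6:.1f} MB")   # 13.7 MB
```

With a narrower cut layer or fewer batches per round the SFL advantage grows; with very wide activations or long local epochs it can invert, which is what motivates the joint split/resource-allocation schemes cited above.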
4. Convergence Analysis and Performance Implications
Rigorous convergence analysis of SFL is difficult due to dual-paced (client/server) updates, but has been successfully established for both SFL-V1 and SFL-V2:
- Strongly Convex Objectives: SFL achieves an $O(1/T)$ convergence rate (in the number of training rounds $T$) analogous to FL; importantly, in SFL-V1, this rate is provably invariant under cut layer selection (Han et al., 23 Feb 2024, Dachille et al., 20 Dec 2024).
- General Convex Objectives: The rate degrades to $O(1/\sqrt{T})$, matching the theoretical expectations under heterogeneity.
- Non-Convex Losses and Partial Participation: SFL remains robust, with convergence speed and final accuracy outperforming FL and SL especially as client heterogeneity increases and participation probability decreases.
Empirical studies show that:
- SFL-V1’s performance is stable across all cut-layer choices (due to per-client server models and the aggregation structure).
- SFL-V2’s accuracy is highly sensitive to the cut layer: early (shallow) splits lead to higher accuracy, particularly in non-IID data settings or large-scale federations (Dachille et al., 20 Dec 2024).
- Optimally tuned SFL-V2 can outperform classical FedAvg by large margins, especially on heterogeneous datasets (e.g., up to ca. 10%–15% absolute test accuracy improvement on CIFAR100/ResNet-18) (Dachille et al., 20 Dec 2024).
- MiniBatch SFL variants, in which server-side updates are performed via mini-batch SGD over all client outputs, further mitigate client drift due to non-IID data (Huang et al., 2023).
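The MiniBatch-style server update in the last bullet can be sketched as follows: the shared server submodel takes one SGD step over the smashed data pooled from all clients, so the update direction reflects the global rather than any single client's distribution. This is a schematic with illustrative names and shapes, not the reference algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

W_s = rng.normal(size=(3, 1))   # shared server-side submodel (SFL-V2 style)

# Each client contributes (smashed activations, labels) for one mini-batch.
clients = [(rng.normal(size=(8, 3)), rng.normal(size=(8, 1))) for _ in range(4)]

# Pool all clients' cut-layer outputs into one server-side mini-batch.
smashed = np.concatenate([s for s, _ in clients])   # shape (32, 3)
labels  = np.concatenate([y for _, y in clients])   # shape (32, 1)

# Single SGD step on the pooled batch (MSE loss), averaging out client drift.
pred = smashed @ W_s
grad_W_s = smashed.T @ (2 * (pred - labels) / len(labels))
W_s -= 0.1 * grad_W_s
```

Because the gradient is computed over the concatenated batch, no single client's non-IID shard dominates the server-side update direction.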
5. Privacy, Security, and Attack Vectors
SFL hardens server-side intellectual property compared to FL by never exposing server-side model weights to clients. It also blocks black-box prediction APIs, mitigating standard model extraction (ME) attacks. However, it is not immune to white-box client-side adversaries:
- Gradient Query Model Extraction: Malicious clients can exploit server-provided gradient information to reconstruct high-fidelity surrogates of the server-side submodel, using techniques such as Craft-ME, GAN-ME, Gradient Matching (GM-ME), and (Soft)Train-ME; with limited access to training data, surrogate fidelities above 90% can be achieved, revealing a tension between privacy guarantees and recoverability (Li et al., 2023).
- Cut Layer Trade-off: A deeper cut layer (more client-side depth) enhances privacy by introducing nonlinearity, reducing reconstructibility (measured e.g. by SSIM between original and reconstructed input), and lowering model extraction success rates. However, this comes at the cost of increased computation and potential exposure of client input information (Lee et al., 10 Dec 2024, Lee et al., 2023).
Defenses include:
- L1 regularization on client submodels to reduce reconstructible information,
- gradient perturbation/noise-injection,
- structured stochastic masking to provide differential privacy guarantees without explicit noise (Wang et al., 18 Sep 2025).
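Two of the defenses above (L1 regularization and gradient noise injection) compose naturally into the client-side update. The sketch below uses illustrative hyperparameters and a hypothetical helper name (`defended_client_step`); it is not a tuned or provably private recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def defended_client_step(W_c, grad_smashed_back, x, lr=0.05, l1=1e-3, noise_std=1e-2):
    """One client-side update combining two defenses:
    an L1 penalty on the client submodel (shrinks reconstructible
    information) and Gaussian gradient perturbation (noise injection)."""
    grad_W_c = x.T @ grad_smashed_back           # task gradient at the client
    grad_W_c += l1 * np.sign(W_c)                # subgradient of the L1 penalty
    grad_W_c += rng.normal(0.0, noise_std, size=grad_W_c.shape)  # noise injection
    return W_c - lr * grad_W_c

W_c = rng.normal(size=(4, 3))
W_c = defended_client_step(W_c, rng.normal(size=(8, 3)), rng.normal(size=(8, 4)))
```

The structured-masking defense, by contrast, replaces the explicit noise term with stochastic zeroing of gradient coordinates, trading noise variance for sparsity.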
6. Adaptive, Parallel, and Heterogeneity-Aware SFL Variants
Recent research extends SFL to cope with non-IID data, system heterogeneity, and communication/deadline constraints:
- Adaptive Model Splitting and Batch Size Control: Adaptive selection of split points and mini-batch sizes minimizes straggler effects and training latency in edge networks, with convergence guarantees and real-time optimization via reinforcement learning and block-coordinate descent (Lin et al., 10 Jun 2025, Lin et al., 19 Mar 2024).
- Clustered and Parallel Aggregation Frameworks: By organizing clients into clusters based on computational and statistical similarity, and assigning adaptive update frequencies, ParallelSFL reduces traffic, improves speed, and enhances accuracy under large-scale heterogeneity (Liao et al., 2 Oct 2024).
- Collaborative and Hierarchical SFL: Multi-tier organizational structures enable weak clients to offload computation to strong aggregators, decoupling client/server synchronization and cutting delays, and facilitating application in mobile, IoT, and O-RAN-core scenarios (Papageorgiou et al., 22 Apr 2025, Gu et al., 4 Aug 2025, Pervej et al., 9 Nov 2024).
- Semi-Supervised and Personalization Extensions: Algorithms such as SemiSFL exploit clustering regularization and pseudo-labeling to handle unlabeled and non-IID data (Xu et al., 2023), while PHSFL isolates classifier fine-tuning for client personalization after global body training, boosting client utility in real wireless deployments (Pervej et al., 9 Nov 2024).
7. Open Challenges and Research Directions
Critical challenges for SFL’s practical deployment remain:
- Dynamic Cut Layer and Resource Optimization: Jointly optimizing split location, communication bandwidth, batch size, power constraints, client scheduling, and aggregation intervals in dynamic heterogeneous environments (Xu et al., 2023, Lee et al., 10 Dec 2024).
- Resilience to Adversarial and Inference Attacks: Extending privacy mechanisms to membership/label inference and adversarial reconstruction within the bounds of resource and communication constraints (Lee et al., 2023, Wang et al., 18 Sep 2025).
- Lightweight and Quantized Architectures: Exploring quantization, downsampling, and model pruning to further reduce footprint on edge devices while maintaining accuracy and privacy (Lee et al., 2023).
- Scaling and Convergence Under Extreme Conditions: Formalizing generalization bounds for non-convex settings, high device/client loss, intermittent availability, and bespoke aggregation/topologies.
- Interplay of Privacy, Utility, and Incentives: Mathematical characterization of Stackelberg equilibria in incentive-centric SFL, balancing utility, cost, and privacy for both model owners and resource-contributing clients (Lee et al., 10 Dec 2024).
- Industrial Deployments and O-RAN Applications: Mutual learning and inverse modeling strategies (e.g., SplitMe) significantly reduce communication under strict control deadlines, indicating practical potential for emerging 6G and networked edge scenarios (Gu et al., 4 Aug 2025).
Table: SFL System Components and Variants

| Component | SFL-V1 | SFL-V2 | Advanced/Hybrid Variants |
|---|---|---|---|
| Server-side model | Per-client submodel | Shared submodel | Clustering, aggregator tiers, mutual/inverse (SplitMe) |
| Cut-layer sensitivity | Invariant to cut layer | Performance highly sensitive | Adaptive/heterogeneity-aware |
| Aggregation mechanism | FedAvg on client+server submodels | FedAvg on client submodels only | Parallel, hierarchical, dynamic |
| Privacy techniques | DP, PixelDP, masking | DP, structured randomness | Stackelberg, probabilistic masking, L1/L2 regularization |
Conclusion
Split Federated Learning stands as a flexible and scalable framework that unifies distributed training for resource-constrained, privacy-sensitive, and heterogeneous networks. Its continued evolution—encompassing convergence theory, privacy analysis, adaptive/heterogeneity-aware algorithms, and large-scale industry deployments—positions SFL as a focal point for research at the intersection of distributed optimization, privacy engineering, and practical AI systems design (Thapa et al., 2020, Li et al., 2023, Huang et al., 2023, Han et al., 23 Feb 2024, Dachille et al., 20 Dec 2024, Liang et al., 2 Jan 2025, Wang et al., 18 Sep 2025).