SFL Optimization Framework
- The paper introduces a generalized SFL (SFLG) architecture that blends sequential and parallel aggregation, improving the trade-off between convergence speed and accuracy relative to traditional FL and SL.
- It employs pooling-layer tuning to reduce communication overhead by up to 4× while maintaining model performance and preserving data privacy on client devices.
- Empirical results on Raspberry Pi demonstrate the framework’s ability to minimize on-device computational load and adapt to non-IID data in large-scale IoT settings.
Split Federated Learning (SFL) is an emerging distributed training paradigm that partitions a deep network between resource-constrained clients and a server, supporting collaborative model training while preserving data privacy and reducing on-device computational load. An optimization framework for SFL encompasses mathematical modeling and algorithmic strategies to select model partition points, allocate system resources, and orchestrate training procedures so as to optimize metrics including convergence speed, training time, communication overhead, and energy consumption under resource constraints.
1. Comparative Evaluation of SFL, FL, and SL in IoT Scenarios
The optimization of SFL is motivated by comparative analysis with classic Federated Learning (FL) and Split Learning (SL) under resource-constrained Internet of Things (IoT) settings. Key empirical findings are:
- Learning Performance: SL converges faster than FL under imbalanced data but fails under extreme non-IID conditions, exhibiting "prediction collapse" to a single class. FL maintains robustness under highly non-IID data due to full-model local updates. SFL, especially in its generalized form (SFLG), attains intermediate or superior accuracy, blending FL's parallel aggregation and SL’s lightweight client computation.
- On-device Overhead: FL incurs high peaks in client computation, memory, and power due to complete local training. SL and SFL require only the client-side subnetwork execution, drastically reducing these overheads.
- Training Time and Scalability: FL scales well with increasing clients due to parallelism; SL is sequential and suffers from high per-round latency as client count increases. SFLG introduces parallelism on the client side and flexible grouping/aggregation on the server side, enabling scalable operation for large-scale IoT deployments.
2. Empirical Measurements: Overhead and Trade-offs
Experiments on Raspberry Pi platforms reveal critical metrics:
| Scheme | Training Time | Communication Overhead | Client Power & Memory |
|---|---|---|---|
| FL | Decreases w/ #clients (parallel update) | MB-order (model params only) | High |
| SL | Increases w/ #clients (sequential) | GB-order ("smashed" data intermediates) | Low |
| SFLG | Parallel/grouped, scalable | Tunable, MB-GB depending on cut-point | Low |
- Communication-Burden Bottleneck: SL and SFLG can incur large-volume transfers of intermediate-layer outputs ("smashed data"). The communication volume is directly determined by the cut layer and the client-side network design; see the rough estimate after this list.
- Device Thermal and Power Consumption: The shallow client-side subnetworks of SL and SFL keep Raspberry Pi temperature, power draw, and memory usage below those of FL, enabling longer battery life and practical deployment.
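To make the MB-order vs. GB-order contrast concrete, the back-of-the-envelope sketch below compares per-client upload volume; the parameter count, cut-layer activation shape, and round/epoch counts are illustrative assumptions, not measurements from the paper.

```python
# Rough per-client upload estimate, assuming 32-bit floats throughout.
# All sizes and counts below are illustrative placeholders.
BYTES_PER_FLOAT = 4

def fl_upload_bytes(num_params: int, rounds: int) -> int:
    """FL: the full set of model parameters is uploaded once per round."""
    return num_params * BYTES_PER_FLOAT * rounds

def sl_upload_bytes(activation_elems: int, samples: int, epochs: int) -> int:
    """SL/SFLG: cut-layer activations ("smashed data") are sent for every sample."""
    return activation_elems * BYTES_PER_FLOAT * samples * epochs

# Example: a ~1M-parameter model trained for 50 rounds vs. a 16x16x32
# cut-layer activation sent for 10,000 local samples over 50 epochs.
print(f"FL : {fl_upload_bytes(1_000_000, 50) / 1e6:.0f} MB")             # ~200 MB
print(f"SL : {sl_upload_bytes(16 * 16 * 32, 10_000, 50) / 1e9:.1f} GB")  # ~16.4 GB
```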
3. Optimization Techniques in SFL Frameworks
Two core optimizations are proposed:
A. Generalized SplitFed Learning (SFLG) Architecture
- Server-side Hybrid Aggregation: SFLG admits a flexible grouping, where the server can control the trade-off between sequential (SFLV2, SL-like) and parallel (SFLV1, FL-like) aggregation via group assignments. The server aggregates subgroup weights using a weighted average (see the sketch after this list):

  $$W^{t+1} = \sum_{g=1}^{G} \frac{n_g}{n}\, W_g^{t},$$

  where $n_g$ is the sample count in group $g$ and $n = \sum_{g=1}^{G} n_g$ is the overall total.
- Scalability Tuning: By adjusting the number of groups (and correspondingly the number of server-side subnetwork copies), one can balance system scalability against learning stability; fewer groups (fewer server-side copies) scale to more clients, while more groups make the scheme more closely resemble standard FL.
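A minimal sketch of the aggregation rule above, assuming each group's server-side subnetwork exposes its floating-point weights as a PyTorch state_dict; `aggregate_sflg` and its argument names are illustrative, not taken from the paper's implementation.

```python
# Weighted average W^{t+1} = sum_g (n_g / n) * W_g^t over the G server-side copies.
from typing import Dict, List

import torch


def aggregate_sflg(group_weights: List[Dict[str, torch.Tensor]],
                   group_sample_counts: List[int]) -> Dict[str, torch.Tensor]:
    """Aggregate the group-wise server-side state_dicts, weighted by sample count."""
    n_total = sum(group_sample_counts)
    aggregated = {name: torch.zeros_like(param, dtype=torch.float32)
                  for name, param in group_weights[0].items()}
    for weights, n_g in zip(group_weights, group_sample_counts):
        for name, param in weights.items():
            aggregated[name] += (n_g / n_total) * param.float()
    return aggregated
```

A full SFLG round would also aggregate the client-side subnetworks (as in SplitFed); this sketch covers only the server-side group aggregation.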
B. Communication Overhead Reduction via Architectural Tuning
- Pooling Layer Tuning: Downsampling the intermediate activation at the client using pooling layers before the cut point reduces the volume of "smashed data." The post-pooling output size is given by

  $$O = \left\lfloor \frac{I - K}{S} \right\rfloor + 1,$$

  where $I$ is the input size, $K$ the pooling kernel size, and $S$ the stride. The ratio $O/I$ (the "factor") quantifies the fraction of smashed data retained per spatial dimension, and hence the communication cost reduction; see the sketch below. Empirically, communication can be cut by up to 4× with negligible or even positive impact on final accuracy if the server-side network is adapted accordingly.
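A small sketch of this pooling arithmetic; the helper names (`pooled_size`, `data_reduction`) are assumptions for illustration, not from the paper.

```python
# Pooling arithmetic for the cut-layer ("smashed data") activation.
import math


def pooled_size(input_size: int, kernel: int, stride: int) -> int:
    """Post-pooling spatial size: O = floor((I - K) / S) + 1 (no padding)."""
    return math.floor((input_size - kernel) / stride) + 1


def data_reduction(input_size: int, kernel: int, stride: int) -> float:
    """How many times smaller a square 2D smashed-data map becomes,
    i.e. the inverse square of the per-dimension factor O/I."""
    out = pooled_size(input_size, kernel, stride)
    return (input_size / out) ** 2


# Example: a 32x32 activation pooled with a 2x2 kernel at stride 2 shrinks to
# 16x16 (factor O/I = 1/2), i.e. a 4x reduction in per-sample smashed data.
assert pooled_size(32, 2, 2) == 16
assert data_reduction(32, 2, 2) == 4.0
```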
4. Mathematical Formulation and Resource-Accuracy Tradeoffs
The framework formalizes model aggregation and communication-volume reduction mathematically:
| Purpose | Formula |
|---|---|
| Pooling output size | $O = \lfloor (I - K)/S \rfloor + 1$ |
| Communication reduction | factor $= O/I$ per spatial dimension ($(I/O)^2$ volume reduction for square 2D activations) |
| SFLG aggregation | $W^{t+1} = \sum_{g=1}^{G} \frac{n_g}{n}\, W_g^{t}$ |
- Architectural Tuning: A lower factor means less data transfer; it is achieved by increasing the pooling kernel size or stride (see the worked example below).
- Aggregation Rule: Generalizes both fully parallel (SFLV1-like) and fully sequential (SFLV2-like) SFL, parameterized by the server's resource allocation (number of subnetwork copies, group formation).
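As a worked instance of the formulas above, with assumed values $I = 32$ and $K = S = 2$ (consistent with the up-to-4× figure reported earlier):

$$O = \left\lfloor \tfrac{32 - 2}{2} \right\rfloor + 1 = 16, \qquad \text{factor} = \tfrac{O}{I} = \tfrac{1}{2}, \qquad \left(\tfrac{I}{O}\right)^{2} = 4,$$

i.e. a 4× reduction in per-sample 2D smashed-data volume.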
5. Real-World Implications for IoT and Edge Systems
Optimization yields the following practical gains:
- Scalability: SFLG can operate with thousands to millions of clients by flexibly adjusting server resource allocation, avoiding a combinatorial explosion in parameter copies.
- Device Practicality: Lightweight client subnetworks preserve feasible power/thermal budgets, giving longevity to battery-operated IoT devices.
- Bandwidth Adaptability: By tuning pooling layers to programmatically narrow the cut-layer activations, SFL and SFLG can function robustly under constrained or variable network bandwidth, which is key for remote or rural IoT scenarios.
- Performance Consistency: SFLG achieves balanced/competitive accuracy under both imbalanced and highly non-IID client data settings, blending the strengths of FL and SL. Thus, system designers may tune for communication, compute, and data non-uniformity as deployment parameters dictate.
- Forward Compatibility: The same principles enable SFLG’s adoption in evolving networking environments (e.g., 5G/6G), leveraging server resources as throughput increases.
6. Implementation and Engineering Considerations
- Resource Requirements: SFLG requires minimal client compute (first few layers), but server resource burden is tunable via group strategy.
- Deployment Strategy: System integrators can favor sequential, parallel, or grouped server operation depending on available cluster resources and application-specific convergence/accuracy constraints.
- Potential Limitations: SFLG may require careful server memory management if too many parallel server-side subnets are instantiated. Pooling strategies must be tailored to avoid loss of semantically essential features in the client-side activations.
7. Summary
The optimization framework for SFL, as specified in the referenced paper, centers on a generalized SFLG architecture that merges sequential and parallel processing at the server, allowing fine-grained trade-off tuning among scalability, accuracy, and resource usage. Communication bottlenecks are mitigated by explicit architectural tuning of client-side pooling layers, empirically yielding up to 4× reductions in overhead. Empirical results on physical IoT devices (Raspberry Pi) confirm that SFLG achieves learning efficacy and resource efficiency under both balanced and highly heterogeneous data distributions. This positions the framework as a key enabler for large-scale, real-world, resource-sensitive distributed ML in future edge and IoT deployments (Gao et al., 2021).