SFL Optimization Framework
- The paper introduces a generalized SFL (SFLG) architecture that blends sequential and parallel aggregation, improving the trade-off between convergence speed and accuracy relative to traditional FL and SL.
- It employs pooling-layer tuning to reduce communication overhead by up to 4× while maintaining model performance and preserving data privacy on client devices.
- Empirical results on Raspberry Pi demonstrate the framework’s ability to minimize on-device computational load and adapt to non-IID data in large-scale IoT settings.
Split Federated Learning (SFL) is an emerging distributed training paradigm that partitions a deep network between resource-constrained clients and a server, supporting collaborative model training while preserving data privacy and reducing on-device computational load. An optimization framework for SFL encompasses mathematical modeling and algorithmic strategies to select model partition points, allocate system resources, and orchestrate training procedures so as to optimize metrics including convergence speed, training time, communication overhead, and energy consumption under resource constraints.
1. Comparative Evaluation of SFL, FL, and SL in IoT Scenarios
The optimization of SFL is motivated by comparative analysis with classic Federated Learning (FL) and Split Learning (SL) under resource-constrained Internet of Things (IoT) settings. Key empirical findings are:
- Learning Performance: SL converges faster than FL under imbalanced data but fails under extreme non-IID conditions, exhibiting "prediction collapse" to a single class. FL maintains robustness under highly non-IID data due to full-model local updates. SFL, especially in its generalized form (SFLG), attains intermediate or superior accuracy, blending FL's parallel aggregation and SL’s lightweight client computation.
- On-device Overhead: FL incurs high peaks in client computation, memory, and power due to complete local training. SL and SFL require only the client-side subnetwork execution, drastically reducing these overheads.
- Training Time and Scalability: FL scales well with increasing clients due to parallelism; SL is sequential and suffers from high per-round latency as client count increases. SFLG introduces parallelism on the client side and flexible grouping/aggregation on the server side, enabling scalable operation for large-scale IoT deployments.
2. Empirical Measurements: Overhead and Trade-offs
Experiments on Raspberry Pi platforms reveal critical metrics:
| Scheme | Training Time | Communication Overhead | Client Power & Memory |
|---|---|---|---|
| FL | Decreases w/ #clients (parallel update) | MB-order (model params only) | High |
| SL | Increases w/ #clients (sequential) | GB-order ("smashed" data intermediates) | Low |
| SFLG | Parallel/grouped, scalable | Tunable, MB-GB depending on cut-point | Low |
- Communication-Burden Bottleneck: SL and SFLG can incur large-volume transfers of intermediate-layer outputs ("smashed data"). The communication volume is directly determined by the cut layer and the client-side network design; see the rough estimate after this list.
- Device Thermal and Power Consumption: The shallow client-side subnetworks of SL and SFL keep Raspberry Pi temperature, power draw, and memory usage below those of FL, enabling longer battery life and practical deployment.
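To make the MB-order vs. GB-order contrast concrete, the back-of-the-envelope sketch below compares per-client upload volume; the parameter count, cut-layer activation shape, and round/epoch counts are illustrative assumptions, not measurements from the paper.

```python
# Rough per-client upload estimate, assuming 32-bit floats throughout.
# All sizes and counts below are illustrative placeholders.
BYTES_PER_FLOAT = 4

def fl_upload_bytes(num_params: int, rounds: int) -> int:
    """FL: the full set of model parameters is uploaded once per round."""
    return num_params * BYTES_PER_FLOAT * rounds

def sl_upload_bytes(activation_elems: int, samples: int, epochs: int) -> int:
    """SL/SFLG: cut-layer activations ("smashed data") are sent for every sample."""
    return activation_elems * BYTES_PER_FLOAT * samples * epochs

# Example: a ~1M-parameter model trained for 50 rounds vs. a 16x16x32
# cut-layer activation sent for 10,000 local samples over 50 epochs.
print(f"FL : {fl_upload_bytes(1_000_000, 50) / 1e6:.0f} MB")             # ~200 MB
print(f"SL : {sl_upload_bytes(16 * 16 * 32, 10_000, 50) / 1e9:.1f} GB")  # ~16.4 GB
```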
3. Optimization Techniques in SFL Frameworks
Two core optimizations are proposed:
A. Generalized SplitFed Learning (SFLG) Architecture
- Server-side Hybrid Aggregation: SFLG admits a flexible grouping, where the server can control the trade-off between sequential (SFLV2, SL-like) and parallel (SFLV1, FL-like) aggregation via group assignments. The server aggregates subgroup weights using a weighted average (see the sketch after this list):

  $$W^{t+1} = \sum_{g=1}^{G} \frac{n_g}{n}\, W_g^{t},$$

  where $n_g$ is the sample count in group $g$ and $n = \sum_{g=1}^{G} n_g$ is the overall total.
- Scalability Tuning: By adjusting the number of groups (and correspondingly the number of server-side subnetwork copies), one can balance system scalability against learning stability; fewer groups (fewer server-side copies) scale to more clients, while more groups make the scheme more closely resemble standard FL.
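A minimal sketch of the aggregation rule above, assuming each group's server-side subnetwork exposes its floating-point weights as a PyTorch state_dict; `aggregate_sflg` and its argument names are illustrative, not taken from the paper's implementation.

```python
# Weighted average W^{t+1} = sum_g (n_g / n) * W_g^t over the G server-side copies.
from typing import Dict, List

import torch


def aggregate_sflg(group_weights: List[Dict[str, torch.Tensor]],
                   group_sample_counts: List[int]) -> Dict[str, torch.Tensor]:
    """Aggregate the group-wise server-side state_dicts, weighted by sample count."""
    n_total = sum(group_sample_counts)
    aggregated = {name: torch.zeros_like(param, dtype=torch.float32)
                  for name, param in group_weights[0].items()}
    for weights, n_g in zip(group_weights, group_sample_counts):
        for name, param in weights.items():
            aggregated[name] += (n_g / n_total) * param.float()
    return aggregated
```

A full SFLG round would also aggregate the client-side subnetworks (as in SplitFed); this sketch covers only the server-side group aggregation.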
B. Communication Overhead Reduction via Architectural Tuning
- Pooling Layer Tuning: Downsampling the intermediate activation at the client using pooling layers before the cut point reduces the volume of "smashed data." The post-pooling output size is given by

  $$O = \left\lfloor \frac{I - K}{S} \right\rfloor + 1,$$

  where $I$ is the input size, $K$ the pooling kernel size, and $S$ the stride. The ratio $O/I$ (the "factor") quantifies the fraction of smashed data retained per spatial dimension, and hence the communication cost reduction; see the sketch below. Empirically, communication can be cut by up to 4× with negligible or even positive impact on final accuracy if the server-side network is adapted accordingly.
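A small sketch of this pooling arithmetic; the helper names (`pooled_size`, `data_reduction`) are assumptions for illustration, not from the paper.

```python
# Pooling arithmetic for the cut-layer ("smashed data") activation.
import math


def pooled_size(input_size: int, kernel: int, stride: int) -> int:
    """Post-pooling spatial size: O = floor((I - K) / S) + 1 (no padding)."""
    return math.floor((input_size - kernel) / stride) + 1


def data_reduction(input_size: int, kernel: int, stride: int) -> float:
    """How many times smaller a square 2D smashed-data map becomes,
    i.e. the inverse square of the per-dimension factor O/I."""
    out = pooled_size(input_size, kernel, stride)
    return (input_size / out) ** 2


# Example: a 32x32 activation pooled with a 2x2 kernel at stride 2 shrinks to
# 16x16 (factor O/I = 1/2), i.e. a 4x reduction in per-sample smashed data.
assert pooled_size(32, 2, 2) == 16
assert data_reduction(32, 2, 2) == 4.0
```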
4. Mathematical Formulation and Resource-Accuracy Tradeoffs
The framework formalizes model aggregation and communication-volume reduction mathematically:
| Purpose | Formula |
|---|---|
| Pooling output size | $O = \lfloor (I - K)/S \rfloor + 1$ |
| Communication reduction | factor $= O/I$ per spatial dimension ($(I/O)^2$ volume reduction for square 2D activations) |
| SFLG aggregation | $W^{t+1} = \sum_{g=1}^{G} \frac{n_g}{n}\, W_g^{t}$ |
- Architectural Tuning: A lower factor means less data transfer; it is achieved by increasing the pooling kernel size or stride (see the worked example below).
- Aggregation Rule: Generalizes both fully parallel (SFLV1-like) and fully sequential (SFLV2-like) SFL, parameterized by the server's resource allocation (number of subnetwork copies, group formation).
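As a worked instance of the formulas above, with assumed values $I = 32$ and $K = S = 2$ (consistent with the up-to-4× figure reported earlier):

$$O = \left\lfloor \tfrac{32 - 2}{2} \right\rfloor + 1 = 16, \qquad \text{factor} = \tfrac{O}{I} = \tfrac{1}{2}, \qquad \left(\tfrac{I}{O}\right)^{2} = 4,$$

i.e. a 4× reduction in per-sample 2D smashed-data volume.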
5. Real-World Implications for IoT and Edge Systems
Optimization yields the following practical gains:
- Scalability: SFLG can operate with thousands to millions of clients by flexibly adjusting server resource allocation, avoiding a combinatorial explosion in parameter copies.
- Device Practicality: Lightweight client subnetworks preserve feasible power/thermal budgets, giving longevity to battery-operated IoT devices.
- Bandwidth Adaptability: By tuning pooling layers to programmatically narrow the cut-layer activations, SFL and SFLG can function robustly under constrained or variable network bandwidth, which is key for remote or rural IoT scenarios.
- Performance Consistency: SFLG achieves balanced/competitive accuracy under both imbalanced and highly non-IID client data settings, blending the strengths of FL and SL. Thus, system designers may tune for communication, compute, and data non-uniformity as deployment parameters dictate.
- Forward Compatibility: The same principles enable SFLG’s adoption in evolving networking environments (e.g., 5G/6G), leveraging server resources as throughput increases.
6. Implementation and Engineering Considerations
- Resource Requirements: SFLG requires minimal client compute (first few layers), but server resource burden is tunable via group strategy.
- Deployment Strategy: System integrators can favor sequential, parallel, or grouped server operation depending on available cluster resources and application-specific convergence/accuracy constraints.
- Potential Limitations: SFLG may require careful server memory management if too many parallel server-side subnets are instantiated. Pooling strategies must be tailored to avoid loss of semantically essential features in the client-side activations.
7. Summary
The optimization framework for SFL, as specified in the referenced paper, centers on a generalized SFLG architecture that merges sequential and parallel processing at the server, allowing fine-grained trade-off tuning among scalability, accuracy, and resource usage. Communication bottlenecks are mitigated by explicit architectural tuning of client-side pooling layers, empirically yielding up to 4× reductions in overhead. Empirical results on physical IoT devices (Raspberry Pi) confirm that SFLG achieves learning efficacy and resource efficiency under both balanced and highly heterogeneous data distributions. This positions the framework as a key enabler for large-scale, real-world, resource-sensitive distributed ML in future edge and IoT deployments (Gao et al., 2021).