- The paper introduces SubFLOT, a framework that uses optimal transport for personalized submodel extraction and aggregation in federated learning.
- It integrates three modules (OTP, SAR, and OTA) to align submodels and mitigate the impact of non-IID data and system heterogeneity.
- Empirical results show that it reduces communication and computation costs by over 50% relative to FedAvg while maintaining accuracy under high sparsity.
SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport
Motivation and Problem Setting
Federated Learning (FL) suffers from two intertwined forms of heterogeneity: system heterogeneity (diversity in client resources) and statistical heterogeneity (non-IID data distributions). Conventional federated pruning approaches have hit a bottleneck: server-side pruning lacks personalization because its decisions are data-agnostic, while client-side pruning imposes excessive computation on resource-limited devices. Compounding these challenges, pruning itself can amplify parameter-space divergence across clients, destabilizing global model convergence. The paper introduces SubFLOT, which addresses these issues through personalized submodel extraction and aggregation based on optimal transport (OT) in the parameter space.
SubFLOT Framework
SubFLOT consists of three synergistic modules: Optimal Transport-enhanced Pruning (OTP), Scaling-based Adaptive Regularization (SAR), and OT-enhanced Aggregation (OTA). Each round begins with server-side submodel personalization based on historical client models, continues with local training whose regularization adapts to pruning-induced divergence, and ends with aggregation that aligns heterogeneous client submodels in the global parameter space.
Figure 1: An overview of the SubFLOT pipeline emphasizing OTP for server-side personalized pruning, SAR for adaptive local regularization, and OTA for OT-based heterogeneous aggregation.
OTP: Layer-wise Server-Side Personalized Pruning
The OTP module employs layer-wise OT to align global model parameters with historical client-specific models, circumventing the need for raw data. It decomposes the pruning task into tractable layer-wise Wasserstein minimization problems, producing transport plans that yield aligned submodels. This enables the creation of heterogeneous, resource-adaptive architectures for each client.
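To make the layer-wise idea concrete, here is a minimal sketch of one OTP-style step, assuming squared Euclidean cost between neuron weight vectors, uniform marginals, and entropic OT solved with Sinkhorn iterations. The selection rule (keep the `keep` global neurons with the lowest expected transport cost to the client's historical neurons) and the names `otp_layer_prune` and `keep` are hypothetical illustrations, not the paper's exact procedure; in practice `keep` would follow from the client's resource budget, e.g. `ceil((1 - p) * n)` for pruning rate `p`.

```python
import numpy as np


def sinkhorn(a, b, cost, reg=0.05, n_iters=500):
    """Entropic OT plan between histograms a and b (Sinkhorn-Knopp iterations)."""
    K = np.exp(-cost / reg)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def otp_layer_prune(w_global, w_client_hist, keep, reg=0.05):
    """Hypothetical OTP-style pruning of one layer.

    w_global:      (n, d) rows are output neurons of the global layer
    w_client_hist: (m, d) rows are neurons of the client's historical layer
    keep:          number of global neurons to retain for this client
    """
    n, m = w_global.shape[0], w_client_hist.shape[0]
    # Squared Euclidean cost between every global and historical neuron
    cost = ((w_global[:, None, :] - w_client_hist[None, :, :]) ** 2).sum(-1)
    cost = cost / max(cost.max(), 1e-12)  # normalize so `reg` has a stable scale
    plan = sinkhorn(np.full(n, 1.0 / n), np.full(m, 1.0 / m), cost, reg)
    # Expected transport cost per global neuron (each plan row sums to 1/n)
    score = n * (plan * cost).sum(axis=1)
    kept = np.sort(np.argsort(score)[:keep])
    return kept, w_global[kept], plan


# Toy layer: the client's historical model relied on global neurons 0 and 2
w_g = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
kept, w_sub, plan = otp_layer_prune(w_g, w_g[[0, 2]], keep=2)
print(kept)  # [0 2]
```

Because each layer gets its own small OT problem, the per-layer calls are independent and could run in parallel on the server.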
SAR: Adaptive Control of Pruning-induced Divergence
SAR adds an adaptive regularization term to the local training objective whose strength is proportional to the client’s pruning rate. This constraint stabilizes highly pruned submodels by suppressing excessive parameter drift away from the global anchor, which keeps their updates suitable for aggregation.
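A minimal sketch of how a pruning-rate-scaled regularizer behaves, assuming a quadratic proximal form `(lam * p / 2) * ||w - w_anchor||^2` (the exact SAR term is not given here, so this form and the helper names are assumptions). The toy loop applies a constant task gradient that keeps pushing the weights away from the anchor; a larger pruning rate `p` produces a stronger pull back.

```python
import numpy as np


def sar_penalty(w, w_anchor, pruning_rate, lam):
    """Assumed SAR form: (lam * pruning_rate / 2) * ||w - w_anchor||^2."""
    diff = np.ravel(w - w_anchor)
    return 0.5 * lam * pruning_rate * float(diff @ diff)


def sar_sgd_step(w, w_anchor, task_grad, pruning_rate, lam, lr):
    """One local SGD step on task loss + assumed SAR penalty."""
    # Gradient of the penalty is lam * pruning_rate * (w - w_anchor)
    grad = task_grad + lam * pruning_rate * (w - w_anchor)
    return w - lr * grad


def local_drift(pruning_rate, lam=1.0, lr=0.1, steps=50):
    """Distance from the anchor after `steps` updates under a constant toy gradient."""
    w_anchor = np.zeros(3)
    w = w_anchor.copy()
    task_grad = -np.ones(3)  # toy gradient that keeps pushing w away from the anchor
    for _ in range(steps):
        w = sar_sgd_step(w, w_anchor, task_grad, pruning_rate, lam, lr)
    return float(np.linalg.norm(w - w_anchor))


# A more heavily pruned client drifts less from the global anchor
print(local_drift(0.2), local_drift(0.8))
```

With this form the stationary point of each coordinate is `1 / (lam * p)` times the task gradient's magnitude, so drift shrinks smoothly as the pruning rate grows.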
OTA: OT-based Model Aggregation
OTA leverages OT to map updated client models into a canonical global parameter space, mitigating neuron permutation mismatches and magnitude discrepancies from disparate pruning rates. Aggregation occurs after alignment, improving cross-client semantic consistency and global convergence.
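The permutation problem OTA targets can be illustrated with a small sketch, assuming equal layer widths, uniform marginals, squared Euclidean cost, and entropic OT via Sinkhorn; each client layer is mapped into the global neuron order by barycentric projection before averaging. This only handles neuron permutation, not the magnitude discrepancies mentioned above, and `align_to_global` / `ota_aggregate` are hypothetical names. For equal widths with uniform weights, `scipy.optimize.linear_sum_assignment` would give the exact permutation instead.

```python
import numpy as np


def sinkhorn(a, b, cost, reg=0.01, n_iters=500):
    """Entropic OT plan between histograms a and b (Sinkhorn-Knopp iterations)."""
    K = np.exp(-cost / reg)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def align_to_global(w_client, w_global, reg=0.01):
    """Map client neurons onto global neuron slots via an OT plan."""
    m, n = w_client.shape[0], w_global.shape[0]
    cost = ((w_client[:, None, :] - w_global[None, :, :]) ** 2).sum(-1)
    cost = cost / max(cost.max(), 1e-12)
    b = np.full(n, 1.0 / n)
    plan = sinkhorn(np.full(m, 1.0 / m), b, cost, reg)  # shape (m, n)
    # Barycentric projection: slot j gets the plan-weighted mean of client neurons
    return (plan.T @ w_client) / b[:, None]


def ota_aggregate(client_layers, w_global, weights=None):
    """Align every client layer to the global order, then take a weighted average."""
    aligned = np.stack([align_to_global(w, w_global) for w in client_layers])
    if weights is None:
        weights = np.full(len(client_layers), 1.0 / len(client_layers))
    return np.tensordot(np.asarray(weights), aligned, axes=1)


w_g = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
client_a = w_g[[2, 0, 1]]  # same neurons as the global layer, different order
client_b = w_g[[1, 2, 0]]
naive = (client_a + client_b) / 2  # plain averaging mixes unrelated neurons
fused = ota_aggregate([client_a, client_b], w_g)  # recovers the global layer
print(np.round(naive, 2))
print(np.round(fused, 2))
```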
Theoretical Analysis
SubFLOT’s convergence is formalized under standard FL assumptions including strong convexity, smoothness, bounded gradient variance, and bounded OT perturbations. The paper demonstrates linear convergence to a neighborhood of the global optimum, with explicit dependence of the error floor on statistical and system heterogeneity, OT alignment, and SAR regularization strength.
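The phrase "linear convergence to a neighborhood" can be unpacked with a generic contraction-plus-error recursion. This is an illustrative template, not the paper's theorem: $\mu$ is the strong-convexity constant, $\eta$ the step size, and $\varepsilon$ a placeholder for the per-round error that, in the paper's analysis, collects gradient variance, heterogeneity, OT alignment, and SAR terms (their exact constants are not reproduced here).

```latex
% Generic recursion (placeholder constants, not SubFLOT's exact bound)
e_t := \mathbb{E}\,\|w^{t} - w^{\star}\|^2, \qquad
e_{t+1} \le (1 - \eta\mu)\, e_t + \varepsilon, \qquad 0 < \eta\mu < 1

% Unroll t times and bound the geometric series
e_t \le (1 - \eta\mu)^{t} e_0 + \varepsilon \sum_{k=0}^{t-1} (1 - \eta\mu)^{k}
    \le (1 - \eta\mu)^{t} e_0 + \frac{\varepsilon}{\eta\mu}
```

The first term decays geometrically (the linear rate); the second, $\varepsilon/(\eta\mu)$, is the error floor, which rises as the heterogeneity and alignment terms inside $\varepsilon$ grow.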
Empirical Evaluation
Label and Feature Skew Robustness
SubFLOT outperforms all compared federated pruning baselines across pathological and practical label-skew settings, as well as in feature-shift scenarios on digit and image domain benchmarks. Under resource constraints, its accuracy is competitive with full-model personalized FL methods.
Scalability and Sparsity
The framework demonstrates robust scalability as client population increases, maintaining accuracy where baselines degrade. SubFLOT’s resilience persists under rising sparsity levels, preserving accuracy and stability even at aggressive pruning rates.
Figure 2: SubFLOT accuracy under varying pruning rates on CIFAR-10, evidencing robustness even at high sparsity.
Resource Efficiency
SubFLOT reduces communication and computation costs by over 50% compared to FedAvg, and server-side OTP adds little latency because the layer-wise problems can be solved in parallel. This makes personalized FL practical on edge devices without requiring them to train or transmit the full model.
Figure 3: Wall-clock time to 200 rounds and 80% accuracy on CIFAR-10; OT overhead is amortized by accelerated convergence.
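Width pruning tends to save more than the pruning rate alone suggests, because a hidden-to-hidden weight matrix shrinks on both its input and output side. The back-of-envelope count below uses a hypothetical MLP (not the paper's architecture) and assumes every hidden layer keeps `ceil((1 - p) * width)` neurons; per-round upload and download volume scale with the transmitted parameter count. It does not reproduce the paper's measured savings.

```python
import math


def mlp_param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


def submodel_widths(widths, pruning_rate):
    """Hypothetical width pruning: shrink every hidden layer, keep input/output sizes."""
    hidden = [max(1, math.ceil((1 - pruning_rate) * w)) for w in widths[1:-1]]
    return [widths[0], *hidden, widths[-1]]


full = [784, 512, 512, 10]  # hypothetical MLP
sub = submodel_widths(full, pruning_rate=0.5)
n_full, n_sub = mlp_param_count(full), mlp_param_count(sub)
print(sub, n_full, n_sub, round(n_sub / n_full, 3))
# [784, 256, 256, 10] 669706 269322 0.402
```

Here halving each hidden layer leaves about 40% of the parameters, since the 512x512 block drops to a quarter of its weights.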
Feature Alignment
Grad-CAM visualizations show that SubFLOT-generated submodels retain task-relevant local attention patterns that closely match those of the historical client models, whereas baseline pruning strategies produce fragmented or noisy feature maps. This indicates that SubFLOT better preserves semantic features.
Figure 4: Feature visualization for SubFLOT across domains, demonstrating preservation of client-specific feature patterns.
Figure 5: Activation map comparison for SubFLOT and baselines, evidencing SubFLOT’s feature alignment and adaptation.
Hyperparameter Sensitivity
In SubFLOT, α in OTP/OTA controls the trade-off between personalization and stability, while λ sets the strength of SAR regularization. Accuracy peaks at moderate values of both, and robustness to heterogeneity depends on tuning them carefully.
Figure 6: Hyperparameter sensitivity analysis of α and λ, isolating their effects on personalization and divergence control.
Ablation Studies
Removing or replacing any core module (OTP, SAR, or OTA) causes a clear drop in performance, with the loss of OTA’s alignment producing the sharpest degradation. These results support the layered design as essential for managing heterogeneity in both parameter space and feature space.
Implications and Future Directions
SubFLOT establishes OT as a unified geometric tool for both personalized pruning and aggregation in federated settings, enabling resource-adaptive, semantically-aligned local models without privacy trade-offs. The approach significantly reduces edge-device overhead while achieving competitive personalization, providing a scalable blueprint for practical FL deployments across resource-limited, non-IID environments. Further research may extend OT-based alignment to more complex architectures (e.g., transformers), adaptive pruning schedules, and cross-modal FL scenarios. Theoretically, integrating more explicit distribution matching in the parameter space could further tighten convergence and stability guarantees.
Conclusion
SubFLOT advances federated learning by easing the tension between efficiency and personalization, harmonizing heterogeneous client models through optimal transport and adaptive regularization. Its theoretical and empirical advantages, together with its resource efficiency and preservation of semantic features, position SubFLOT as a strong approach for practical, personalized FL on edge devices (2604.06631).