SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport

Published 8 Apr 2026 in cs.LG, cs.AI, and cs.CV | (2604.06631v1)

Abstract: Federated Learning (FL) enables collaborative model training while preserving data privacy, but its practical deployment is hampered by system and statistical heterogeneity. While federated network pruning offers a path to mitigate these issues, existing methods face a critical dilemma: server-side pruning lacks personalization, whereas client-side pruning is computationally prohibitive for resource-constrained devices. Furthermore, the pruning process itself induces significant parametric divergence among heterogeneous submodels, destabilizing training and hindering global convergence. To address these challenges, we propose SubFLOT, a novel framework for server-side personalized federated pruning. SubFLOT introduces an Optimal Transport-enhanced Pruning (OTP) module that treats historical client models as proxies for local data distributions, formulating the pruning task as a Wasserstein distance minimization problem to generate customized submodels without accessing raw data. Concurrently, to counteract parametric divergence, our Scaling-based Adaptive Regularization (SAR) module adaptively penalizes a submodel's deviation from the global model, with the penalty's strength scaled by the client's pruning rate. Comprehensive experiments demonstrate that SubFLOT consistently and substantially outperforms state-of-the-art methods, underscoring its potential for deploying efficient and personalized models on resource-constrained edge devices.


Authors (4)

Summary

The paper introduces SubFLOT, a framework that uses optimal transport for personalized submodel extraction and aggregation in federated learning.
It integrates three modules (OTP, SAR, and OTA) to align submodels and mitigate the effects of non-IID data and system heterogeneity.
Empirical results show that it cuts communication and computation costs by more than 50% relative to FedAvg while maintaining accuracy under high sparsity.


Motivation and Problem Setting

Federated Learning (FL) suffers from two intertwined forms of heterogeneity: system heterogeneity (diverse client resources) and statistical heterogeneity (non-IID data distributions). Conventional federated pruning faces a dilemma: server-side pruning lacks personalization because its decisions are data-agnostic, while client-side pruning imposes excessive computation on resource-limited devices. Compounding these challenges, pruning itself can amplify parameter-space divergence among clients, destabilizing convergence of the global model. The paper introduces SubFLOT, which addresses these issues by extracting and aggregating personalized submodels with optimal transport (OT) theory applied in parameter space.

SubFLOT Framework

SubFLOT consists of three synergistic modules: Optimal Transport-enhanced Pruning (OTP), Scaling-based Adaptive Regularization (SAR), and OT-enhanced Aggregation (OTA). In the workflow, the server first personalizes each client's submodel from that client's historical model (OTP); clients then train locally with a regularizer that adapts to pruning-induced divergence (SAR); finally, the server aligns the heterogeneous client submodels in the global parameter space and aggregates them (OTA).

Figure 1: An overview of the SubFLOT pipeline emphasizing OTP for server-side personalized pruning, SAR for adaptive local regularization, and OTA for OT-based heterogeneous aggregation.

OTP: Layer-wise Server-Side Personalized Pruning

The OTP module uses layer-wise OT to align the global model's parameters with each client's historical model, which serves as a proxy for that client's data distribution, so no raw data is needed. It decomposes pruning into tractable layer-wise Wasserstein distance minimization problems whose transport plans yield submodels aligned with each client. The result is a set of heterogeneous architectures, each sized to its client's resources.
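The paragraph leaves the rule that turns a transport plan into a pruning mask implicit. The sketch below shows one hypothetical way to do it for a single layer, not the paper's algorithm: neurons are rows of a weight matrix, the cost is the squared Euclidean distance between global and historical-client neurons, the plan comes from entropic OT (Sinkhorn iterations, swapped in here as a generic solver), and the global neurons with the lowest per-unit transport cost are kept. The names `sinkhorn`, `otp_layer_mask`, `eps`, and `prune_rate` are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, eps=0.05, iters=500):
    """Entropic OT via Sinkhorn iterations; returns a plan with marginals ~(a, b)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def otp_layer_mask(w_global, w_client, prune_rate, eps=0.05):
    """Hypothetical OTP step for one layer: indices of global neurons to keep.

    w_global, w_client: lists of neuron weight vectors (rows of the layer).
    """
    n, m = len(w_global), len(w_client)
    # Squared Euclidean cost between every global neuron and every client neuron.
    cost = [[sum((x - y) ** 2 for x, y in zip(g, c)) for c in w_client]
            for g in w_global]
    # Rescale to [0, 1] so exp(-cost / eps) does not underflow.
    scale = max(max(row) for row in cost) or 1.0
    cost = [[x / scale for x in row] for row in cost]
    plan = sinkhorn(cost, [1.0 / n] * n, [1.0 / m] * m, eps)
    # Transport cost per unit of mass: how poorly each global neuron matches the client model.
    score = [sum(plan[i][j] * cost[i][j] for j in range(m)) / sum(plan[i])
             for i in range(n)]
    keep = max(1, round(n * (1.0 - prune_rate)))
    # Keep the best-matched neurons, returned in their original order.
    return sorted(sorted(range(n), key=lambda i: score[i])[:keep])


w_global = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-5.0, 5.0]]
w_client = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(otp_layer_mask(w_global, w_client, prune_rate=0.5))
```

Because each layer is an independent problem in this sketch, the per-layer calls could run in parallel on the server.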

SAR: Adaptive Control of Pruning-induced Divergence

SAR adds an adaptive regularization term to the local training objective that penalizes the submodel’s deviation from the global model, with a strength proportional to the client’s pruning rate. This constraint stabilizes highly pruned submodels by suppressing excessive parametric drift from the global anchor, and keeps their updates compatible for aggregation.
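A minimal sketch of the SAR idea on a toy problem. It assumes the penalty is a squared L2 distance to the global weights with coefficient `lam * prune_rate`; the text states only that the strength scales with the pruning rate, so the linear scaling, the toy task loss 0.5 * ||w - target||^2, and the helper names are hypothetical.

```python
def sar_penalty(w, w_global, prune_rate, lam):
    """Assumed SAR term: 0.5 * lam * prune_rate * ||w - w_global||^2."""
    coef = lam * prune_rate
    return 0.5 * coef * sum((x - g) ** 2 for x, g in zip(w, w_global))


def sar_grad(w, w_global, prune_rate, lam):
    """Gradient of sar_penalty with respect to w."""
    coef = lam * prune_rate
    return [coef * (x - g) for x, g in zip(w, w_global)]


def local_train(w_global, target, prune_rate, lam=1.0, lr=0.1, steps=300):
    """Gradient descent on the toy task loss 0.5*||w - target||^2 plus the SAR term."""
    w = list(w_global)  # local training starts from the global weights
    for _ in range(steps):
        task = [x - t for x, t in zip(w, target)]
        reg = sar_grad(w, w_global, prune_rate, lam)
        w = [x - lr * (a + b) for x, a, b in zip(w, task, reg)]
    return w


# An unpruned client drifts to its local optimum; a heavily pruned one is pulled
# toward the global anchor, settling at (target + c * anchor) / (1 + c), c = lam * rate.
print(local_train([0.0], [2.0], prune_rate=0.0))
print(local_train([0.0], [2.0], prune_rate=0.9))
```

With `prune_rate=0` the sketch reduces to plain local training, so only pruned clients are regularized, and more aggressively pruned ones are held closer to the global model.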

OTA: OT-based Model Aggregation

OTA leverages OT to map updated client models into a canonical global parameter space, mitigating neuron permutation mismatches and magnitude discrepancies from disparate pruning rates. Aggregation occurs after alignment, improving cross-client semantic consistency and global convergence.
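To make the alignment step concrete, the sketch below aligns one layer of each client submodel to the global layer before averaging. With uniform neuron masses, exact OT between two equal-size point sets reduces to an optimal one-to-one assignment, so for a tiny layer it can be found by brute force; a pruned submodel's k < n neurons are assigned to distinct global slots, and slots no client covers keep their global values. This is an illustrative stand-in under those assumptions, not the paper's OTA procedure: a full implementation would also permute the next layer's input columns consistently and use a proper OT or assignment solver. `align_submodel` and `ota_aggregate` are hypothetical names.

```python
from itertools import permutations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_submodel(sub_rows, global_rows):
    """Assign each of the k client neurons to a distinct global slot, minimizing the
    total squared distance (brute force over injective maps; tiny layers only)."""
    k, n = len(sub_rows), len(global_rows)
    return min(
        permutations(range(n), k),
        key=lambda p: sum(sq_dist(sub_rows[i], global_rows[p[i]]) for i in range(k)),
    )


def ota_aggregate(global_rows, client_subs):
    """Average client neurons slot by slot after alignment."""
    n, d = len(global_rows), len(global_rows[0])
    sums = [[0.0] * d for _ in range(n)]
    counts = [0] * n
    for sub in client_subs:
        for i, slot in enumerate(align_submodel(sub, global_rows)):
            sums[slot] = [s + x for s, x in zip(sums[slot], sub[i])]
            counts[slot] += 1
    # Slots that no client covered keep the current global weights.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(global_rows[j])
            for j in range(n)]


global_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
client_a = [[0.0, 1.2], [1.2, 0.0]]              # pruned to 2 neurons, order swapped
client_b = [[1.0, 1.0], [0.8, 0.0], [0.0, 0.8]]  # full width, permuted
print(ota_aggregate(global_rows, [client_a, client_b]))
```

Averaging the raw rows by index would mix `[0.0, 1.2]` with `[1.0, 1.0]` in slot 0; after alignment, each slot averages only neurons that play the same role.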

Theoretical Analysis

SubFLOT’s convergence is analyzed under standard FL assumptions: strong convexity, smoothness, bounded gradient variance, and bounded OT perturbations. The paper proves linear convergence to a neighborhood of the global optimum, where the error floor depends explicitly on statistical and system heterogeneity, OT alignment, and SAR regularization strength.
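The shape of such a guarantee follows from a generic one-step recursion. The derivation below is a standard template, not the paper's theorem: $e_t$ is the expected squared distance to the optimum $w^\star$, $\rho \in (0, 1]$ is a contraction rate (for example $\eta\mu$ for step size $\eta$ and strong-convexity constant $\mu$), and $\Gamma \ge 0$ is a placeholder for the per-round error that collects the heterogeneity, OT-perturbation, and SAR terms.

```latex
e_{t+1} \le (1-\rho)\, e_t + \Gamma, \qquad e_t = \mathbb{E}\,\lVert w_t - w^\star \rVert^2
\;\Longrightarrow\;
e_T \le (1-\rho)^T e_0 + \Gamma \sum_{s=0}^{T-1} (1-\rho)^s
    \le \underbrace{(1-\rho)^T e_0}_{\text{linear decay}} + \underbrace{\Gamma/\rho}_{\text{error floor}}
```

The first term vanishes geometrically; the second is the radius of the neighborhood, which shrinks as the contributions collected in $\Gamma$ shrink.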

Empirical Evaluation

Label and Feature Skew Robustness

SubFLOT outperforms all compared federated pruning baselines under both pathological and practical label skew, as well as in feature-shift scenarios on digit and image-domain benchmarks. Despite operating under resource constraints, its accuracy is competitive with full-model personalized FL methods.
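The summary does not define the two label-skew protocols. In the FL literature, "pathological" skew usually means each client sees only a few classes, while "practical" skew usually draws each class's per-client proportions from a Dirichlet distribution with concentration β (smaller β means more skew). The sketch below implements those common conventions; whether SubFLOT's experiments use exactly these protocols, client counts, or β values is an assumption, and the function names are hypothetical.

```python
import random


def pathological_split(labels, num_clients, classes_per_client, seed=0):
    """Each client holds only `classes_per_client` classes (assigned round-robin);
    a class's samples are dealt evenly among the clients that hold it."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    owned = {i: {classes[(i * classes_per_client + j) % len(classes)]
                 for j in range(classes_per_client)} for i in range(num_clients)}
    parts = {i: [] for i in range(num_clients)}
    for c in classes:
        holders = [i for i in range(num_clients) if c in owned[i]]
        if not holders:  # class left unassigned when there are too few clients
            continue
        idx = [k for k, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        for pos, k in enumerate(idx):
            parts[holders[pos % len(holders)]].append(k)
    return parts


def dirichlet_split(labels, num_clients, beta, seed=0):
    """Per class, split samples across clients with proportions ~ Dirichlet(beta)."""
    rng = random.Random(seed)
    parts = {i: [] for i in range(num_clients)}
    for c in sorted(set(labels)):
        idx = [k for k, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        # A Dirichlet sample is a vector of independent Gamma(beta, 1) draws, normalized.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        bounds, acc = [0], 0.0
        for d in draws[:-1]:
            acc += d / total
            bounds.append(min(len(idx), int(acc * len(idx))))
        bounds.append(len(idx))
        for i in range(num_clients):
            parts[i].extend(idx[bounds[i]:bounds[i + 1]])
    return parts


labels = [k % 10 for k in range(1000)]  # toy 10-class dataset
patho = pathological_split(labels, num_clients=5, classes_per_client=2)
dirich = dirichlet_split(labels, num_clients=5, beta=0.5)
print({i: sorted({labels[k] for k in ix}) for i, ix in patho.items()})
print({i: len(ix) for i, ix in dirich.items()})
```

Both functions return disjoint index lists that together cover the whole dataset; only the class mix per client differs.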

Scalability and Sparsity

The framework demonstrates robust scalability as client population increases, maintaining accuracy where baselines degrade. SubFLOT’s resilience persists under rising sparsity levels, preserving accuracy and stability even at aggressive pruning rates.

Figure 2: SubFLOT accuracy under varying pruning rates on CIFAR-10, evidencing robustness even at high sparsity.

Resource Efficiency

SubFLOT reduces communication and computation costs by more than 50% compared to FedAvg, and the server-side OTP step adds little latency because its layer-wise problems are parallelized. Edge devices thus obtain efficient, personalized models without bearing the cost of the full model.
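The summary gives no cost model, but a back-of-the-envelope check shows why moderate structured pruning already halves costs. The assumptions here are illustrative: pruning removes a fraction p of the output channels in every conv layer, so each hidden layer's weight count (and its FLOPs, which scale with the same C_in × C_out × k² factor) shrinks by roughly (1 - p)². The `[64, 128, 256]` 3×3 stack is hypothetical and does not reproduce the paper's measurements.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def cnn_params(channels, keep=1.0, k=3, in_ch=3):
    """Parameter count of a plain conv stack after keeping a fraction of channels.

    The input channels (e.g. RGB) are never pruned; every layer keeps
    round(c * keep) of its c output channels, so the next layer's input shrinks too.
    """
    total, prev = 0, in_ch
    for c in channels:
        kept = max(1, round(c * keep))
        total += conv_params(prev, kept, k)
        prev = kept
    return total


full = cnn_params([64, 128, 256])
pruned = cnn_params([64, 128, 256], keep=0.7)  # pruning rate 0.3
print(full, pruned, f"{1 - pruned / full:.1%} fewer parameters")
```

Because both the input and output widths of hidden layers shrink, a 30% channel pruning rate removes slightly more than half of this stack's parameters.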

Figure 3: Wall-clock time to 200 rounds and 80% accuracy on CIFAR-10; OT overhead is amortized by accelerated convergence.

Feature Alignment

Visualization via Grad-CAM reveals that SubFLOT-generated submodels retain task-relevant local attention signatures, closely matching historical client models. Baseline pruning strategies exhibit fragmented or noisy feature maps, underscoring SubFLOT’s superiority in semantic preservation.
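Grad-CAM weights each feature map of a convolutional layer by the spatially averaged gradient of the class score and keeps the positive part of the weighted sum. The stdlib sketch below assumes the activations and gradients have already been extracted from the model (framework hooks are omitted). `cam_similarity` is a hypothetical add-on that puts a number on "closely matching", since the comparison described here is visual.

```python
def grad_cam(activations, gradients):
    """Grad-CAM for one conv layer.

    activations, gradients: K feature maps of size H x W (nested lists), where
    gradients hold d(class score)/d(activation).
    """
    k_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: spatial average of that channel's gradients.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(k_maps)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    # Normalize to [0, 1] for display when the map is not all zero.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


def cam_similarity(cam_a, cam_b):
    """Cosine similarity between two heatmaps (hypothetical comparison metric)."""
    a = [v for row in cam_a for v in row]
    b = [v for row in cam_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm > 0 else 0.0


acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
print(cam, cam_similarity(cam, cam))
```

In this toy input the second channel has negative evidence, so its region is zeroed by the ReLU and only the first channel's activation survives.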

Figure 4: Feature visualization for SubFLOT across domains, demonstrating preservation of client-specific feature patterns.

Figure 5: Activation map comparison for SubFLOT and baselines, evidencing SubFLOT’s feature alignment and adaptation.

Hyperparameter Sensitivity

SubFLOT’s personalization-stability trade-off is controlled by $\alpha$ in OTP/OTA, and the strength of SAR regularization is set by $\lambda$. Accuracy peaks at moderate values of both, and robustness to heterogeneity depends on tuning them carefully.

Figure 6: Hyperparameter sensitivity analysis of $\alpha$ and $\lambda$, isolating their effects on personalization and divergence control.

Ablation Studies

Removing or replacing any core module (OTP, SAR, or OTA) causes a clear performance loss, and removing OTA's alignment causes the sharpest degradation. This confirms that each component of SubFLOT's design is needed to manage both parameter-space and feature-space heterogeneity.

Implications and Future Directions

SubFLOT establishes OT as a unified geometric tool for both personalized pruning and aggregation in federated settings, enabling resource-adaptive, semantically-aligned local models without privacy trade-offs. The approach significantly reduces edge-device overhead while achieving competitive personalization, providing a scalable blueprint for practical FL deployments across resource-limited, non-IID environments. Further research may extend OT-based alignment to more complex architectures (e.g., transformers), adaptive pruning schedules, and cross-modal FL scenarios. Theoretically, integrating more explicit distribution matching in the parameter space could further tighten convergence and stability guarantees.

Conclusion

SubFLOT advances federated learning by resolving the tension between efficiency and personalization, harmonizing heterogeneous client models through optimal transport and adaptive regularization. Its theoretical and empirical advantages, coupled with resource efficiency and semantic feature preservation, designate SubFLOT as a leading paradigm for practical, personalized FL on edge devices (2604.06631).
