FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction (2212.01548v2)

Published 3 Dec 2022 in cs.LG, cs.CR, cs.CV, and cs.DC

Abstract: Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical. However, such a constraint not only excludes low-end clients who would otherwise make unique contributions to model training but also restrains clients from training large models due to on-device resource bottlenecks. In this work, we propose FedRolex, a partial training (PT)-based approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model. At its core, FedRolex employs a rolling sub-model extraction scheme that allows different parts of the global server model to be evenly trained, which mitigates the client drift induced by the inconsistency between individual client models and server model architectures. We show that FedRolex outperforms state-of-the-art PT-based model-heterogeneous FL methods (e.g., Federated Dropout) and reduces the gap between model-heterogeneous and model-homogeneous FL, especially under the large-model large-dataset regime. In addition, we provide theoretical statistical analysis on its advantage over Federated Dropout and evaluate FedRolex on an emulated real-world device distribution to show that FedRolex can enhance the inclusiveness of FL and boost the performance of low-end devices that would otherwise not benefit from FL. Our code is available at: https://github.com/AIoT-MLSys-Lab/FedRolex

Authors (4)
  1. Samiul Alam (15 papers)
  2. Luyang Liu (20 papers)
  3. Ming Yan (190 papers)
  4. Mi Zhang (85 papers)
Citations (113)

Summary

FedRolex: Enhancements in Model-Heterogeneous Federated Learning

The paper "FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction" addresses significant limitations in traditional federated learning (FL) systems that require homogeneity among client and server models. The authors introduce a novel approach named FedRolex to facilitate model-heterogeneous federated learning, enabling the global server model to be larger than any individual client model while mitigating issues such as client drift.

Problem Context

Traditional cross-device federated learning typically requires the server and all clients to train identical models, which caps the global model size at what the most resource-constrained participating devices can handle. This uniformity excludes clients with low-end devices from contributing to FL, despite their potentially valuable local data, and limits FL's ability to train larger, more robust models.

FedRolex Methodology and Contributions

FedRolex is a partial training (PT) technique employing a rolling sub-model extraction strategy. The primary elements of FedRolex include:

  • Rolling Sub-Model Extraction: Different sub-models are extracted from the global model in a rolling fashion across communication rounds, with each client training a sub-model sized to its resource capacity. Because the extraction window advances every round and wraps around each layer, all global model parameters are trained evenly, which mitigates the client drift induced by model heterogeneity.
  • Selective Aggregation: The server aggregates client updates with a selective averaging scheme: each parameter is averaged only over the clients whose sub-models contained it. This keeps the global model synchronized without per-client weighting, which simplifies implementation. A minimal sketch of both mechanisms follows this list.
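
To make these mechanisms concrete, here is a minimal NumPy sketch of one communication round for a single layer with one parameter per hidden unit. It is an illustration under simplifying assumptions, not the authors' implementation (which is linked from the abstract): the function names `rolling_indices` and `selective_average`, the one-dimensional weight vector, and the stand-in "local training" step are all hypothetical.

```python
import numpy as np

def rolling_indices(round_idx, k_global, capacity):
    # Rolling window of size ceil(capacity * k_global), starting at
    # (round_idx mod k_global) and wrapping around, so over successive
    # rounds every unit of the global layer is trained equally often.
    k_local = int(np.ceil(capacity * k_global))
    start = round_idx % k_global
    return np.arange(start, start + k_local) % k_global

def selective_average(global_w, client_updates):
    # Selective aggregation: average each parameter only over the clients
    # whose sub-model contained it; untouched parameters keep their value.
    # client_updates: list of (indices, updated_values) pairs.
    acc = np.zeros_like(global_w)
    count = np.zeros_like(global_w)
    for idx, vals in client_updates:
        acc[idx] += vals
        count[idx] += 1
    out = global_w.copy()
    mask = count > 0
    out[mask] = acc[mask] / count[mask]
    return out

# Toy round: a 10-unit layer, two clients with capacity ratios 0.5 and 0.3.
K, r = 10, 7
w = np.zeros(K)
updates = []
for cap in (0.5, 0.3):
    idx = rolling_indices(r, K, cap)
    updates.append((idx, w[idx] + 1.0))  # stand-in for local training
w = selective_average(w, updates)
```

Because the window start depends only on the round index, even low-capacity clients sweep the entire global model over enough rounds, which is what equalizes parameter coverage relative to random extraction.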

The authors also conduct a theoretical statistical analysis comparing FedRolex to existing methods and demonstrate that FedRolex can train a full global model more efficiently than Federated Dropout by ensuring balanced training of model parameters.
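
The intuition behind this advantage can be stated as a simple coverage argument (a simplified illustration introduced here, not the paper's exact theorem; the notation $T$, $\beta$, and $N$ is ours): let $N$ be the number of rounds in which a given coordinate of the global model is trained, over $T$ rounds with sub-model capacity ratio $\beta$.

```latex
% Rolling extraction: deterministic, perfectly balanced coverage.
N_{\mathrm{rolling}} \approx \beta T,
\qquad \operatorname{Var}(N_{\mathrm{rolling}}) = 0.
% Random extraction (Federated Dropout): each coordinate is kept
% independently with probability \beta in every round.
N_{\mathrm{dropout}} \sim \operatorname{Binomial}(T, \beta),
\qquad \operatorname{Var}(N_{\mathrm{dropout}}) = T\,\beta\,(1-\beta).
```

Both schemes train each parameter $\beta T$ times in expectation, but only rolling extraction does so evenly; under random extraction some coordinates are inevitably under-trained in any finite run, which is consistent with the balanced-training advantage described above.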

Experimental Evaluation

FedRolex was evaluated against state-of-the-art PT-based model-heterogeneous FL methods (e.g., Federated Dropout, HeteroFL) as well as knowledge distillation (KD) methods (e.g., FedDF). The experiments span two regimes: small models on small datasets (CIFAR-10, CIFAR-100) and a large model on a large dataset (Stack Overflow). Evaluation highlights include:

  • Enhanced Performance: FedRolex consistently outperformed other PT-based methods across datasets under both low and high data heterogeneity. In particular, under high data heterogeneity it outperformed random sub-model extraction methods such as Federated Dropout.
  • Improvement Over Model-Homogeneous FL: FedRolex narrows the performance gap between model-heterogeneous and model-homogeneous settings, most notably on the large-scale dataset, indicating that it can efficiently harness clients of varying capacities to improve global model accuracy.
  • Support for Large Server Models: FedRolex can train server models larger than the largest client model without the need for public data, unlike many KD-based methods, offering improved compatibility with privacy-preserving strategies.
  • Robustness to Real-World Device Distributions: The algorithm was also tested on an emulated real-world device distribution, showing greater inclusiveness and better performance than low-capacity model-homogeneous FL.

Implications and Future Directions

FedRolex represents a significant step toward making federated learning more flexible and inclusive across heterogeneous devices. Its ability to accommodate model heterogeneity and to train large-scale models without compromising accuracy or privacy is particularly valuable for edge devices and scenarios that demand high-capacity models. Future research could explore convergence analysis and deployment strategies for global models trained under the FedRolex scheme to maximize its on-device performance benefits.