FedScale: Benchmarking Model and System Performance of Federated Learning at Scale (2105.11367v5)

Published 24 May 2021 in cs.LG, cs.AI, cs.DC, and cs.PF

Abstract: We present FedScale, a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. FedScale datasets encompass a wide range of critical FL tasks, ranging from image classification and object detection to language modeling and speech recognition. Each dataset comes with a unified evaluation protocol using real-world data splits and evaluation metrics. To reproduce realistic FL behavior, FedScale contains a scalable and extensible runtime. It provides high-level APIs to implement FL algorithms, deploy them at scale across diverse hardware and software backends, and evaluate them at scale, all with minimal developer efforts. We combine the two to perform systematic benchmarking experiments and highlight potential opportunities for heterogeneity-aware co-optimizations in FL. FedScale is open-source and actively maintained by contributors from different institutions at http://fedscale.ai. We welcome feedback and contributions from the community.

Citations (162)

Summary

  • The paper presents FedScale, a comprehensive FL framework that benchmarks model and system performance using diverse, real-world datasets.
  • It employs real non-IID data partitions and simulates device heterogeneity to assess key metrics like communication overhead and synchronization delays.
  • The study reveals insights for optimizing federated algorithms by balancing performance, privacy, and computation in practical, scaled deployments.

FedScale: Benchmarking Model and System Performance of Federated Learning at Scale

Federated Learning (FL) is a decentralized machine learning paradigm that collaboratively trains models while preserving user privacy by ensuring that raw data never leaves local devices. Despite growing interest in FL, standardized benchmarking tools for comprehensively evaluating FL settings are still lacking. FedScale, introduced in this paper, addresses this gap by providing an extensive benchmarking suite for FL, incorporating diverse datasets and a scalable runtime framework.

Key Components of FedScale

FedScale comprises two primary components: a collection of realistic FL datasets and a scalable runtime environment, FedScale Runtime, enabling effective evaluation of FL algorithms across various scenarios.

Realistic Federated Datasets

The authors have curated 20 datasets covering critical FL tasks such as image classification, object detection, language modeling, and speech recognition. These datasets are derived from authentic real-world data, offering realistic partitions, non-IID distributions, and client-specific characteristics. This diversity enables evaluations that mirror practical federated applications, overcoming the limitations of existing benchmarks, which often rely on synthetically partitioned data from traditional machine learning benchmarks.
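To illustrate why real partitions matter, the sketch below groups samples by the client that produced them, instead of shuffling them IID; the client IDs and records are hypothetical toy data, not FedScale's actual dataset format:

```python
from collections import defaultdict

def partition_by_client(samples):
    """Group (client_id, features, label) records into per-client shards.

    Partitioning by the real client who produced each sample, rather than
    shuffling IID, preserves the natural quantity and label skew that
    realistic FL datasets exhibit.
    """
    shards = defaultdict(list)
    for client_id, features, label in samples:
        shards[client_id].append((features, label))
    return dict(shards)

# Toy records: two prolific clients and one sparse client.
records = [
    ("alice", [0.1], 0), ("alice", [0.2], 0), ("alice", [0.3], 0),
    ("bob",   [0.9], 1), ("bob",   [0.8], 1),
    ("carol", [0.5], 0),
]
shards = partition_by_client(records)
print({cid: len(s) for cid, s in shards.items()})
# → {'alice': 3, 'bob': 2, 'carol': 1}
```

The uneven shard sizes (3/2/1) are exactly the kind of client-level imbalance that synthetic IID splits erase.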

FedScale Runtime: Scalable Evaluation Platform

The runtime environment is designed to reproduce realistic FL behaviors, such as device heterogeneity, network connectivity variability, and client availability dynamics. This system is implemented with high-level APIs allowing minimal developer intervention to deploy and evaluate FL algorithms at scale. FedScale Runtime supports both mobile and cluster environments, providing detailed metrics on communication and computation efficiencies, which are crucial for assessing the feasibility of FL deployments.
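The effect of device and network heterogeneity on a synchronous FL round can be sketched with a toy straggler calculation; the field names and numbers below are hypothetical illustrations, not FedScale's API:

```python
def simulated_round_time(clients):
    """Estimate the wall-clock time of one synchronous FL round.

    Each client's time is its local compute plus the time to upload the
    model over its uplink; the round completes only when the slowest
    sampled client (the straggler) finishes.
    """
    def client_time(c):
        compute = c["batches"] * c["sec_per_batch"]
        upload = c["model_mb"] * 8 / c["uplink_mbps"]  # MB -> Mbit -> sec
        return compute + upload
    return max(client_time(c) for c in clients)

clients = [
    # A fast phone on Wi-Fi and a slow phone on a weak cellular link.
    {"batches": 10, "sec_per_batch": 0.1, "model_mb": 10, "uplink_mbps": 80},
    {"batches": 10, "sec_per_batch": 0.5, "model_mb": 10, "uplink_mbps": 8},
]
print(simulated_round_time(clients))  # → 15.0
```

Even though the fast client finishes in 2 seconds, the round takes 15: this straggler effect is one of the system-level behaviors a realistic runtime must reproduce.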

Systematic Evaluation and Insights

FedScale facilitates comprehensive evaluations uncovering hidden opportunities in FL optimizations:

  • Statistical Efficiency: FedScale evaluates FL algorithms such as FedAvg, FedProx, and FedYoGi on realistic datasets, revealing each algorithm's robustness to non-IID data and showing that no single algorithm dominates across all tasks.
  • System Efficiency: The platform benchmarks practical FL runtime and execution costs, such as communication overhead and client-server synchronization, enabling analyses of performance bottlenecks and identifying opportunities for co-optimization of communication and computation resources.
  • Privacy and Security: FedScale analyzes FL strategies for differential privacy and security enhancements, providing insights into the trade-offs between privacy guarantees and model accuracy in realistic FL settings.
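For reference, the core FedAvg aggregation step that such statistical-efficiency evaluations exercise can be sketched as a sample-weighted average of client updates (a minimal sketch of the standard algorithm, not FedScale's implementation):

```python
def fedavg(updates):
    """Federated Averaging: combine client models weighted by sample count.

    updates: list of (num_samples, weights) pairs, where weights is a
    flat list of floats representing one client's locally trained model.
    Returns the sample-weighted average of the client weight vectors.
    """
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [sum(n * w[i] for n, w in updates) / total for i in range(dim)]

# Two clients: the one holding 3x the data pulls the global model its way.
avg = fedavg([(30, [1.0, 2.0]), (10, [5.0, 6.0])])
print(avg)  # → [2.0, 3.0]
```

Because the averaging is weighted by data volume, clients with more samples dominate the global model, which is one source of the biased cross-client performance the paper highlights.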

Implications and Future Directions

FedScale serves as an essential tool for research in FL, providing a standardized suite for evaluating the practical viability of FL algorithms across diverse applications. It sets the stage for future work on the co-optimization of system and statistical efficiencies, on addressing biased performance across clients, and on enhancing privacy and security measures. Given FL's deployment on heterogeneous edge devices, FedScale opens new avenues for optimizing federated models under realistic device constraints.

Conclusion

FedScale is an open-source, actively maintained platform designed to standardize federated learning evaluations, fostering reproducible research in the field. By offering realistic datasets and an extensible runtime environment, FedScale represents a significant advancement in FL benchmarking capabilities, encouraging the development of robust federated solutions adaptable to varied real-world scenarios. The authors emphasize the importance of community contributions and feedback to continually expand and refine the suite, ensuring relevance to emerging FL applications.