FetchSGD: Communication-Efficient Federated Learning with Sketching (2007.07682v2)

Published 15 Jul 2020 in cs.LG and stat.ML

Abstract: Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.

Authors (8)
  1. Daniel Rothchild (11 papers)
  2. Ashwinee Panda (19 papers)
  3. Enayat Ullah (15 papers)
  4. Nikita Ivkin (12 papers)
  5. Ion Stoica (177 papers)
  6. Vladimir Braverman (99 papers)
  7. Joseph Gonzalez (35 papers)
  8. Raman Arora (46 papers)
Citations (348)

Summary

  • The paper introduces FetchSGD, which compresses client updates using Count Sketch to significantly reduce communication costs in federated learning.
  • It maintains stateless clients and robust performance across non-i.i.d. datasets, addressing major challenges in decentralized training.
  • Empirical results on tasks like image classification and language modeling demonstrate that FetchSGD outperforms traditional methods such as FedAvg.

An Overview of FetchSGD: Communication-Efficient Federated Learning with Sketching

Federated learning represents a paradigm in machine learning where training data is distributed across numerous edge devices, such as smartphones or home assistants, which do not share their sensitive data with central servers. While this approach garners significant interest due to its advantages in ensuring data privacy and reducing cloud-based storage costs, it poses profound challenges, primarily related to communication bottlenecks and model convergence speed. The paper "FetchSGD: Communication-Efficient Federated Learning with Sketching" tackles these challenges by introducing a novel algorithm, FetchSGD, leveraging sketching techniques to achieve communication efficiency and satisfactory convergence guarantees in federated learning environments.

Key Contributions and Approach

FetchSGD is devised to overcome three central constraints in federated learning: the need for communication efficiency, the requirement for stateless clients, and the inherently non-i.i.d. nature of data distributed across devices. Traditional federated learning methods often suffer from excessive communication costs as clients exchange full model updates with the server. FetchSGD addresses this by employing the Count Sketch, a data structure that compresses model updates heavily, thereby minimizing the information exchanged between clients and the central aggregator.
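To make the mechanism concrete, here is a minimal, illustrative Count Sketch in Python. The class name, table shapes, and hashing scheme are assumptions chosen for exposition and do not reproduce the authors' implementation.

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch: compresses a d-dimensional vector into an r x c table.

    Illustrative only; the paper's actual sketch implementation may differ.
    """

    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols, self.dim = rows, cols, dim
        # Hash each coordinate to one bucket per row, with a random sign.
        self.buckets = rng.integers(0, cols, size=(rows, dim))
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Linear operation: sketch(a + b) == sketch(a) + sketch(b).
        for r in range(self.rows):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self, idx):
        # Median-of-estimates recovery of coordinate idx.
        ests = self.table[np.arange(self.rows), self.buckets[:, idx]] * self.signs[:, idx]
        return np.median(ests)
```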

The core innovation in FetchSGD is the use of the Count Sketch for compressing model updates from clients before they are aggregated centrally. The sketch's linearity enables the convenient aggregation of updates while allowing the efficient handling of momentum and error accumulation centrally. This architectural choice mitigates the inefficiencies related to sparse client participation—a common issue in federated learning environments.
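As a rough sketch of how these server-side operations compose, the function below applies momentum and error feedback directly to sketch tables and then extracts a sparse update. It assumes the CountSketch class shown above; the averaging rule, hyperparameters, and top-k recovery are illustrative rather than the paper's exact procedure.

```python
import numpy as np  # reuses the CountSketch class defined above


def server_round(client_tables, momentum_table, error_table, sketch,
                 eta=0.1, rho=0.9, k=10):
    """One hypothetical FetchSGD aggregation step, carried out in sketch space."""
    # 1. Merge: because sketching is linear, averaging the tables equals
    #    sketching the average of the clients' gradients.
    agg = np.mean(client_tables, axis=0)

    # 2. Momentum and error feedback are applied to the tables themselves.
    momentum_table = rho * momentum_table + agg
    error_table = error_table + eta * momentum_table

    # 3. Unsketch: recover approximate top-k coordinates from the error sketch.
    sketch.table = error_table
    estimates = np.array([sketch.query(i) for i in range(sketch.dim)])
    top_idx = np.argsort(np.abs(estimates))[-k:]
    update = np.zeros(sketch.dim)
    update[top_idx] = estimates[top_idx]

    # 4. Error accumulation: subtract the sketch of what was actually applied,
    #    so unapplied mass stays in the error table for later rounds.
    applied = CountSketch(sketch.rows, sketch.cols, sketch.dim)
    applied.buckets, applied.signs = sketch.buckets, sketch.signs  # share hashes
    applied.accumulate(update)
    error_table = error_table - applied.table

    # The caller applies `update` to the global weights: w <- w - update.
    return update, momentum_table, error_table
```

Because the momentum and error tables live entirely on the server, nothing in this step depends on which clients happen to participate in a given round.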

Moreover, FetchSGD maintains client statelessness, meaning that no client needs to store state information between training rounds. This feature is particularly advantageous because it fits the typical federated learning scenario where clients engage only sporadically. Through rigorous theoretical analysis, the paper establishes that FetchSGD not only guarantees convergence but also does so under settings involving non-i.i.d. data distributions, which are characteristic of decentralized datasets.
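A corresponding client step, again hypothetical and relying on the CountSketch class above, reduces to sketching the local gradient and uploading only the table; no optimizer state survives between rounds, which is exactly the statelessness property described here. The function name and shared-seed convention are assumptions for this sketch.

```python
def client_round(local_gradient, rows, cols, dim, shared_seed):
    """Hypothetical stateless client step: sketch the gradient, upload only the table."""
    sk = CountSketch(rows, cols, dim, seed=shared_seed)  # hash functions shared with the server
    sk.accumulate(local_gradient)
    return sk.table  # a rows x cols table, far smaller than the d-dimensional gradient
```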

Empirical Evaluation

FetchSGD's empirical performance is illustrated through experiments on several large-scale models, including residual networks and a transformer, trained on tasks such as image classification and language modeling. The results show that FetchSGD achieves effective model training with high communication efficiency compared to prevailing methods such as Federated Averaging (FedAvg) and gradient sparsification techniques. In particular, FetchSGD remains robust in scenarios with small local datasets or highly heterogeneous data, contexts where FedAvg and related methods struggle.

Implications and Future Directions

The implications of this research extend both theoretically and practically. Theoretically, FetchSGD provides a promising direction for communication-efficient optimization in federated learning, augmenting the field's understanding of sketching methods and their potential to facilitate scalable machine learning. Practically, the algorithm's ability to efficiently utilize sparse and non-i.i.d. data aligns well with the real-world deployment scenarios of federated learning across diverse consumer devices, where data variability is the norm.

Future work in this space may explore integrating FetchSGD with complementary strategies that reduce the number of communication rounds, laying the groundwork for federated learning systems that are efficient both per round and in total rounds. The linearity of the Count Sketch also opens avenues for exploring other linear compression techniques, potentially extending FetchSGD's applicability to broader machine learning and optimization contexts. Overall, FetchSGD marks a significant stride toward resolving some of the longstanding bottlenecks inhibiting broader adoption of federated learning architectures.