
Federated learning with hierarchical clustering of local updates to improve training on non-IID data (2004.11791v2)

Published 24 Apr 2020 in cs.LG and stat.ML

Abstract: Federated learning (FL) is a well-established method for performing machine learning tasks over massively distributed data. However, in settings where data is distributed in a non-iid (not independent and identically distributed) fashion -- as is typical in real-world situations -- the joint model produced by FL suffers in terms of test set accuracy and/or communication costs compared to training on iid data. We show that learning a single joint model is often not optimal in the presence of certain types of non-iid data. In this work we present a modification to FL by introducing a hierarchical clustering step (FL+HC) to separate clusters of clients by the similarity of their local updates to the global joint model. Once separated, the clusters are trained independently and in parallel on specialised models. We present a robust empirical analysis of the hyperparameters for FL+HC for several iid and non-iid settings. We show how FL+HC allows model training to converge in fewer communication rounds (significantly so under some non-iid settings) compared to FL without clustering. Additionally, FL+HC allows for a greater percentage of clients to reach a target accuracy compared to standard FL. Finally, we make suggestions for good default hyperparameters to promote superior performing specialised models without modifying the underlying federated learning communication protocol.

Authors (3)
  1. Christopher Briggs (4 papers)
  2. Zhong Fan (22 papers)
  3. Peter Andras (5 papers)
Citations (471)

Summary

  • The paper introduces FL+HC, a novel federated learning method that uses hierarchical clustering to group similar client updates and address non-IID data challenges.
  • It employs various distance metrics, such as Manhattan and cosine, to partition clients effectively, leading to faster convergence in pathological and label-swapped settings.
  • Experimental results on MNIST and FEMNIST demonstrate improved communication efficiency and model specialization, offering practical benefits for real-world applications.

Federated Learning with Hierarchical Clustering of Local Updates to Improve Training on Non-IID Data

This paper addresses a critical challenge in federated learning (FL): the non-independent and identically distributed (non-IID) nature of real-world client data. It introduces a novel modification to the traditional FL approach by incorporating hierarchical clustering of local updates, yielding FL+HC. The method aims to improve model performance in the face of the inherent heterogeneity of client data distributions.

Methodology Overview

In FL+HC, hierarchical clustering is applied partway through federated training, after a fixed number of communication rounds, to group clients by the similarity of their local model updates. After clustering, each group of clients, presumed to share a similar data distribution, trains its own specialized model. This contrasts with typical FL, which produces a single global model that can be suboptimal when training data distributions differ widely across clients.
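
The overall schedule can be made concrete with a short sketch. This is a minimal illustration under toy assumptions: models are flat weight vectors, `local_update` and `fedavg` are simplified stand-ins for real local SGD and federated averaging, and `cluster_fn` (a concrete version is sketched after the next paragraph) is any function mapping client update vectors to a partition of clients. None of these names come from the authors' code.

```python
import numpy as np

def local_update(client_optimum, model, lr=0.5):
    """Toy stand-in for a client's local training pass: in the paper each
    client runs several epochs of SGD on its own data."""
    return model + lr * (client_optimum - model)

def fedavg(models):
    """Federated averaging: element-wise mean of the participating models."""
    return np.mean(np.stack(models), axis=0)

def fl_plus_hc(client_optima, cluster_fn, n_cluster_round=10, total_rounds=50):
    """Two-phase FL+HC schedule: global FedAvg, one clustering step at round
    `n_cluster_round`, then independent per-cluster FedAvg."""
    global_model = np.zeros_like(client_optima[0])

    # Phase 1: standard federated averaging up to the clustering round.
    for _ in range(n_cluster_round):
        global_model = fedavg([local_update(o, global_model) for o in client_optima])

    # Clustering step: group clients by the similarity of their local
    # updates (deltas) relative to the current global model.
    deltas = [local_update(o, global_model) - global_model for o in client_optima]
    clusters = cluster_fn(deltas)  # {cluster_id: [client indices]}

    # Phase 2: each cluster continues training its own specialised model,
    # independently and in parallel, using the unchanged FL protocol.
    models = {cid: global_model.copy() for cid in clusters}
    for _ in range(total_rounds - n_cluster_round):
        for cid, members in clusters.items():
            models[cid] = fedavg([local_update(client_optima[i], models[cid])
                                  for i in members])
    return models
```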

The clustering step measures the similarity of the clients' local model updates using one of several distance metrics (Euclidean, Manhattan, or cosine) together with a linkage criterion (average, complete, or single linkage), which determines how clusters are merged in the hierarchy. To assess the practicality of the method, the paper carries out a robust empirical analysis over several hyperparameter configurations.
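
As a concrete illustration of the clustering step, the following sketch uses SciPy's agglomerative clustering utilities on flattened client updates. The metric, linkage, and distance-threshold defaults are illustrative assumptions, not the paper's recommended values:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_clients(update_vectors, metric="cityblock", method="complete", t=3.0):
    """Partition clients by pairwise distance between their flattened updates.

    `metric` corresponds to the paper's distance choices ('euclidean',
    'cityblock' for Manhattan, or 'cosine'); `method` selects the linkage
    criterion ('average', 'complete', or 'single'); `t` is the distance
    threshold at which the dendrogram is cut into flat clusters. All three
    are hyperparameters; the defaults here are illustrative only.
    """
    dists = pdist(np.stack(update_vectors), metric=metric)  # condensed distance matrix
    tree = linkage(dists, method=method)                    # agglomerative merge tree
    labels = fcluster(tree, t=t, criterion="distance")      # cut the tree at threshold t
    clusters = {}
    for client_idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(client_idx)
    return clusters
```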

Experimental Results

The paper evaluates FL+HC on the MNIST dataset under two engineered non-IID partitions (pathological and label-swapped) and on FEMNIST, which is naturally partitioned by user. These configurations simulate real-world FL scenarios where client data distributions vary widely.

  1. Pathological Non-IID: FL+HC converged notably faster, requiring fewer communication rounds than standard FL while achieving similar final test accuracies. The Manhattan distance metric produced the best clusterings, significantly improving the rate at which clients reached the target accuracy (see the configuration sketch after this list).
  2. Label-Swapped Non-IID: Consistently, FL+HC outperformed FL in configurations where clients had conflicting label mappings. Here, the cosine distance metric was most effective in forming clusters aligned with client data peculiarities, thereby enhancing the model's generalization ability.
  3. FEMNIST Non-IID: Results showed only marginal improvements, reflecting the complexity inherent in data partitioned by user. Across the distance metrics tested, no significant benefits were observed over traditional FL, indicating the challenges posed by this more realistic setting.
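
Mapping these reported metric preferences onto the hypothetical `cluster_clients` sketch above might look as follows; the threshold values `t` are purely illustrative and do not come from the paper:

```python
# Illustrative configurations only; the thresholds `t` are assumed, not reported.

# Pathological non-IID: Manhattan distance reportedly clustered best.
clusters = cluster_clients(deltas, metric="cityblock", method="complete", t=3.0)

# Label-swapped non-IID: cosine distance was most effective.
clusters = cluster_clients(deltas, metric="cosine", method="average", t=0.5)
```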

Implications and Future Directions

The introduction of FL+HC highlights the importance of tailoring model training to the nuances of client data distributions in decentralized learning environments. By strategically clustering clients, the method can facilitate the creation of more accurate, specialized models without modifying the underlying FL protocol.

Theoretical implications suggest that different kinds of non-IID data may require different clustering strategies for optimal performance. Practical implications extend to domains where data privacy is paramount yet data is naturally skewed across devices, such as mobile applications or IoT deployments.

Future research may focus on integrating FL+HC with privacy-preserving mechanisms such as differential privacy, where the noise (and any compression) applied to client updates could interfere with the clustering step. Further scalability tests on larger networks and datasets, alongside investigations into adversarial resilience, could bolster the application potential of FL+HC.

This paper marks a step forward in the quest to reconcile federated learning with the diverse realities of distributed data environments, proposing a sophisticated yet intuitive adjustment to existing protocols.