REPA: Client Clustering without Training and Data Labels for Improved Federated Learning in Non-IID Settings (2309.14088v1)
Abstract: Clustering clients into groups with relatively homogeneous data distributions is one of the major means of improving the performance of federated learning (FL) in non-independent and identically distributed (non-IID) data settings. Yet, the applicability of current state-of-the-art approaches remains limited, as they cluster clients based on information, such as the evolution of local model parameters, that is obtainable only through actual on-client training. At the same time, there is a need to make FL models available to clients who cannot perform the training themselves, either because they lack the required processing capabilities or because they simply want to use the model without participating in its training. Furthermore, existing alternative approaches that avoid training still require each client to hold a sufficient amount of labeled data on which the clustering is based, essentially assuming that every client is a data annotator. In this paper, we present REPA, an approach to client clustering in non-IID FL settings that requires neither training nor labeled data collection. REPA uses a novel supervised autoencoder-based method to create embeddings that profile a client's underlying data-generating processes without exposing the data to the server and without requiring local training. Our experimental analysis over three different datasets demonstrates that REPA delivers state-of-the-art model performance while extending the applicability of cluster-based FL to use cases that previous approaches could not cover.
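The abstract describes a pipeline in which each client is summarized by an embedding of its data-generating process, and the server then groups clients by clustering those embeddings. The sketch below illustrates only the server-side clustering step with a plain k-means routine over hypothetical per-client embedding vectors; the embeddings themselves, their dimensionality, and the choice of k-means are illustrative assumptions, since the paper's actual embeddings come from its supervised autoencoder and the clustering algorithm used by REPA is not specified in the abstract.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) over per-client embedding vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize centroids from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: recompute each centroid as the mean of its members.
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break  # converged
        centroids = new
    # Return the cluster index assigned to each client.
    return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

# Hypothetical per-client profile embeddings; in REPA these would be produced
# by the supervised autoencoder without local training, not hand-written.
client_embeddings = [
    [0.10, 0.20], [0.15, 0.22], [0.12, 0.18],  # clients with one data distribution
    [0.90, 0.80], [0.85, 0.82], [0.88, 0.79],  # clients with a different one
]
labels = kmeans(client_embeddings, k=2)
```

Once the server has the cluster labels, each group can train or receive its own cluster-specific model, which is the general mechanism by which cluster-based FL mitigates non-IID degradation.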