
FedSDD: Scalable and Diversity-enhanced Distillation for Model Aggregation in Federated Learning (2312.17029v1)

Published 28 Dec 2023 in cs.LG

Abstract: Recently, innovative model aggregation methods based on knowledge distillation (KD) have been proposed for federated learning (FL). These methods not only improve the robustness of model aggregation in heterogeneous learning environments, but also allow training heterogeneous models on client devices. However, the scalability of existing methods is unsatisfactory, because the training cost on the server increases with the number of clients, which limits their application in large-scale systems. Furthermore, the ensemble in existing methods is built from a set of client models initialized from the same checkpoint, causing low diversity. In this paper, we propose a scalable and diversity-enhanced federated distillation scheme, FedSDD, which decouples the training complexity from the number of clients to enhance scalability, and builds the ensemble from a set of aggregated models with enhanced diversity. In particular, the teacher model in FedSDD is an ensemble built from a small group of aggregated (global) models, rather than all client models, so that the computation cost does not scale with the number of clients. Furthermore, to enhance diversity, FedSDD performs KD only on one of the global models, i.e., the main global model, which improves the performance of both the ensemble and the main global model. While partitioning the client models into more groups allows building an ensemble with more aggregated models, the convergence of each individual aggregated model slows down. We introduce temporal ensembling, which mitigates this issue and provides significant improvement in heterogeneous settings. Experimental results show that FedSDD outperforms other FL methods, including FedAvg and FedDF, on benchmark datasets.
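
The abstract describes the server-side step only at a high level. Below is a minimal sketch, not the authors' implementation, of what one FedSDD server round could look like under a PyTorch setup: client updates are partitioned into groups and averaged with uniform weights, the resulting aggregated (global) models form the teacher ensemble, and knowledge distillation on an unlabeled transfer set updates only the main global model. The names fedsdd_round, model_fn, and distill_loader, the uniform averaging weights, and all hyperparameters (temperature, step count, learning rate) are illustrative assumptions; temporal ensembling and client-side local training are omitted.

import copy
import torch
import torch.nn.functional as F


def average_state_dicts(state_dicts):
    # FedAvg-style uniform parameter averaging (a simplification; real
    # deployments often weight clients, e.g., by local dataset size).
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg


def fedsdd_round(client_state_dicts, group_ids, main_group, model_fn,
                 distill_loader, device="cpu", temperature=2.0,
                 kd_steps=50, lr=0.01):
    # 1) Partition client updates into groups and average each group into
    #    one aggregated (global) model.
    groups = {}
    for sd, gid in zip(client_state_dicts, group_ids):
        groups.setdefault(gid, []).append(sd)
    global_models = {}
    for gid, sds in groups.items():
        m = model_fn().to(device)
        m.load_state_dict(average_state_dicts(sds))
        global_models[gid] = m

    # 2) Freeze a snapshot of all aggregated models as the teacher ensemble.
    teachers = [copy.deepcopy(m).eval() for m in global_models.values()]

    # 3) Distill the ensemble only into the main global model (the student),
    #    using an unlabeled transfer set; the other global models are untouched.
    student = global_models[main_group]
    optimizer = torch.optim.SGD(student.parameters(), lr=lr)
    student.train()

    for step, (x, _) in enumerate(distill_loader):
        if step >= kd_steps:
            break
        x = x.to(device)
        with torch.no_grad():
            # Teacher target: average of temperature-softened class probabilities.
            teacher_probs = torch.stack(
                [F.softmax(t(x) / temperature, dim=1) for t in teachers]
            ).mean(dim=0)
        student_log_probs = F.log_softmax(student(x) / temperature, dim=1)
        loss = F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The KD-enhanced main global model and its non-distilled peers are
    # broadcast back to the clients for the next round.
    return global_models

Keeping the non-main aggregated models out of the distillation step is what preserves ensemble diversity across rounds: only the main global model absorbs the ensemble's knowledge, as the abstract describes.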

References (25)
  1. Large scale distributed neural network training through online distillation. arXiv preprint arXiv:1804.03235, 2018.
  2. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
  3. Online knowledge distillation with diverse peers. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3430–3437, 2020.
  4. FedBE: Making Bayesian model ensemble applicable to federated learning. In International Conference on Learning Representations, 2021.
  5. Heterogeneous ensemble knowledge transfer for training large models in federated learning. arXiv preprint arXiv:2204.12703, 2022.
  6. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  7. Ensemble attention distillation for privacy-preserving federated learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15076–15086, 2021.
  8. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  9. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015.
  10. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335, 2019.
  11. Snapshot ensembles: Train 1, get M for free. In International Conference on Learning Representations, 2017.
  12. Learn from others and be yourself in heterogeneous federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10143–10153, 2022.
  13. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv preprint arXiv:1811.11479, 2018.
  14. SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pages 5132–5143. PMLR, 2020.
  15. D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  16. Learning multiple layers of features from tiny images. 2009.
  17. S. Laine and T. Aila. Temporal ensembling for semi-supervised learning. In International Conference on Learning Representations, 2017.
  18. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2:429–450, 2020.
  19. Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33:2351–2363, 2020.
  20. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
  21. Fed-ensemble: Improving generalization through model ensembling in federated learning. arXiv preprint arXiv:2107.10663, 2021.
  22. A field guide to federated optimization. arXiv preprint arXiv:2107.06917, 2021.
  23. Local-global knowledge distillation in heterogeneous federated learning with non-iid data. arXiv preprint arXiv:2107.00051, 2021.
  24. S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
  25. Fine-tuning global model via data-free knowledge distillation for non-iid federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10174–10183, 2022.
Citations (2)
