
Leveraging Function Space Aggregation for Federated Learning at Scale (2311.10291v2)

Published 17 Nov 2023 in cs.LG

Abstract: The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
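For intuition, below is a minimal sketch of Fisher-based aggregation in the spirit of FedFish, assuming diagonal Fisher approximations so that matching the clients' functions reduces to a per-coordinate Fisher-weighted average of their parameters. The helper name fedfish_aggregate and the toy values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fedfish_aggregate(client_params, client_fishers, eps=1e-8):
    """Fisher-weighted aggregation of client parameters.

    A sketch of function-space aggregation under a diagonal Fisher
    approximation: each coordinate of the global model is the
    Fisher-weighted average of the corresponding client coordinates.

    client_params: list of 1-D np.ndarray, one parameter vector per client.
    client_fishers: list of 1-D np.ndarray, per-parameter diagonal Fisher
        estimates (e.g., squared log-likelihood gradients averaged over
        each client's local data).
    """
    weighted_sum = sum(f * p for f, p in zip(client_fishers, client_params))
    fisher_sum = sum(client_fishers)
    # eps guards against coordinates with (near-)zero total Fisher mass.
    return weighted_sum / (fisher_sum + eps)

# Toy usage: two clients whose models disagree coordinate-wise.
theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 2.0])
fisher_a = np.array([4.0, 1.0])   # client A's function is sensitive to coord 0
fisher_b = np.array([1.0, 4.0])   # client B's function is sensitive to coord 1
print(fedfish_aggregate([theta_a, theta_b], [fisher_a, fisher_b]))
# -> [0.8, 1.6]: each coordinate leans toward the client with
#    higher Fisher information there, unlike a plain FedAvg mean.
```

In contrast, plain FedAvg would return the unweighted mean [0.5, 1.0], ignoring how strongly each client's learned function depends on each parameter.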

