pfl-research: simulation framework for accelerating research in Private Federated Learning (2404.06430v2)

Published 9 Apr 2024 in cs.LG, cs.AI, cs.CR, and cs.CV

Abstract: Federated learning (FL) is an emerging ML training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL on larger and more realistic FL datasets. We introduce pfl-research, a fast, modular, and easy-to-use Python framework for simulating FL. It supports TensorFlow, PyTorch, and non-neural network models, and is tightly integrated with state-of-the-art privacy algorithms. We study the speed of open-source FL frameworks and show that pfl-research is 7-72$\times$ faster than alternative open-source frameworks on common cross-device setups. Such speedup will significantly boost the productivity of the FL research community and enable testing hypotheses on realistic FL datasets that were previously too resource intensive. We release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios. The code is available on GitHub at https://github.com/apple/pfl-research.

Authors (10)
  1. Filip Granqvist (7 papers)
  2. Congzheng Song (23 papers)
  3. Áine Cahill (6 papers)
  4. Rogier van Dalen (14 papers)
  5. Martin Pelikan (9 papers)
  6. Yi Sheng Chan (2 papers)
  7. Xiaojun Feng (3 papers)
  8. Natarajan Krishnaswami (2 papers)
  9. Vojta Jina (3 papers)
  10. Mona Chitnis (5 papers)
Citations (4)

Summary

Introduction to pfl-research: A High-Speed Framework for Federated Learning Simulation

Overview of pfl-research

Federated Learning (FL) represents a paradigm shift in training machine learning models across multitudes of devices while preserving data privacy. Despite its promise, the field has been held back by the computational resources required to simulate realistic FL setups. pfl-research, a Python framework for simulating FL and Private Federated Learning (PFL) efficiently and with minimal friction, addresses this bottleneck. The framework is 7-72 times faster than existing open-source simulators in common cross-device setups, supports a wide range of machine learning models, and is tightly integrated with state-of-the-art privacy-preserving algorithms.

Key Contributions of the Framework

  • Speed Improvement: pfl-research significantly accelerates the simulation of FL, enabling research and experimentation on more extensive and realistic datasets with reduced resource requirements.
  • Ease of Distributed Simulations: The framework facilitates a smooth transition to distributed simulations, enhancing productivity and simplifying the researcher's workflow.
  • Comprehensive Privacy Features: With built-in state-of-the-art privacy mechanisms, pfl-research allows for rigorous experimentation with PFL, protecting user privacy without compromising model utility (the sketch after this list illustrates the underlying central-DP aggregation idea).
  • Support for Various Models: Beyond neural networks, the framework accommodates a range of model types, broadening the scope of FL research.
  • Benchmark Suite: A suite of benchmarks is provided, allowing researchers to evaluate their algorithms across diverse scenarios accurately.
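
To make the privacy feature concrete, below is a minimal, framework-agnostic sketch of the central-DP federated averaging pattern that such frameworks support: clip each client's model delta, average the clipped deltas, and add Gaussian noise to the aggregate. It does not use pfl-research's actual API; the function and parameter names (`central_dp_fedavg_round`, `clip_norm`, `noise_multiplier`) are illustrative assumptions.

```python
import numpy as np

def central_dp_fedavg_round(global_weights, client_updates,
                            clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One round of federated averaging with central differential privacy.

    Hypothetical helper, not pfl-research's API: each client update (its
    delta from the global weights) is L2-clipped to `clip_norm`, the clipped
    updates are averaged, and Gaussian noise calibrated to the clipping
    bound is added to the aggregate before it is applied.
    """
    rng = np.random.default_rng(0) if rng is None else rng

    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # never scale up
        clipped.append(delta * scale)

    aggregate = np.mean(clipped, axis=0)
    # Gaussian mechanism: the per-client L2 sensitivity of the mean is
    # clip_norm / num_clients, so the noise std scales with that bound.
    noise_std = noise_multiplier * clip_norm / len(client_updates)
    noisy_aggregate = aggregate + rng.normal(0.0, noise_std, size=aggregate.shape)
    return global_weights + noisy_aggregate
```

Called with a list of NumPy weight-delta arrays from a simulated cohort, this returns the next set of global weights; in pfl-research itself, clipping and noising are provided by its integrated privacy mechanisms rather than hand-rolled as above.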

pfl-research Architecture

pfl-research's architecture promotes modularity and flexibility, letting researchers plug in different models, algorithms, and privacy techniques as needed. It simplifies the simulation process without sacrificing the realism of FL setups, supporting PyTorch, TensorFlow, and non-neural-network models. Its distributed simulation design removes unnecessary communication overhead, enabling efficient use of computational resources; a hypothetical sketch of this plug-in structure follows.
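
The class and method names below are hypothetical, not pfl-research's public API; the sketch only illustrates how a modular simulator can compose a framework-agnostic model backend, a cohort-sampling training loop, and a pluggable privacy mechanism.

```python
from typing import Protocol, Sequence

import numpy as np

class ModelBackend(Protocol):
    """Wraps a TensorFlow, PyTorch, or non-neural model behind one interface."""
    def get_weights(self) -> np.ndarray: ...
    def train_on_client(self, weights: np.ndarray, client_data) -> np.ndarray:
        """Return the client's weight delta after local training."""
        ...

class PrivacyMechanism(Protocol):
    """Post-processes the aggregated update, e.g. clipping plus Gaussian noise."""
    def privatize(self, aggregate: np.ndarray, num_clients: int) -> np.ndarray: ...

def run_rounds(model: ModelBackend, clients: Sequence, privacy: PrivacyMechanism,
               num_rounds: int, cohort_size: int, seed: int = 0) -> np.ndarray:
    """Toy central-simulation loop: sample a cohort, collect deltas, aggregate."""
    rng = np.random.default_rng(seed)
    weights = model.get_weights()
    for _ in range(num_rounds):
        cohort = rng.choice(len(clients), size=cohort_size, replace=False)
        deltas = [model.train_on_client(weights, clients[i]) for i in cohort]
        aggregate = np.mean(deltas, axis=0)
        weights = weights + privacy.privatize(aggregate, len(deltas))
    return weights
```

Keeping the model backend, algorithm loop, and privacy mechanism behind separate interfaces is what allows a framework like this to swap in PyTorch or TensorFlow models, different FL algorithms, and different DP mechanisms without changing the simulation loop.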

Performance and Benchmarking

Benchmarking studies reveal pfl-research's superior performance compared to other FL simulation frameworks. Speed tests on both small-scale (CIFAR10 IID) and large-scale (FLAIR) datasets demonstrate the framework's capability to drastically reduce the wall-clock time for simulations while maintaining or improving the accuracy of the results. For FLAIR, pfl-research outstrips TensorFlow Federated and Flower by significant margins, showcasing its efficiency in handling complex simulations with high computational demands.

Implications and Future Directions

The release of pfl-research is poised to accelerate the pace of FL research by making simulations more accessible and practical. Its performance advantages, coupled with the comprehensive suite of features, empower researchers to explore a broader spectrum of hypotheses and contribute to the continual advancement of FL technologies.

The framework's open-source nature invites collaboration and expansion, with opportunities for the community to integrate new algorithms, datasets, and privacy mechanisms. The authors also highlight planned enhancements, including the expansion of benchmark suites to cover TensorFlow implementations and cross-silo FL scenarios.

In conclusion, pfl-research stands out as a versatile, powerful tool for FL research, addressing core challenges in simulation speed and framework capabilities. Its development reflects the growing need for efficient, scalable solutions in federated learning, marking a step forward in the realization of privacy-preserving, decentralized machine learning models.
