
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs (2306.04959v5)

Published 8 Jun 2023 in cs.CR and cs.AI

Abstract: This paper introduces FedSecurity, an end-to-end benchmark that serves as a supplementary component of the FedML library for simulating adversarial attacks and corresponding defense mechanisms in Federated Learning (FL). FedSecurity eliminates the need to implement fundamental FL procedures, e.g., FL training and data loading, from scratch, thus enabling users to focus on developing their own attack and defense strategies. It contains two key components: FedAttacker, which conducts a variety of attacks during FL training, and FedDefender, which implements defensive mechanisms to counteract these attacks. FedSecurity has the following features: i) it offers extensive customization options to accommodate a broad range of machine learning models (e.g., Logistic Regression, ResNet, and GAN) and FL optimizers (e.g., FedAVG, FedOPT, and FedNOVA); ii) it enables exploring the effectiveness of attacks and defenses across different datasets and models; and iii) it supports flexible configuration and customization through a configuration file and a set of APIs. We further demonstrate FedSecurity's utility and adaptability through federated training of LLMs, showcasing its potential on a wide range of complex applications.
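The configuration-file workflow the abstract describes might look like the following sketch. This is an illustrative assumption, not FedSecurity's documented schema: every section and field name here (e.g., `attack_args`, `defense_args`, `enable_attack`) is hypothetical, chosen only to show how attack and defense choices could be declared separately from the FL training setup.

```yaml
# Hypothetical FedSecurity-style configuration.
# All section/field names below are assumed for illustration,
# not taken from the actual FedML/FedSecurity schema.

train_args:
  federated_optimizer: FedAVG     # e.g., FedAVG, FedOPT, FedNOVA
  client_num_in_total: 10         # total simulated clients
  comm_round: 50                  # number of FL communication rounds

model_args:
  model: resnet                   # e.g., logistic regression, ResNet, GAN

data_args:
  dataset: mnist                  # dataset used to compare attacks/defenses

attack_args:                      # consumed by FedAttacker (assumed section)
  enable_attack: true
  attack_type: byzantine          # e.g., Byzantine, backdoor, model poisoning
  attack_client_num: 2            # number of malicious clients

defense_args:                     # consumed by FedDefender (assumed section)
  enable_defense: true
  defense_type: krum              # e.g., Krum, norm clipping, coordinate-wise median
```

Under this kind of layout, swapping an attack or defense is a one-line change to the config, while the FL training loop, data loading, and model definition stay untouched, which is the decoupling the benchmark is designed to provide.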
