Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning (2404.06230v1)
Abstract: Federated learning (FL) enables a large number of clients, possibly mobile devices, to collaboratively train a generalized machine learning model by exploiting a larger pool of local samples without sharing raw data, thereby offering a degree of privacy to the collaborating clients. However, because so many clients participate, it is often difficult to profile and verify each one, which creates a security threat: malicious participants may degrade the accuracy of the trained model by conveying poisoned models during training. The aggregation framework at the parameter server therefore also needs to minimize the detrimental effects of such malicious clients. A plethora of attack and defence strategies have been analyzed in the literature; however, the Byzantine problem is often treated solely as an outlier-detection problem, oblivious to the topology of the neural network (NN). In this work, we argue that by extracting side information specific to the NN topology, one can design stronger attacks. Hence, inspired by sparse neural networks, we introduce a hybrid sparse Byzantine attack composed of two parts: one with a sparse nature that attacks only certain NN locations with higher sensitivity, and another that is more silent but accumulates over time; each ideally targets a different type of defence mechanism, and together they form a strong yet imperceptible attack. Finally, we show through extensive simulations that the proposed hybrid Byzantine attack is effective against eight different defence methods.
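To make the two-component idea concrete, below is a minimal sketch of how such a hybrid update could be assembled on the attacker side. It assumes flattened NumPy parameter, gradient, and update vectors; the SNIP-style sensitivity score |w * g|, the function name, and all scaling constants are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def hybrid_byzantine_update(honest_update, weights, grads,
                            sparsity=0.01, aggressive_scale=10.0,
                            silent_scale=0.05, state=None):
    """Illustrative hybrid attack: a sparse, aggressive perturbation on the
    most 'sensitive' coordinates plus a small perturbation that accumulates
    across rounds. Sensitivity is scored with a SNIP-like |w * g| criterion
    (an assumption, not necessarily the paper's exact rule)."""
    score = np.abs(weights * grads)                 # per-coordinate sensitivity
    k = max(1, int(sparsity * score.size))          # number of attacked coordinates
    top_idx = np.argpartition(score, -k)[-k:]       # most sensitive locations

    if state is None:                               # persistent "silent" drift
        state = np.zeros_like(honest_update)
    state += silent_scale * np.sign(honest_update)  # accumulates round after round

    poisoned = honest_update + state                # imperceptible component everywhere
    poisoned[top_idx] -= aggressive_scale * honest_update[top_idx]  # sparse, aggressive flip
    return poisoned, state
```

The intuition behind the split, under these assumptions, is that the sparse component changes too few coordinates to stand out under norm- or distance-based outlier detection, while the accumulating component stays small enough per round to pass per-round screening yet biases the model over time.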