
SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding (2402.01340v1)

Published 2 Feb 2024 in cs.LG, cs.CR, and eess.SP

Abstract: Distributed learning is an effective approach to accelerating model training with multiple workers. However, substantial communication delays arise between workers and a parameter server due to the massive cost of communicating gradients. SignSGD with majority voting (signSGD-MV) is a simple yet effective optimizer that reduces communication costs through one-bit quantization, yet its convergence rate degrades considerably as the number of adversarial workers increases. In this paper, we show that the convergence rate is invariant to the number of adversarial workers, provided that the number of adversarial workers is smaller than that of benign workers. The key idea behind this counter-intuitive result is our novel signSGD with federated defense (signSGD-FD). Unlike traditional approaches, signSGD-FD exploits the gradient information sent by adversarial workers with proper weights, which are obtained through gradient sign decoding. Experimental results demonstrate that signSGD-FD achieves superior convergence rates over traditional algorithms in various adversarial attack scenarios.
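
Since the abstract contrasts plain majority voting (signSGD-MV) with a weighted vote over one-bit gradient signs (signSGD-FD), a minimal NumPy sketch of the aggregation step may help. It is a sketch under stated assumptions, not the paper's method: the function name `sign_sgd_mv_step`, the toy data, and the hand-picked worker weights are hypothetical stand-ins for the gradient-sign-decoding weights, whose computation is not specified in the abstract.

```python
import numpy as np


def sign_sgd_mv_step(params, worker_grads, lr=0.01, weights=None):
    """One server-side update using sign-based gradient aggregation.

    Each worker sends only the sign of its local gradient (one bit per
    coordinate). The server aggregates the signs by a (optionally weighted)
    majority vote and applies a sign-based update.
    """
    # Stack the one-bit messages: shape (num_workers, num_params).
    signs = np.sign(np.stack(worker_grads))

    if weights is None:
        # Plain majority vote (signSGD-MV): every worker counts equally.
        vote = np.sign(signs.sum(axis=0))
    else:
        # Weighted vote: workers judged unreliable get small or negative
        # weights, so even sign-flipped messages can contribute information.
        # (Hypothetical weights; the paper derives them via sign decoding.)
        vote = np.sign(weights @ signs)

    return params - lr * vote


# Toy usage: 5 benign workers and 2 sign-flipping adversaries.
rng = np.random.default_rng(0)
params = rng.normal(size=10)
true_grad = rng.normal(size=10)

benign = [true_grad + 0.1 * rng.normal(size=10) for _ in range(5)]
adversarial = [-g for g in benign[:2]]  # adversaries flip their gradient signs
worker_grads = benign + adversarial

# A negative weight re-inverts an identified sign-flipper, illustrating how
# adversarial messages can still be exploited rather than discarded.
weights = np.array([1.0] * 5 + [-1.0] * 2)

params = sign_sgd_mv_step(params, worker_grads, lr=0.01, weights=weights)
```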

