
R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Published 16 Jul 2024 in cs.LG, cs.AI, and eess.SP (arXiv:2407.11654v2)

Abstract: Split federated learning (SFL) is a compute-efficient paradigm in distributed ML, where components of large ML models are outsourced to remote servers. A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming that could jeopardize the learning process. This is particularly pronounced for word embedding parameters in LLMs, which are crucial for language understanding. In this paper, rigorous insights are provided into the influence of jamming LLM word embeddings in SFL by deriving an expression for the ML training loss divergence and showing that it is upper-bounded by the mean squared error (MSE). Based on this analysis, a physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks. R-SFLLM leverages wireless sensing data to gather information on the jamming directions-of-arrival (DoAs) in order to devise a novel, sensing-assisted anti-jamming strategy while jointly optimizing beamforming, user scheduling, and resource allocation. Extensive experiments using BERT and RoBERTa models demonstrate R-SFLLM's effectiveness, achieving close-to-baseline performance across various NLP tasks and datasets. The proposed methodology further introduces an adversarial training component, where controlled noise exposure significantly enhances the LLM's resilience to perturbed parameters during training. The results show that more noise-sensitive models, such as RoBERTa, benefit from this feature, especially when resource allocation is unfair. It is also shown that worst-case jamming in particular translates into worst-case model outcomes, thereby underscoring the need for jamming-resilient SFL protocols.
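The abstract's central analytical link — jamming distorts the transmitted embedding parameters, and the resulting training-loss divergence is upper-bounded by their MSE — can be illustrated with a toy additive-Gaussian-noise sketch. This is an illustrative model only, not the paper's MIMO channel formulation; all names, the SNR parameterization, and the Gaussian jammer are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-embedding matrix (vocab_size x dim), standing in for the LLM
# embedding parameters sent over the wireless split in SFL.
vocab_size, dim = 100, 16
embeddings = rng.normal(size=(vocab_size, dim))

def jam(params, snr_db, rng):
    """Model jamming as additive Gaussian noise at a given SNR (in dB).
    Illustrative channel model, not the paper's worst-case MIMO jammer."""
    signal_power = np.mean(params ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return params + rng.normal(scale=np.sqrt(noise_power), size=params.shape)

def mse(a, b):
    """Mean squared error between transmitted and received parameters."""
    return float(np.mean((a - b) ** 2))

# Stronger jamming (lower SNR) yields a larger embedding MSE, which per the
# paper's analysis upper-bounds the training-loss divergence.
mse_mild = mse(embeddings, jam(embeddings, snr_db=20.0, rng=rng))
mse_harsh = mse(embeddings, jam(embeddings, snr_db=0.0, rng=rng))
assert mse_mild < mse_harsh
print(f"MSE at 20 dB SNR: {mse_mild:.4f}; at 0 dB SNR: {mse_harsh:.4f}")
```

The same additive-noise mechanism, applied deliberately at controlled power during training, is the intuition behind the adversarial training component mentioned in the abstract: exposing the model to perturbed embeddings makes it less sensitive to jamming-induced MSE at deployment.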
