Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code (2405.11466v1)

Published 19 May 2024 in cs.SE

Abstract: LLMs have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, also known as trojans. Backdoor attacks insert triggers into training data, allowing attackers to maliciously manipulate the model's behavior. In this paper, we focus on analyzing the model parameters to detect potential backdoor signals in code models. Specifically, we examine attention weights and biases, and context embeddings of the clean and poisoned CodeBERT and CodeT5 models. Our results suggest noticeable patterns in the context embeddings of poisoned samples for both poisoned models; however, attention weights and biases do not show any significant differences. This work contributes to ongoing efforts in white-box detection of backdoor signals in LLMs of code through the analysis of parameters and embeddings.
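
As a concrete illustration of the analysis the abstract outlines, the sketch below loads a clean and a poisoned CodeBERT checkpoint, compares their attention weights and biases tensor by tensor, and contrasts the context embeddings (last hidden states) the two models produce for the same code snippet. This is a minimal sketch under stated assumptions, not the paper's released code: the poisoned checkpoint path is a hypothetical placeholder, and the comparison metrics (mean absolute parameter difference, per-token cosine similarity) are illustrative choices rather than the authors' exact procedure.

```python
# Minimal sketch, assuming HuggingFace transformers and PyTorch.
# Not the authors' code; the poisoned checkpoint path is hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer

CLEAN_CKPT = "microsoft/codebert-base"        # public CodeBERT checkpoint
POISONED_CKPT = "path/to/poisoned-codebert"   # hypothetical poisoned checkpoint

tokenizer = AutoTokenizer.from_pretrained(CLEAN_CKPT)
clean = AutoModel.from_pretrained(CLEAN_CKPT).eval()
poisoned = AutoModel.from_pretrained(POISONED_CKPT).eval()

# 1) Attention weights and biases: mean absolute difference per parameter tensor.
for (name, p_clean), (_, p_poisoned) in zip(clean.named_parameters(),
                                            poisoned.named_parameters()):
    if "attention" in name:
        delta = (p_clean - p_poisoned).abs().mean().item()
        print(f"{name}: mean |delta| = {delta:.6f}")

# 2) Context embeddings: last hidden states for the same input code snippet.
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", truncation=True)
with torch.no_grad():
    emb_clean = clean(**inputs).last_hidden_state       # (1, seq_len, hidden)
    emb_poisoned = poisoned(**inputs).last_hidden_state

# Per-token cosine similarity between the two models' context embeddings.
cos = torch.nn.functional.cosine_similarity(emb_clean, emb_poisoned, dim=-1)
print("mean per-token cosine similarity:", cos.mean().item())
```

In the same spirit as the paper's findings, one would expect parameter-level differences in the attention projections to be small, while embedding-level comparisons (especially over poisoned inputs) are more likely to reveal separable patterns.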
