
CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification (2410.20136v1)

Published 26 Oct 2024 in cs.CR and cs.LG

Abstract: Neural code models have found widespread success in tasks pertaining to code intelligence, yet they are vulnerable to backdoor attacks, where an adversary can manipulate the victim model's behavior by inserting triggers into the source code. Recent studies indicate that advanced backdoor attacks can achieve nearly 100% attack success rates on many software engineering tasks. However, effective defense techniques against such attacks remain insufficiently explored. In this study, we propose CodePurify, a novel defense against backdoor attacks on code models through entropy-based purification. Entropy-based purification involves the process of precisely detecting and eliminating the possible triggers in the source code while preserving its semantic information. Within this process, CodePurify first develops a confidence-driven entropy-based measurement to determine whether a code snippet is poisoned and, if so, locates the triggers. Subsequently, it purifies the code by substituting the triggers with benign tokens using a masked language model. We extensively evaluate CodePurify against four advanced backdoor attacks across three representative tasks and two popular code models. The results show that CodePurify significantly outperforms four commonly used defense baselines, improving average defense performance by at least 40%, 40%, and 12% across the three tasks, respectively. These findings highlight the potential of CodePurify to serve as a robust defense against backdoor attacks on neural code models.
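The core intuition behind the entropy-based measurement can be sketched in a few lines. The idea is that a backdoor trigger acts as a shortcut: while the trigger is present, the poisoned model is abnormally confident (low output entropy), and masking the trigger token removes the shortcut and restores uncertainty. The sketch below is a simplified illustration of that intuition, not the authors' implementation; the toy classifier, the trigger token name `sleep_0`, and the `<mask>` convention are all assumptions made for the example.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def score_tokens(tokens, predict):
    """Mask each token in turn and measure how much the model's output
    entropy changes relative to the unmasked input. A trigger token, when
    masked, tends to raise entropy sharply because the backdoor shortcut
    that forced a confident prediction disappears."""
    base = entropy(predict(tokens))
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["<mask>"] + tokens[i + 1:]
        scores.append((entropy(predict(masked)) - base, tokens[i], i))
    # Largest entropy increase first: the top entry is the trigger candidate.
    return sorted(scores, reverse=True)

# Toy stand-in for a poisoned binary classifier: the token "sleep_0" plays
# the role of a trigger that makes the model near-certain of the attack
# target; without it the model is genuinely uncertain.
def toy_predict(tokens):
    if "sleep_0" in tokens:
        return [0.99, 0.01]  # backdoor fires: very low entropy
    return [0.6, 0.4]        # clean behavior: higher entropy

code = ["def", "f", "(", "x", ")", ":", "sleep_0", "return", "x"]
ranked = score_tokens(code, toy_predict)
delta, token, position = ranked[0]
print(token)  # the trigger surfaces with the largest entropy increase
```

In the full system described by the abstract, this detection step would be followed by replacing the located trigger with a benign token predicted by a masked language model, so that the purified snippet keeps its semantics while the backdoor no longer fires.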

