Constrained Decoding for Secure Code Generation (2405.00218v3)

Published 30 Apr 2024 in cs.CR, cs.AI, cs.LG, and cs.SE

Abstract: Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both the security and the correctness of code generation. This paper introduces a new benchmark, CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code at the cost of functional correctness. We also demonstrate that different decoding methods significantly affect the security of Code LLMs. Furthermore, we explore a new defense direction: constrained decoding for secure code generation. We propose new constrained decoding techniques to generate secure code. Our results reveal that constrained decoding is more effective than prefix tuning at improving the security of Code LLMs, without requiring a specialized training dataset. Moreover, our evaluations over eight state-of-the-art Code LLMs show that constrained decoding substantially improves their security, and our technique outperforms GPT-4.
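The core mechanism behind constrained decoding is intervening in the model's token-by-token generation so that outputs violating a security constraint are suppressed or disfavored. As a minimal sketch of one simple variant of this idea (a negative lexical constraint, not the paper's exact technique), the example below masks out vocabulary tokens containing a banned substring at every decoding step; the model id, banned strings, and prompt are illustrative assumptions.

```python
# Minimal sketch of lexically constrained decoding for security.
# Illustrates the general mechanism only, not the paper's exact method:
# at each step, tokens that would introduce a weak hash API get -inf logits.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

MODEL = "Salesforce/codegen-350M-mono"  # placeholder: any causal Code LLM

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

class BanSubstrings(LogitsProcessor):
    """Assign -inf logits to every vocab token containing a banned string."""
    def __init__(self, banned_strings, tokenizer):
        # Precompute the ids of all tokens whose surface form is disallowed.
        self.banned_ids = [
            i for i in range(len(tokenizer))
            if any(b in tokenizer.decode([i]) for b in banned_strings)
        ]

    def __call__(self, input_ids, scores):
        scores[:, self.banned_ids] = float("-inf")
        return scores

prompt = "import hashlib\n\ndef hash_password(password):\n    return "
out = model.generate(
    **tok(prompt, return_tensors="pt"),
    max_new_tokens=32,
    logits_processor=LogitsProcessorList([BanSubstrings(["md5", "sha1"], tok)]),
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))
```

Note that a per-token ban like this is coarse: a banned substring can still be assembled across several tokens, which is one reason constrained decoding schemes that enforce sequence-level constraints during beam search or sampling are a more robust direction.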

Authors (4)
  1. Yanjun Fu (4 papers)
  2. Ethan Baker (3 papers)
  3. Yizheng Chen (23 papers)
  4. Yu Ding (70 papers)
Citations (2)