Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models (2401.07031v2)

Published 13 Jan 2024 in cs.CR, cs.AI, and cs.SE

Abstract: With the recent advancement of LLMs, generating functionally correct code has become simpler for a wide array of developers. While LLMs have sped up functional development, they pose a heavy risk to code security. Generating code with proper security measures using an LLM is significantly more challenging than functional code generation alone. Security measures may include adding a pair of lines to the original code, such as a null pointer check or a prepared statement to prevent SQL injection. Currently available code repair LLMs are trained by supervised fine-tuning, which optimizes a cross-entropy loss. However, the original and repaired code are mostly similar in both functionality and syntax, differing only in the few (1-2) lines that act as security measures. This imbalance between the security-relevant lines and the functional code pushes a supervised fine-tuned model to prioritize generating functional code without proper security measures, since doing so already yields minimal loss. Therefore, in this work, to harden and strengthen the security of code generated by LLMs, we propose a reinforcement learning-based method for program-specific repair that combines semantic and syntactic reward mechanisms, focusing on the added security measures and the preserved functional code, respectively.
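The combined reward described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `semantic_sim` stands in for an embedding-based score such as BERTScore (approximated here with stdlib `difflib` so the sketch runs without model downloads), `ngram_precision` stands in for a BLEU/CodeBLEU-style syntactic measure, and the weights `alpha`/`beta` are hypothetical.

```python
import difflib
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 4) -> float:
    """BLEU-style average n-gram precision between whitespace-tokenized
    code strings; a simplified stand-in for the syntactic reward
    (no brevity penalty or smoothing)."""
    cand, ref = candidate.split(), reference.split()
    scores = []
    for k in range(1, n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + k]) for i in range(len(cand) - k + 1))
        ref_ngrams = Counter(tuple(ref[i:i + k]) for i in range(len(ref) - k + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        scores.append(overlap / total)
    return sum(scores) / len(scores)


def semantic_sim(candidate: str, reference: str) -> float:
    """Placeholder for an embedding-based semantic score (e.g. BERTScore);
    a real system would embed both snippets rather than compare characters."""
    return difflib.SequenceMatcher(None, candidate, reference).ratio()


def repair_reward(generated: str, ground_truth: str,
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combined scalar reward: the semantic term credits the security fix,
    the syntactic term credits preserved functionality. Weights are
    hypothetical, not taken from the paper."""
    return (alpha * semantic_sim(generated, ground_truth)
            + beta * ngram_precision(generated, ground_truth))


if __name__ == "__main__":
    # A typical 1-2 line security fix: a null pointer check.
    repaired = "char *p = get_buf(); if (p != NULL) use(p);"
    vulnerable = "char *p = get_buf(); use(p);"
    print(f"{repair_reward(repaired, repaired):.3f}")    # identical -> 1.0
    print(f"{repair_reward(vulnerable, repaired):.3f}")  # missing check -> lower
```

In an RL fine-tuning loop (e.g., PPO), this scalar reward would scale the policy-gradient update, so the one or two security lines that barely move the cross-entropy loss under supervised fine-tuning can dominate the training signal.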
