
Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead (2404.02525v3)

Published 3 Apr 2024 in cs.SE

Abstract: Significant advances in LLMs have led to their widespread adoption across Software Engineering (SE) tasks, including vulnerability detection and repair. Numerous studies have investigated applying LLMs to these tasks. Despite the growing research interest, no existing survey focuses on the use of LLMs for vulnerability detection and repair. In this paper, we aim to bridge this gap with a systematic literature review of approaches that use LLMs to improve vulnerability detection and repair. The review covers research from leading SE, AI, and Security conferences and journals: 43 papers published across 25 distinct venues, along with 15 high-quality preprint papers, for a total of 58 papers. By answering three key research questions, we aim to (1) summarize the LLMs employed in the relevant literature, (2) categorize LLM adaptation techniques in vulnerability detection, and (3) classify LLM adaptation techniques in vulnerability repair. Based on our findings, we identify a series of limitations of existing studies and outline a roadmap highlighting opportunities that we believe are pertinent and crucial for future research.

Authors (4)
  1. Xin Zhou (319 papers)
  2. Sicong Cao (7 papers)
  3. Xiaobing Sun (19 papers)
  4. David Lo (229 papers)