Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead (2404.02525v3)
Abstract: The significant advances in Large Language Models (LLMs) have led to their widespread adoption across tasks in Software Engineering (SE), including vulnerability detection and repair. Numerous studies have investigated how LLMs can improve these two tasks. Despite the growing research interest, no existing survey focuses on the use of LLMs for vulnerability detection and repair. In this paper, we aim to bridge this gap by offering a systematic literature review of LLM-based approaches to vulnerability detection and repair. The review covers work from leading SE, AI, and Security conferences and journals: 43 papers published across 25 distinct venues, along with 15 high-quality preprint papers, for a total of 58 papers. By answering three key research questions, we aim to (1) summarize the LLMs employed in the relevant literature, (2) categorize LLM adaptation techniques for vulnerability detection, and (3) categorize LLM adaptation techniques for vulnerability repair. Based on our findings, we identify a series of limitations in existing studies. Additionally, we outline a roadmap highlighting opportunities that we believe are pertinent and crucial for future research.
Authors: Xin Zhou, Sicong Cao, Xiaobing Sun, David Lo