How Far Have We Gone in Vulnerability Detection Using Large Language Models (2311.12420v3)

Published 21 Nov 2023 in cs.AI, cs.CL, and cs.CR

Abstract: As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of LLMs in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark, VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.

Authors

Zeyu Gao
Hao Wang
Yuchen Zhou
Wenyu Zhu
Chao Zhang

GitHub

Hustcw/VulBench: a benchmark for evaluating the vulnerability discovery ability of automated approaches, including Large Language Models (LLMs), deep learning methods, and static analyzers
