LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning (2401.16185v2)

Published 29 Jan 2024 in cs.CR, cs.AI, and cs.SE

Abstract: Large language models (LLMs) have demonstrated significant potential in various tasks, including vulnerability detection. However, current efforts in this area are preliminary, and it remains unclear whether LLMs' vulnerability reasoning capabilities stem from the models themselves or from external aids such as knowledge retrieval and tooling support. This paper aims to isolate LLMs' vulnerability reasoning from other capabilities, such as vulnerability knowledge adoption, context information retrieval, and structured output generation. We introduce LLM4Vuln, a unified evaluation framework that separately assesses LLMs' vulnerability reasoning capabilities and examines how they improve when combined with other enhancements. We conducted controlled experiments with 97 ground-truth vulnerabilities and 97 non-vulnerable cases in Solidity and Java, testing them in a total of 9,312 scenarios across four LLMs (GPT-4, GPT-3.5, Mixtral, and Llama 3). Our findings reveal the varying impacts of knowledge enhancement, context supplementation, prompt schemes, and models. Additionally, we identified 14 zero-day vulnerabilities in four pilot bug bounty programs, resulting in $3,576 in bounties.
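
The scenario count follows from a full factorial design: 97 + 97 = 194 cases, and 9,312 / 194 = 48 configurations per case, i.e. 12 per model across the four LLMs. The Python sketch below enumerates such a grid. The split of those 12 configurations into three knowledge settings, two context settings, and two prompt schemes, as well as every level name and the helper `build_scenarios`, are hypothetical assumptions chosen only because they multiply to 12; the abstract does not give the actual factor levels.

```python
from itertools import product

# Stated in the abstract: four models, 97 vulnerable and 97 non-vulnerable cases.
MODELS = ["GPT-4", "GPT-3.5", "Mixtral", "Llama 3"]
NUM_VULNERABLE = 97
NUM_NON_VULNERABLE = 97

# HYPOTHETICAL factor levels (assumption, not from the abstract); 3 * 2 * 2 = 12.
KNOWLEDGE = ["none", "original", "summarized"]
CONTEXT = ["without_context", "with_context"]
PROMPT_SCHEME = ["raw", "chain_of_thought"]


def build_scenarios():
    """Hypothetical helper: cross every case with every model and enhancement setting."""
    cases = [("vulnerable", i) for i in range(NUM_VULNERABLE)] + [
        ("non_vulnerable", i) for i in range(NUM_NON_VULNERABLE)
    ]
    return [
        {"case": case, "model": m, "knowledge": k, "context": c, "prompt": p}
        for case, m, k, c, p in product(cases, MODELS, KNOWLEDGE, CONTEXT, PROMPT_SCHEME)
    ]


scenarios = build_scenarios()
print(len(scenarios))  # 9312
```

Because every case is run under every combination, the effect of a single enhancement (for example, adding context) can be read off by comparing scenarios that differ only in that one field while the model and the other settings are held fixed, which is the kind of decoupling the framework aims for.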

Authors (8)

Yuqiang Sun
Daoyuan Wu
Yue Xue
Han Liu
Wei Ma
Lyuye Zhang
Yang Liu
Yingjiu Li

Citations (29)

Tweets

https://twitter.com/clintgibler/status/1812864793541542290

https://twitter.com/dao0x/status/1781002697749205452

https://twitter.com/r3pwnx/status/1752903135595770341

https://twitter.com/JaiV352895/status/1935489408121135331