SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection
Abstract: Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems. This has spurred significant advances in AI-based methods, including machine learning and deep learning, for software vulnerability detection (SVD). While AI-based methods have shown promising performance in SVD, their effectiveness on real-world, complex, and diverse source code datasets remains limited in practice. To tackle this challenge, in this paper we propose a novel framework that enhances the capability of large language models (LLMs) to learn and utilize semantic and syntactic relationships from source code data for SVD. Our approach enables the acquisition of fundamental knowledge from source code while adeptly leveraging these crucial relationships, i.e., semantic and syntactic associations, to effectively address the SVD problem. Rigorous and extensive experiments on three challenging real-world datasets (ReVeal, D2A, and Devign) demonstrate the superiority of our approach over effective, state-of-the-art baselines. On average across all datasets, our SAFE approach improves on the baselines by 4.79% to 9.15% in F1-measure and by 16.93% to 21.70% in Recall.
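To make the core idea concrete, the sketch below illustrates one plausible way to fuse semantic (token-level) and syntactic (e.g., AST-derived) representations of source code for binary vulnerability classification. This is a minimal, hypothetical PyTorch example of the general dual-relationship technique the abstract names, not the authors' SAFE implementation: the class name, the choice of encoders, the AST-node-ID input, and all dimensions are assumptions for illustration only.

```python
# Minimal sketch: two encoders, one over raw code tokens (semantic) and one
# over AST node-type IDs (syntactic), fused for binary SVD classification.
# All names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class DualRelationClassifier(nn.Module):
    def __init__(self, vocab_size=50000, syn_vocab_size=200, dim=256):
        super().__init__()
        # Semantic pathway: embeds and encodes raw code tokens.
        self.tok_embed = nn.Embedding(vocab_size, dim)
        self.sem_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Syntactic pathway: embeds and encodes AST node-type IDs
        # (e.g., obtained from a parser such as tree-sitter).
        self.syn_embed = nn.Embedding(syn_vocab_size, dim)
        self.syn_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Fused representation -> vulnerable / not-vulnerable logits.
        self.classifier = nn.Linear(2 * dim, 2)

    def forward(self, token_ids, ast_node_ids):
        # Mean-pool each encoded sequence, then concatenate the two views.
        sem = self.sem_encoder(self.tok_embed(token_ids)).mean(dim=1)
        syn = self.syn_encoder(self.syn_embed(ast_node_ids)).mean(dim=1)
        return self.classifier(torch.cat([sem, syn], dim=-1))

# Dummy usage: a batch of 4 functions, 128 code tokens and 64 AST nodes each.
model = DualRelationClassifier()
logits = model(torch.randint(0, 50000, (4, 128)), torch.randint(0, 200, (4, 64)))
print(logits.shape)  # torch.Size([4, 2])
```

In this reading, the semantic pathway captures what the code means at the token level while the syntactic pathway captures how it is structured; the paper's contribution lies in how an LLM is trained to exploit both kinds of relationship, which this toy model only gestures at.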