VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector (2001.02350v2)

Published 8 Jan 2020 in cs.CR

Abstract: Automatically detecting software vulnerabilities is an important problem that has attracted much attention from the academic research community. However, existing vulnerability detectors still cannot achieve the vulnerability detection capability and the locating precision that would warrant their adoption for real-world use. In this paper, we present a vulnerability detector that can simultaneously achieve a high detection capability and a high locating precision, dubbed Vulnerability Deep learning-based Locator (VulDeeLocator). In the course of designing VulDeeLocator, we encounter difficulties including how to accommodate semantic relations between the definitions of types as well as macros and their uses across files, how to accommodate accurate control flows and variable define-use relations, and how to achieve high locating precision. We solve these difficulties by using two innovative ideas: (i) leveraging intermediate code to accommodate extra semantic information, and (ii) using the notion of granularity refinement to pin down locations of vulnerabilities. When applied to 200 files randomly selected from three real-world software products, VulDeeLocator detects 18 confirmed vulnerabilities (i.e., true-positives). Among them, 16 vulnerabilities correspond to known vulnerabilities; the other two are not reported in the National Vulnerability Database (NVD) but have been "silently" patched by the vendor of Libav when releasing newer versions.

An Analysis of VulDeeLocator: Enhancing Vulnerability Detection with Deep Learning

The paper "VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector" introduces a method for detecting software vulnerabilities and pinning down where they occur in the code. Its starting point is that existing detectors still lack both the detection capability and the locating precision needed for real-world adoption. VulDeeLocator addresses both gaps by combining deep learning with an intermediate code representation.

VulDeeLocator rests on two ideas: using intermediate code to capture additional semantic information, and granularity refinement, which sharpens the precision of vulnerability location. The intermediate code is in Static Single Assignment (SSA) form, in which each variable is assigned exactly once, so every use can be traced back to a single definition. This makes variable define-use relations and control flows explicit, and it lets VulDeeLocator connect semantically related statements, such as the definitions of types and macros and their uses, across files and functions, a scope that traditional source code-based methods lack.
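To make the SSA property concrete, here is a minimal sketch that renames variables in straight-line code so that each name is assigned exactly once. It is an illustration only, not the paper's pipeline: the function `to_ssa`, the `(target, expression)` input format, and the numeric suffix naming are all assumptions made for this example, and the toy ignores branches (so no phi functions are needed).

```python
import re
from collections import defaultdict


def to_ssa(statements):
    """Rename variables in straight-line code into a toy SSA form.

    `statements` is a list of (target, expression) string pairs.
    Hypothetical helper for illustration; handles no control flow.
    """
    version = defaultdict(int)  # latest version number assigned to each variable
    result = []
    for target, expr in statements:

        def rename(match):
            name = match.group(0)
            # Only variables defined earlier get a version suffix;
            # undefined names (e.g. function parameters) are left alone.
            return f"{name}{version[name]}" if name in version else name

        # Rewrite uses on the right-hand side before bumping the target,
        # so "x = x * 2" reads the old version and writes a new one.
        new_expr = re.sub(r"[A-Za-z_]\w*", rename, expr)
        version[target] += 1
        result.append((f"{target}{version[target]}", new_expr))
    return result


source = [("x", "n + 1"), ("x", "x * 2"), ("y", "x + n")]
for target, expr in to_ssa(source):
    print(f"{target} = {expr}")
```

After renaming, the second assignment to `x` becomes a new variable, so the use of `x` in the last statement points unambiguously at one definition. That one-definition-per-name property is what makes define-use relations easy to follow in SSA-form intermediate code.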

Granularity refinement is the second key idea. The paper introduces BRNN-vdl (Bidirectional Recurrent Neural Network for vulnerability detection and locating), a network that incorporates an attention mechanism and narrows the detector's output from a broad code fragment down to individual lines of code, substantially increasing locating precision.
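The idea of refining from a code fragment to individual lines can be sketched as follows. This is not the BRNN-vdl architecture; it only assumes that some model has already produced one score per token, and it shows one plausible way to aggregate those scores per line and flag the highest-scoring lines. The function `locate_lines`, the top-`k` averaging, and the threshold are hypothetical choices made for this example.

```python
def locate_lines(token_scores, token_lines, k=2, threshold=0.5):
    """Aggregate per-token scores into per-line scores and flag suspicious lines.

    token_scores: one float per token (assumed to come from some model).
    token_lines:  the source line number each token belongs to.
    Each line's score is the mean of its top-k token scores (an assumption).
    """
    by_line = {}
    for score, line in zip(token_scores, token_lines):
        by_line.setdefault(line, []).append(score)

    line_scores = {}
    for line, scores in by_line.items():
        top = sorted(scores, reverse=True)[:k]
        line_scores[line] = sum(top) / len(top)

    # Report only the lines whose aggregated score reaches the threshold.
    flagged = sorted(line for line, s in line_scores.items() if s >= threshold)
    return line_scores, flagged


# Made-up scores for five tokens spread over three lines.
scores, flagged = locate_lines(
    token_scores=[0.1, 0.2, 0.9, 0.7, 0.05],
    token_lines=[10, 10, 11, 11, 12],
)
print(flagged)
```

The output of such a step is a short list of line numbers rather than a verdict on the whole fragment, which is the practical difference between coarse-grained detection and fine-grained locating.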

Experimental evaluations show that VulDeeLocator outperforms existing tools at vulnerability detection. Compared with the state-of-the-art SySeVR, for instance, it achieves a higher F1-measure and lower false-positive and false-negative rates. SySeVR, though effective, operates at a relatively coarse granularity; VulDeeLocator improves on its locating precision by a factor of 4.2.
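For readers less familiar with the metrics named above, the sketch below computes the false-positive rate, false-negative rate, and F1-measure from confusion-matrix counts using their standard definitions. The function name `detection_metrics` and the example counts are made up for illustration and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # false-positive rate
    fnr = fn / (fn + tp) if (fn + tp) else 0.0        # false-negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"fpr": fpr, "fnr": fnr, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = detection_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A detector improves on another in the sense used here when its F1 is higher while its false-positive and false-negative rates are both lower.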

The evaluation spans synthetic, academic, and real-world programs, which supports VulDeeLocator's robustness and practical feasibility. On 200 files randomly selected from three real-world software products it detects 18 confirmed vulnerabilities, two of which are not reported in the National Vulnerability Database (NVD) but were "silently" patched by the vendor of Libav in newer releases.

The paper also acknowledges limitations. Chief among them is the need to compile source code into intermediate code, which may not always be feasible, along with the difficulty of extending VulDeeLocator to languages beyond C. In addition, because detection relies on static analysis, it may miss vulnerabilities that depend on dynamic program behavior.

In conclusion, VulDeeLocator marks a significant step forward in software vulnerability detection, combining deep learning with program analysis techniques. Its framework improves detection capability and sharply refines locating precision, making it a more practical tool for cybersecurity practitioners. Future work might extend it to other programming languages and incorporate dynamic analysis, broadening its usefulness across software environments.

Authors (6)

Zhen Li (334 papers)
Deqing Zou (12 papers)
Shouhuai Xu (65 papers)
Zhaoxuan Chen (2 papers)
Yawei Zhu (2 papers)
Hai Jin (83 papers)

Citations (165)
