Analysis of Deep Learning Techniques for Vulnerability Detection in Real-World Settings
The paper "Deep Learning based Vulnerability Detection: Are We There Yet?" by Saikat Chakraborty et al., presents a detailed examination of deep learning (DL) models applied to the field of software vulnerability detection in real-world environments. Current DL approaches have demonstrated high performance metrics within controlled experimental settings; however, this paper aims to evaluate such models in more practical, real-world scenarios to understand their actual efficacy and limits.
Core Findings and Contributions
The paper identifies several critical issues with current DL vulnerability prediction models:
- Data Duplication and Imbalance: Existing datasets such as SATE IV Juliet and SARD are often simplistic and suffer from significant duplication, with up to 68% duplicates reported. Duplication undermines the models' ability to generalize and artificially inflates performance metrics. Real-world data are also heavily imbalanced, with non-vulnerable samples far outnumbering vulnerable ones, an imbalance that existing methods do not properly address (see the deduplication sketch after this list).
- Irrelevant Feature Learning: A feature-importance analysis revealed that token-based models tend to learn dataset artifacts unrelated to vulnerabilities, a consequence of dataset biases and of simplistic models that overlook semantic dependencies in the code (see the token-model sketch after this list).
- Model Construction Limitations: Existing DL models, which are predominantly token-based, fail to capture the syntactic and semantic structure of code. Graph-based techniques such as Devign exploit syntactic and semantic graphs to an extent, but still struggle to separate the vulnerable and non-vulnerable classes because of weak representation learning.
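To make the duplication problem concrete, here is a minimal Python sketch of dataset deduplication by hashing normalized function bodies. The `code` field, the normalization rules, and the helper names are illustrative assumptions, not the paper's pipeline.

```python
import hashlib

def normalize(code: str) -> str:
    """Crude normalization: drop blank lines and surrounding whitespace.
    A real pipeline would also strip comments and normalize identifiers."""
    return "\n".join(line.strip() for line in code.splitlines() if line.strip())

def deduplicate(samples: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalized function body.

    Each sample is assumed to be a dict with a 'code' field; duplicates
    leaking across train/test splits are what inflate reported metrics.
    """
    seen: set[str] = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample["code"]).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique
```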
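The artifact-learning issue can be probed with a simple proxy experiment: train a bag-of-tokens classifier and inspect its highest-weighted tokens. If those turn out to be project-specific identifiers rather than vulnerability-relevant constructs, the model is fitting artifacts. This scikit-learn sketch is an illustrative stand-in, not the feature-importance method used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def top_token_features(codes, labels, k=20):
    """Train a bag-of-tokens classifier on (code string, label) pairs and
    return its k highest-weighted tokens with their coefficients."""
    vectorizer = CountVectorizer(token_pattern=r"[A-Za-z_]\w*")  # identifier-like tokens
    X = vectorizer.fit_transform(codes)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    tokens = np.array(vectorizer.get_feature_names_out())
    order = np.argsort(clf.coef_[0])[::-1][:k]  # largest weights first
    return list(zip(tokens[order], clf.coef_[0][order]))
```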
In response to these findings, the authors introduce the ReVeal framework for vulnerability detection, which incorporates:
- Gated Graph Neural Networks (GGNN): By operating on Code Property Graphs, ReVeal captures both syntactic and semantic dependencies, allowing the model to represent the intrinsic properties of software vulnerabilities more accurately (a minimal GGNN sketch follows this list).
- Data Rebalancing with SMOTE: Applying the Synthetic Minority Over-sampling Technique (SMOTE) mitigates class imbalance, yielding a more robust training process (see the SMOTE and triplet-loss sketch after this list).
- Enhanced Representation Learning: Training a multi-layer perceptron with a triplet loss markedly improves class separability in the latent space, translating into better vulnerability prediction metrics (shown in the same sketch).
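As a rough illustration of the graph-based component, here is a minimal gated graph neural network in plain PyTorch. It is a sketch under simplifying assumptions: a single dense adjacency matrix, untyped edges, and sum pooling, whereas ReVeal-style models operate over typed Code Property Graph edges.

```python
import torch
import torch.nn as nn

class MiniGGNN(nn.Module):
    """Minimal gated graph neural network over a code graph.

    Node states are updated for a fixed number of steps: each node
    aggregates transformed neighbor states and feeds them, together
    with its current state, into a GRU cell. The graph-level code
    embedding is the sum of the final node states.
    """
    def __init__(self, hidden_dim: int, num_steps: int = 6):
        super().__init__()
        self.msg = nn.Linear(hidden_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.num_steps = num_steps

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: [num_nodes, hidden_dim], adj: [num_nodes, num_nodes]
        h = node_feats
        for _ in range(self.num_steps):
            messages = adj @ self.msg(h)  # aggregate neighbor messages
            h = self.gru(messages, h)     # gated update of node states
        return h.sum(dim=0)               # graph-level embedding
```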
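The rebalancing and representation-learning steps can be sketched together. This is a minimal, self-contained approximation using imbalanced-learn's SMOTE and PyTorch's TripletMarginLoss on synthetic stand-in embeddings; the triplet-sampling helper and training loop are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128)).astype(np.float32)  # stand-in graph embeddings
y = (rng.random(500) < 0.1).astype(int)             # ~10% "vulnerable" minority

# Step 1: SMOTE synthesizes minority-class points by interpolating between
# nearest neighbors, balancing the classes before representation learning.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
X_bal = X_bal.astype(np.float32)

# Step 2: an MLP trained with a triplet loss pulls same-class embeddings
# together and pushes different-class embeddings apart in latent space.
mlp = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)

def sample_triplets(X, y, n):
    """Randomly draw (anchor, positive, negative) batches of size n."""
    pos_idx, neg_idx = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    a, p, ng = rng.choice(pos_idx, n), rng.choice(pos_idx, n), rng.choice(neg_idx, n)
    return (torch.from_numpy(X[i]) for i in (a, p, ng))

for _ in range(100):  # a few training steps on the balanced data
    anchor, positive, negative = sample_triplets(X_bal, y_bal, 32)
    loss = criterion(mlp(anchor), mlp(positive), mlp(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```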
Evaluation and Results
The authors evaluate ReVeal against several state-of-the-art models on datasets drawn from two real-world projects, Chromium and Debian. Their assessment shows that ReVeal achieves up to a 33.57% improvement in precision and a 128.38% improvement in recall over contemporary methods, a substantial advance in detecting real-world vulnerabilities. These figures underscore its efficacy relative to alternative DL approaches, which typically suffer a dramatic performance drop outside synthetic conditions.
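For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN); both are computed below on a toy example (the labels here are illustrative, not the paper's data).

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# y_true: ground-truth labels, y_pred: model predictions (1 = vulnerable).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4
print("recall:   ", recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4
print("F1:       ", f1_score(y_true, y_pred))
```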
Implications and Future Work
From a practical standpoint, the findings highlight the need for refined datasets that better reflect real-world scenarios in both complexity and distribution of vulnerabilities. This paper emphasizes the importance of model designs that embrace real-world constraints and introduces elements such as semantic reasoning and data balance corrections into DL workflows.
Going forward, comprehensive models like ReVeal, which integrate sophisticated feature extraction, data balancing, and improved separability in latent space, can be expected to push the boundaries of vulnerability detection further and to contribute substantially to software security assurance.
Concluding Remarks
The investigation by Chakraborty and colleagues provides a critical view of the capabilities and deficiencies of current DL-based vulnerability detection systems under realistic conditions. By proposing improved methodologies and a concrete framework, this work paves the way for future research toward practical, reliable detection mechanisms in software engineering contexts.