Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation (2109.03341v1)
Abstract: Identifying vulnerable code is a precautionary measure to counter software security breaches. Tedious expert effort has been spent to build static analyzers, yet insecure patterns are barely fully enumerated. This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program, in order to improve prediction performance. Compared with a generic GNN, our enhancements include a synthesis of multiple representations learned from the several parsed graphs of a program, and a new training loss metric that leverages the fine granularity of labeling. Our model outperforms multiple text, image and graph-based approaches, across two real-world datasets.
- Yufan Zhuang
- Sahil Suneja
- Veronika Thost
- Giacomo Domeniconi
- Alessandro Morari
- Jim Laredo