Towards Causal Deep Learning for Vulnerability Detection (2310.07958v5)
Abstract: Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still keeps it from being useful in practice is that the model is not robust under perturbation and does not generalize well to out-of-distribution (OOD) data, e.g., when a trained model is applied to unseen projects in the real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with the labels. When the perturbed and OOD datasets no longer have the same spurious features, the model's predictions fail. To address this challenge, we introduced causality into deep learning vulnerability detection. Our approach, CausalVul, consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied causal learning algorithms, specifically do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal-based prediction. Our results show that CausalVul consistently improved model accuracy, robustness, and OOD performance for all the state-of-the-art models and datasets we experimented with. To the best of our knowledge, this is the first work that introduces do-calculus-based causal learning to software engineering models and shows that it is indeed useful for improving model accuracy, robustness, and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.
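The first phase described above, probing whether a model leans on variable names, can be illustrated with a minimal sketch. This is not CausalVul's actual perturbation: the helper names `rename_identifiers` and `flip_rate`, the regex-based renaming, and the toy predictor are all assumptions made for illustration. The regex also rewrites matches inside comments and string literals, which a real semantic-preserving transformation would avoid.

```python
import re
from typing import Callable, Dict, Iterable


def rename_identifiers(code: str, mapping: Dict[str, str]) -> str:
    """Rename whole-word identifiers according to `mapping`.

    Hypothetical helper: semantics are preserved only if the new names
    do not clash with names already present in `code`.
    """
    if not mapping:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


def flip_rate(
    predict: Callable[[str], int],
    samples: Iterable[str],
    mapping: Dict[str, str],
) -> float:
    """Fraction of samples whose predicted label changes after renaming."""
    samples = list(samples)
    if not samples:
        return 0.0
    flips = sum(
        predict(s) != predict(rename_identifiers(s, mapping)) for s in samples
    )
    return flips / len(samples)


# Toy stand-in for a trained detector (an assumption, not a real model):
# it flags code as vulnerable whenever the name "buf" appears, which is
# exactly the kind of spurious feature the abstract describes.
def toy_predict(code: str) -> int:
    return int("buf" in code)


code = "char buf[8]; strcpy(buf, src);"
mapping = {"buf": "v0", "src": "v1"}
renamed = rename_identifiers(code, mapping)
rate = flip_rate(toy_predict, [code, "int x = 0;"], mapping)
```

A non-zero flip rate under a renaming that leaves program behavior unchanged suggests the predictor depends on identifier names rather than on the vulnerability itself.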