LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2405.17238v2)
Abstract: Software is prone to security vulnerabilities. Program analysis tools that detect them have limited effectiveness in practice because they rely on human-labeled specifications. Large language models (LLMs) have shown impressive code generation capabilities, but they cannot perform the complex reasoning over code needed to detect such vulnerabilities, especially since this task requires whole-repository analysis. We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. Specifically, IRIS leverages LLMs to infer taint specifications and perform contextual analysis, alleviating the need for human specifications and inspection. For evaluation, we curate a new dataset, CWE-Bench-Java, comprising 120 manually validated security vulnerabilities in real-world Java projects. The state-of-the-art static analysis tool CodeQL detects only 27 of these vulnerabilities, whereas IRIS with GPT-4 detects 55 (+28) and improves on CodeQL's average false discovery rate by 5 percentage points. Furthermore, IRIS identifies 6 previously unknown vulnerabilities that existing tools cannot find.
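To make the division of labor concrete, the following is a minimal sketch of the symbolic half of such a pipeline: a breadth-first taint search over a toy data-flow graph, given a set of source and sink labels. In IRIS the source/sink labels would be inferred by an LLM and the analysis would be run by CodeQL; here the graph, the `TaintSpec` class, and `find_tainted_paths` are hypothetical stand-ins, the labels are hard-coded rather than inferred, and the search is far simpler than real interprocedural taint tracking.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaintSpec:
    # Hypothetical spec: in an IRIS-style pipeline these labels would be
    # inferred by an LLM; in this sketch they are supplied by hand.
    sources: set[str] = field(default_factory=set)
    sinks: set[str] = field(default_factory=set)


def find_tainted_paths(flows: dict[str, list[str]], spec: TaintSpec) -> list[list[str]]:
    """Return every cycle-free path from a source node to a sink node.

    `flows` maps each node to the nodes its value flows into (a toy
    data-flow graph, not a real program representation).
    """
    paths: list[list[str]] = []
    for src in sorted(spec.sources):
        queue = deque([[src]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in spec.sinks:
                # Tainted data reached a sink: report the path and stop extending it.
                paths.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt not in path:  # skip nodes already on this path to avoid cycles
                    queue.append(path + [nxt])
    return paths


flows = {
    "HttpServletRequest.getParameter": ["query"],
    "query": ["Statement.executeQuery", "log"],
    "log": [],
}
spec = TaintSpec(
    sources={"HttpServletRequest.getParameter"},
    sinks={"Statement.executeQuery"},
)
print(find_tainted_paths(flows, spec))
# [['HttpServletRequest.getParameter', 'query', 'Statement.executeQuery']]
```

The point of the split is that the graph search is cheap and sound with respect to the labels it is given; its recall depends almost entirely on whether the source and sink labels are complete, which is the part the abstract proposes to delegate to an LLM instead of to human-written specifications.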