Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection (2407.16235v1)

Published 23 Jul 2024 in cs.SE and cs.AI

Abstract: Software vulnerabilities pose significant security risks to society, necessitating extensive efforts in automated vulnerability detection. Two popular lines of work address this problem. On one hand, Static Application Security Testing (SAST) tools are commonly used to scan source code for security vulnerabilities, especially in industry. On the other hand, deep learning (DL)-based methods, especially since the introduction of LLMs, have demonstrated their potential for software vulnerability detection. However, no comparative study has examined SAST tools and LLMs side by side to determine their effectiveness in vulnerability detection, understand the pros and cons of each, and explore how the two families of approaches might be combined. In this paper, we compared 15 diverse SAST tools with 12 popular or state-of-the-art open-source LLMs in detecting software vulnerabilities in repositories written in three popular programming languages: Java, C, and Python. The experimental results showed that SAST tools achieve low vulnerability detection rates with relatively few false positives, while LLMs can detect 90% to 100% of vulnerabilities but suffer from high false positives. By ensembling the SAST tools and LLMs, the drawbacks of both can be mitigated to some extent. Our analysis sheds light on both the current progress and future directions for software vulnerability detection.
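The ensembling described in the abstract is easy to make concrete: keep SAST findings because they are precise, and trust an LLM prediction only when several models agree, which damps the LLMs' high false-positive rate. Below is a minimal Python sketch of one such combination rule; the `Verdicts` container, the quorum threshold, and the OR-of-SAST / majority-of-LLMs logic are illustrative assumptions, not the paper's actual ensembling strategy.

```python
# Minimal sketch of combining SAST and LLM verdicts for one code unit
# (e.g., a function). All names and thresholds are hypothetical; the
# paper's actual ensembling strategy may differ.
from dataclasses import dataclass


@dataclass
class Verdicts:
    sast: list[bool]  # one flag per SAST tool: did it report this unit?
    llm: list[bool]   # one flag per LLM: did it predict "vulnerable"?


def ensemble_is_vulnerable(v: Verdicts, llm_quorum: int = 3) -> bool:
    """Flag the unit if any SAST tool reports it (SAST findings have
    relatively few false positives), or if at least `llm_quorum` LLMs
    agree (voting damps the LLMs' high false-positive rate)."""
    if any(v.sast):
        return True
    return sum(v.llm) >= llm_quorum


# Example: no SAST hit, but 3 of 5 LLMs flag the unit as vulnerable.
votes = Verdicts(sast=[False] * 15, llm=[True, True, True, False, False])
print(ensemble_is_vulnerable(votes))  # True
```

Raising `llm_quorum` trades recall for precision, which mirrors the SAST-versus-LLM trade-off the abstract reports.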

Authors (8)
  1. Xin Zhou (319 papers)
  2. Duc-Manh Tran (1 paper)
  3. Thanh Le-Cong (19 papers)
  4. Ting Zhang (174 papers)
  5. Ivana Clairine Irsan (14 papers)
  6. Joshua Sumarlin (1 paper)
  7. Bach Le (17 papers)
  8. David Lo (229 papers)
Citations (3)