
Bridging the Gap: A Study of AI-based Vulnerability Management between Industry and Academia (2405.02435v1)

Published 3 May 2024 in cs.CR and cs.SE

Abstract: Recent research advances in AI have yielded promising results for automated software vulnerability management. AI-based models are reported to greatly outperform traditional static analysis tools, indicating a substantial workload relief for security engineers. However, the industry remains very cautious and selective about integrating AI-based techniques into their security vulnerability management workflow. To understand the reasons, we conducted a discussion-based study, anchored in the authors' extensive industrial experience and keen observations, to uncover the gap between research and practice in this field. We empirically identified three main barriers preventing the industry from adopting academic models, namely, complicated requirements of scalability and prioritization, limited customization flexibility, and unclear financial implications. Meanwhile, research works are significantly impacted by the lack of extensive real-world security data and expertise. We proposed a set of future directions to help better understand industry expectations, improve the practical usability of AI-based security vulnerability research, and drive a synergistic relationship between industry and academia.

Authors (7)
  1. Shengye Wan (6 papers)
  2. Joshua Saxe (15 papers)
  3. Craig Gomes (1 paper)
  4. Sahana Chennabasappa (6 papers)
  5. Avilash Rath (2 papers)
  6. Kun Sun (51 papers)
  7. Xinda Wang (9 papers)