
How well does LLM generate security tests? (2310.00710v2)

Published 1 Oct 2023 in cs.CR and cs.SE

Abstract: Developers often build software on top of third-party libraries (Libs) to improve programmer productivity and software quality. These libraries may contain vulnerabilities that hackers can exploit to attack the applications (Apps) built on top of them. Such attacks are known as supply chain attacks, and their documented number increased 742% in 2022. Existing tools mitigate such attacks by scanning the library dependencies of Apps, identifying the use of vulnerable library versions, and suggesting secure alternatives to vulnerable dependencies. However, recent studies show that many developers do not trust the reports these tools produce; they ask for code or evidence that demonstrates how library vulnerabilities lead to security exploits, so that they can assess vulnerability severity and decide whether modification is necessary. Unfortunately, manually crafting demos of application-specific attacks is challenging and time-consuming, and there is insufficient tool support to automate that procedure. In this study, we used ChatGPT-4.0 to generate security tests that demonstrate how vulnerable library dependencies facilitate supply chain attacks on given Apps. We explored various prompt styles and templates, and found that ChatGPT-4.0 generated tests for all 55 Apps, demonstrating 24 attacks successfully. It outperformed two state-of-the-art security test generators (TRANSFER and SIEGE) by generating many more tests and achieving more exploits. ChatGPT-4.0 worked better when prompts described the vulnerabilities, possible exploits, and code context in more detail. Our research will shed light on new research in security test generation. The generated tests will help developers create secure-by-design and secure-by-default software.
