
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs (2407.16576v1)

Published 23 Jul 2024 in cs.CR

Abstract: While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. LLMs, renowned for their contextual understanding, offer a promising avenue to address existing shortcomings. However, applying LLMs in this security-critical domain presents challenges, particularly due to the unreliability stemming from LLMs' stochastic nature and the well-known issue of hallucination. To explore the prevalence of LLMs' unreliable analysis and potential solutions, this paper introduces a systematic evaluation framework to assess LLMs in detecting cryptographic misuses, utilizing a comprehensive dataset encompassing both manually crafted samples and real-world projects. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. Nevertheless, we demonstrate how a constrained problem scope, coupled with LLMs' self-correction capability, significantly enhances the reliability of the detection. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks. Moreover, we identify the failure patterns that persistently hinder LLMs' reliability, including both cryptographic knowledge deficiency and code semantics misinterpretation. Guided by these insights, we develop an LLM-based workflow to examine open-source repositories, leading to the discovery of 63 real-world cryptographic misuses. Of these, 46 have been acknowledged by the development community, with 23 currently being addressed and 6 resolved. Reflecting on developers' feedback, we offer recommendations for future research and the development of LLM-based security tools.
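
To make the abstract's point about "manually defined patterns" concrete, here is a minimal sketch of a pattern-based misuse scanner. It is not the paper's framework or any existing tool: the rule set, the `scan` function, and the `SAMPLE` snippet are all hypothetical and chosen only for illustration. It shows how hand-written rules can flag a call whose risk depends on context and can miss an indirect call entirely.

```python
import re

# Hypothetical hand-written rules: each is a (label, regex) pair.
RULES = [
    ("weak hash (MD5)", re.compile(r"hashlib\.md5\s*\(")),
    ("weak hash (SHA-1)", re.compile(r"hashlib\.sha1\s*\(")),
    ("ECB mode", re.compile(r"MODE_ECB\b")),
]


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule label) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Illustrative target code (scanned as text, never executed).
SAMPLE = '''\
import hashlib
digest = hashlib.md5(password).hexdigest()
checksum = hashlib.md5(file_bytes).hexdigest()  # non-security checksum
algo = getattr(hashlib, "md" + "5")
token = algo(secret).hexdigest()
'''

if __name__ == "__main__":
    for lineno, label in scan(SAMPLE):
        print(f"line {lineno}: {label}")
```

In this sketch, lines 2 and 3 of `SAMPLE` are both reported even though line 3 may be a benign checksum, while the MD5 call built indirectly on lines 4 and 5 goes unreported. Deciding such cases requires understanding the surrounding code, which is the gap the abstract suggests LLMs might help close.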

Authors (7)
  1. Yifan Xia (14 papers)
  2. Zichen Xie (3 papers)
  3. Peiyu Liu (27 papers)
  4. Kangjie Lu (12 papers)
  5. Yan Liu (421 papers)
  6. Wenhai Wang (123 papers)
  7. Shouling Ji (136 papers)
