SkipAnalyzer: A Tool for Static Code Analysis with Large Language Models (2310.18532v2)

Published 27 Oct 2023 in cs.SE

Abstract: We introduce SkipAnalyzer, an LLM-powered tool for static code analysis. SkipAnalyzer has three components: 1) an LLM-based static bug detector that scans source code and reports specific types of bugs, 2) an LLM-based false-positive filter that identifies false-positive bugs in the results of static bug detectors (e.g., the output of step 1) to improve detection accuracy, and 3) an LLM-based patch generator that produces patches for the detected bugs. As a proof of concept, SkipAnalyzer is built on ChatGPT, which has exhibited outstanding performance in various software engineering tasks. To evaluate SkipAnalyzer, we focus on two typical and critical bug types targeted by static bug detection: Null Dereference and Resource Leak. We employ Infer to help collect instances of these two bug types from 10 open-source projects; the resulting dataset contains 222 Null Dereference bugs and 46 Resource Leak bugs. Our study demonstrates that SkipAnalyzer achieves remarkable performance across these static analysis tasks: bug detection, false-positive warning removal, and bug repair. In static bug detection, SkipAnalyzer achieves accuracy of up to 68.37% for detecting Null Dereference bugs and 76.95% for detecting Resource Leak bugs, improving the precision of the current leading bug detector, Infer, by 12.86% and 43.13%, respectively. For removing false-positive warnings, SkipAnalyzer reaches a precision of up to 93.88% for Null Dereference bugs and 63.33% for Resource Leak bugs, surpassing state-of-the-art false-positive warning removal tools. Furthermore, in bug repair, SkipAnalyzer generates syntactically correct patches for its detected bugs with a success rate of up to 97.30%.
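The three-stage pipeline described above maps naturally onto a small driver around a chat-completion API. Below is a minimal sketch, assuming the OpenAI Python client; the model name, prompts, and function names are illustrative assumptions for this summary, not the paper's actual implementation or prompt templates.

```python
# Minimal sketch of SkipAnalyzer's three stages as described in the abstract.
# Model name, prompts, and helper names are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str) -> str:
    """Send one prompt to the chat model and return its reply text."""
    resp = client.chat.completions.create(
        model="gpt-4",  # the paper builds on ChatGPT; the exact model is a choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def detect_bugs(source: str, bug_type: str) -> str:
    # Stage 1: LLM-based static bug detector for a specific bug type,
    # e.g. "Null Dereference" or "Resource Leak".
    return ask(f"Report any {bug_type} bugs in this Java code:\n{source}")


def filter_false_positive(source: str, warning: str) -> str:
    # Stage 2: LLM-based false-positive filter over detector warnings.
    return ask(
        f"Is this static analysis warning a false positive?\n"
        f"Warning: {warning}\nCode:\n{source}"
    )


def generate_patch(source: str, warning: str) -> str:
    # Stage 3: LLM-based patch generator for confirmed bugs.
    return ask(f"Generate a patch fixing this bug:\nWarning: {warning}\nCode:\n{source}")
```

As the abstract notes, stage 2 is not limited to stage 1's output: the false-positive filter can also post-process warnings from a conventional static detector such as Infer.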

Authors (7)
  1. Mohammad Mahdi Mohajer (6 papers)
  2. Reem Aleithan (4 papers)
  3. Nima Shiri Harzevili (10 papers)
  4. Moshi Wei (10 papers)
  5. Alvine Boaye Belle (10 papers)
  6. Hung Viet Pham (14 papers)
  7. Song Wang (313 papers)
Citations (2)