
Pre-trained Model-based Actionable Warning Identification: A Feasibility Study (2403.02716v1)

Published 5 Mar 2024 in cs.SE

Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still suffer from restricted performance due to their direct reliance on a limited number of labeled warnings to develop the classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success on various code-related tasks, could potentially circumvent the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs to AWI. By conducting an extensive evaluation on 10K+ SpotBugs warnings from 10 large-scale, open-source projects, we observe that all studied PTMs are consistently 9.85%–21.12% better than the state-of-the-art ML-based AWI approaches. Besides, we investigate the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow. Further, we identify the reasons for current PTMs' underperformance on AWI. Based on our findings, we provide several practical guidelines to enhance PTM-based AWI in future work.
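
To make the workflow concrete, below is a minimal sketch of how a PTM could be fine-tuned as a binary warning classifier along the three aspects the abstract names (data preprocessing, model training, and model prediction). This is not the authors' implementation: the choice of CodeBERT, the toy warning snippets and labels, and all hyperparameters are illustrative assumptions, using the Hugging Face transformers and PyTorch APIs.

```python
# Illustrative sketch of a PTM-based AWI pipeline (not the paper's code).
# Assumptions: CodeBERT as the PTM, toy SpotBugs-style warning contexts,
# made-up labels, and arbitrary hyperparameters.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/codebert-base"  # any code PTM could be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Data preprocessing: represent each warning by the source code surrounding
# the flagged line; the label marks whether the warning was acted on (1) or
# ignored (0). These two examples are toy placeholders.
train_code = [
    "public void close() { if (stream != null) stream.close(); }",
    "int hash() { return new Integer(x).hashCode(); }",
]
train_labels = [0, 1]

enc = tokenizer(train_code, truncation=True, padding=True,
                max_length=512, return_tensors="pt")
labels = torch.tensor(train_labels)

# Model training: standard fine-tuning with cross-entropy over the two classes.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few epochs for illustration only
    optimizer.zero_grad()
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()

# Model prediction: classify an unseen warning context as actionable or not.
model.eval()
test = tokenizer(["void f() { String s = null; s.length(); }"],
                 truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    pred = model(**test).logits.argmax(dim=-1)
print("actionable" if pred.item() == 1 else "unactionable")
```

In practice the same skeleton applies to the other PTMs evaluated in the paper (e.g., GraphCodeBERT, CodeT5, UniXcoder); only the checkpoint name and, for encoder-decoder models, the classification head would change.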

Authors (11)
  1. Xiuting Ge (5 papers)
  2. Chunrong Fang (71 papers)
  3. Quanjun Zhang (36 papers)
  4. Daoyuan Wu (39 papers)
  5. Bowen Yu (89 papers)
  6. Qirui Zheng (4 papers)
  7. An Guo (9 papers)
  8. Shangwei Lin (2 papers)
  9. Zhihong Zhao (7 papers)
  10. Yang Liu (2253 papers)
  11. Zhenyu Chen (91 papers)