Automatic Static Bug Detection for Machine Learning Libraries: Are We There Yet? (2307.04080v1)
Abstract: Automatic detection of software bugs is a critical task in software security, and many static tools that help detect bugs have been proposed. However, these static bug detectors have mainly been evaluated on general software projects, which calls into question their practical effectiveness and usefulness for machine learning libraries. In this paper, we address this question by analyzing five popular and widely used static bug detectors, i.e., Flawfinder, RATS, Cppcheck, Facebook Infer, and the Clang Static Analyzer, on a curated dataset of 410 known bugs gathered from four popular machine learning libraries: Mlpack, MXNet, PyTorch, and TensorFlow. Our research provides a categorization of these tools' capabilities to better understand their strengths and weaknesses for detecting software bugs in machine learning libraries. Overall, our study shows that the static bug detectors find only a negligible fraction of all bugs, 6 out of 410 (about 1.5%), with Flawfinder and RATS being the most effective static checkers for finding software bugs in machine learning libraries. Based on our observations, we further identify and discuss opportunities to make the tools more effective and practical.
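To make the two families of checkers in the study concrete, below is a minimal C++ sketch; the function names and scenario are hypothetical illustrations, not code from the studied libraries. The first function contains the kind of lexically detectable issue that pattern-based tools such as Flawfinder and RATS warn about (an unbounded `strcpy`), while the second contains a path-sensitive null-pointer dereference of the kind that semantic analyzers such as the Clang Static Analyzer and Facebook Infer are designed to find.

```cpp
#include <cstring>

// Hypothetical illustration, not code from Mlpack/MXNet/PyTorch/TensorFlow.

// Pattern-based checkers (Flawfinder, RATS) flag the mere presence of
// risky calls: strcpy performs no bounds checking, so a sufficiently
// long `name` overflows the 16-byte stack buffer.
void copy_layer_name(const char* name) {
    char buf[16];
    strcpy(buf, name);  // CWE-120: possible buffer overflow
}

// Path-sensitive analyzers (Clang Static Analyzer, Facebook Infer)
// reason about feasible execution paths: when has_data is false,
// p is nullptr and the dereference below is a null-pointer bug.
int first_element(const int* data, bool has_data) {
    const int* p = has_data ? data : nullptr;
    return *p;  // null dereference on the has_data == false path
}
```

Running `flawfinder` or `clang --analyze` on a file like this should report the corresponding warnings, which is roughly the workflow the study applies at the scale of whole library codebases.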
- A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
- X. Cheng, H. Wang, J. Hua, G. Xu, and Y. Sui, “Deepwukong: Statically detecting software vulnerabilities using deep graph neural network,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 30, no. 3, pp. 1–33, 2021.
- S. Cao, X. Sun, L. Bo, R. Wu, B. Li, and C. Tao, “Mvd: Memory-related vulnerability detection based on flow-sensitive graph neural networks,” arXiv preprint arXiv:2203.02660, 2022.
- Z. Li, D. Zou, S. Xu, Z. Chen, Y. Zhu, and H. Jin, “Vuldeelocator: a deep learning-based fine-grained vulnerability detector,” IEEE Transactions on Dependable and Secure Computing, 2021.
- A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler, “A few billion lines of code later: Using static analysis to find bugs in the real world,” Commun. ACM, vol. 53, no. 2, pp. 66–75, Feb. 2010. [Online]. Available: https://doi.org/10.1145/1646353.1646374
- Google Inc. Error Prone. [Online]. Available: https://errorprone.info/
- Facebook. (2013) Infer. [Online]. Available: https://fbinfer.com/
- SpotBugs. (2021) SpotBugs. [Online]. Available: https://spotbugs.github.io/
- N. Ayewah, W. Pugh, D. Hovemeyer, J. D. Morgenthaler, and J. Penix, “Using static analysis to find bugs,” IEEE software, vol. 25, no. 5, pp. 22–29, 2008.
- B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge, “Why don’t software developers use static analysis tools to find bugs?” in 2013 35th International Conference on Software Engineering (ICSE). IEEE, 2013, pp. 672–681.
- F. Thung, D. Lo, L. Jiang, F. Rahman, P. T. Devanbu et al., “To what extent could we detect field defects? an empirical study of false negatives in static bug finding tools,” in 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2012, pp. 50–59.
- ——, “To what extent could we detect field defects? an extended empirical study of false negatives in static bug-finding tools,” Automated Software Engineering, vol. 22, no. 4, pp. 561–602, 2015.
- A. Habib and M. Pradel, “How many of all bugs do we find? a study of static bug detectors,” in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2018, pp. 317–328.
- D. A. Tomassi and C. Rubio-González, “On the real-world effectiveness of static bug detectors at finding null pointer exceptions,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 292–303.
- S. Lipp, S. Banescu, and A. Pretschner, “An empirical study on the effectiveness of static c code analyzers for vulnerability detection,” in Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, 2022, pp. 544–555.
- G. Algan and I. Ulusoy, “Image classification with deep learning in the presence of noisy labels: A survey,” Knowledge-Based Systems, vol. 215, p. 106771, 2021.
- F. Mahdisoltani, G. Berger, W. Gharbieh, D. Fleet, and R. Memisevic, “Fine-grained video classification and captioning,” arXiv preprint arXiv:1804.09235, vol. 5, no. 6, 2018.
- R. Patgiri, “A taxonomy on big data: Survey,” arXiv preprint arXiv:1808.08474, 2018.
- Y. Lv, B. Liu, J. Zhang, Y. Dai, A. Li, and T. Zhang, “Semi-supervised active salient object detection,” Pattern Recognition, vol. 123, p. 108364, 2022.
- R. Simhambhatla, K. Okiah, S. Kuchkula, and R. Slater, “Self-driving cars: Evaluation of deep learning techniques for object detection in different driving conditions,” SMU Data Science Review, vol. 2, no. 1, p. 23, 2019.
- S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, “Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling,” in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 1025–1032.
- R. Kulkarni, S. Dhavalikar, and S. Bangar, “Traffic light detection and recognition for self driving cars using deep learning,” in 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). IEEE, 2018, pp. 1–4.
- S. Minaee and Z. Liu, “Automatic question-answering using a deep similarity neural network,” in 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2017, pp. 923–927.
- R. G. Athreya, S. K. Bansal, A.-C. N. Ngomo, and R. Usbeck, “Template-based question answering using recursive neural networks,” in 2021 IEEE 15th International Conference on Semantic Computing (ICSC). IEEE, 2021, pp. 195–198.
- P. K. Roy, “Deep neural network to predict answer votes on community question answering sites,” Neural Processing Letters, vol. 53, no. 2, pp. 1633–1646, 2021.
- J.-W. Hong, Y. Wang, and P. Lanz, “Why is artificial intelligence blamed more? analysis of faulting artificial intelligence for self-driving car accidents in experimental settings,” International Journal of Human–Computer Interaction, vol. 36, no. 18, pp. 1768–1774, 2020.
- D. A. Wheeler. (2013) Flawfinder. [Online]. Available: http://dwheeler.com/flawfinder/
- J. Chen, C. Zhang, S. Cai, L. Zhang, and L. Ma, “A memory-related vulnerability detection approach based on vulnerability model with petri net,” Journal of Logical and Algebraic Methods in Programming, vol. 132, p. 100859, 2023.
- J. D. Pereira and M. Vieira, “On the use of open-source c/c++ static analysis tools in large projects,” in 2020 16th European Dependable Computing Conference (EDCC). IEEE, 2020, pp. 97–102.
- C. Mitropoulos, “Employing different program analysis methods to study bug evolution,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 1202–1204.
- M. Gu, H. Feng, H. Sun, P. Liu, Q. Yue, J. Hu, C. Cao, and Y. Zhang, “Hierarchical attention network for interpretable and fine-grained vulnerability detection,” in IEEE INFOCOM 2022-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2022, pp. 1–6.
- D. Zou, Y. Hu, W. Li, Y. Wu, H. Zhao, and H. Jin, “mvulpreter: A multi-granularity vulnerability detection system with interpretations,” IEEE Transactions on Dependable and Secure Computing, 2022.
- Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” arXiv preprint arXiv:1909.03496, 2019.
- A. Dunham. (2009) RATS: Rough Auditing Tool for Security. [Online]. Available: https://github.com/andrew-d/rough-auditing-tool-for-security
- D. Marjamäki. (2016) Cppcheck. [Online]. Available: https://cppcheck.sourceforge.io/
- S. Pujar, Y. Zheng, L. Buratti, B. Lewis, A. Morari, J. Laredo, K. Postlethwait, and C. Görn, “Varangian: a git bot for augmented static analysis,” in Proceedings of the 19th International Conference on Mining Software Repositories, 2022, pp. 766–767.
- S. Mehrpour and T. D. LaToza, “Can static analysis tools find more defects? a qualitative study of design rule violations found by code review,” Empirical Software Engineering, vol. 28, no. 1, p. 5, 2023.
- C. Lattner, “LLVM and Clang: Next generation compiler technology,” in The BSD Conference, vol. 5, 2008, pp. 1–20.
- K. Umann and Z. Porkoláb, “Detecting uninitialized variables in c++ with the clang static analyzer,” Acta Cybernetica, vol. 25, no. 4, pp. 923–940, 2022.
- P. G. Szécsi, G. Horváth, and Z. Porkoláb, “Improved loop execution modeling in the clang static analyzer,” Acta Cybernetica, vol. 25, no. 4, pp. 909–921, 2022.
- H. Aslanyan, Z. Gevorgyan, R. Mkoyan, H. Movsisyan, V. Sahakyan, and S. Sargsyan, “Static analysis methods for memory leak detection: A survey,” in 2022 Ivannikov Memorial Workshop (IVMEM). IEEE, 2022, pp. 1–6.
- M. Pradel and T. R. Gross, “Detecting anomalies in the order of equally-typed method arguments,” in Proceedings of the 2011 International Symposium on Software Testing and Analysis, 2011, pp. 232–242.
- N. S. Harzevili, J. Shin, J. Wang, and S. Wang, “Characterizing and understanding software security vulnerabilities in machine learning libraries,” arXiv preprint arXiv:2203.06502, 2022.
- F. Thung, S. Wang, D. Lo, and L. Jiang, “An empirical study of bugs in machine learning systems,” in 2012 IEEE 23rd International Symposium on Software Reliability Engineering. IEEE, 2012, pp. 271–280.
- M. J. Islam, G. Nguyen, R. Pan, and H. Rajan, “A comprehensive study on deep learning bug characteristics,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 510–520.
- L. Jia, H. Zhong, X. Wang, L. Huang, and X. Lu, “The symptoms, causes, and repairs of bugs inside a deep learning library,” Journal of Systems and Software, vol. 177, p. 110935, 2021.
- Y. Zhou and A. Sharma, “Automated identification of security issues from commit messages and bug reports,” in Proceedings of the 2017 11th joint meeting on foundations of software engineering, 2017, pp. 914–919.
- Y. Younan, W. Joosen, and F. Piessens, “Runtime countermeasures for code injection attacks against c and c++ programs,” ACM Computing Surveys (CSUR), vol. 44, no. 3, pp. 1–28, 2012.
- Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, “Vuldeepecker: A deep learning-based system for vulnerability detection,” arXiv preprint arXiv:1801.01681, 2018.
- X. Duan, J. Wu, S. Ji, Z. Rui, T. Luo, M. Yang, and Y. Wu, “Vulsniper: Focus your attention to shoot fine-grained vulnerabilities.” in IJCAI, 2019, pp. 4665–4671.
- Q. Xiao, K. Li, D. Zhang, and W. Xu, “Security risks in deep learning implementations,” in 2018 IEEE Security and privacy workshops (SPW). IEEE, 2018, pp. 123–128.
- A. Di Franco, H. Guo, and C. Rubio-González, “A comprehensive study of real-world numerical bug characteristics,” in 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2017, pp. 509–519.
- Q. Shen, H. Ma, J. Chen, Y. Tian, S.-C. Cheung, and X. Chen, “A comprehensive study of deep learning compiler bugs,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 968–980.
- A. Hindle, E. T. Barr, M. Gabel, Z. Su, and P. Devanbu, “On the naturalness of software,” Commun. ACM, vol. 59, no. 5, pp. 122–131, Apr. 2016. [Online]. Available: https://doi.org/10.1145/2902362
- F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and discovering vulnerabilities with code property graphs,” in 2014 IEEE Symposium on Security and Privacy. IEEE, 2014, pp. 590–604.
- Z. Li, D. Zou, S. Xu, H. Jin, Y. Zhu, and Z. Chen, “Sysevr: A framework for using deep learning to detect software vulnerabilities,” IEEE Transactions on Dependable and Secure Computing, 2021.
- D. A. Tomassi, “Bugs in the wild: examining the effectiveness of static analyzers at finding real-world bugs,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018, pp. 980–982.
- N. Rutar, C. B. Almazan, and J. S. Foster, “A comparison of bug finding tools for java,” in 15th International symposium on software reliability engineering. IEEE, 2004, pp. 245–256.
- G. Chatzieleftheriou and P. Katsaros, “Test-driving static analysis tools in search of c code vulnerabilities,” in 2011 IEEE 35th annual computer software and applications conference workshops. IEEE, 2011, pp. 96–103.
- N. Ayewah and W. Pugh, “The google findbugs fixit,” in Proceedings of the 19th international symposium on Software testing and analysis, 2010, pp. 241–252.
- A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, “An empirical study of operating systems errors,” in Proceedings of the eighteenth ACM symposium on Operating systems principles, 2001, pp. 73–88.
- S. Lu, S. Park, E. Seo, and Y. Zhou, “Learning from mistakes: a comprehensive study on real world concurrency bug characteristics,” in Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, 2008, pp. 329–339.
- F. Ocariza, K. Bajaj, K. Pattabiraman, and A. Mesbah, “An empirical study of client-side javascript bugs,” in 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, 2013, pp. 55–64.
- M. Selakovic and M. Pradel, “Performance issues and optimizations in javascript: an empirical study,” in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 61–72.
- K. Pan, S. Kim, and E. J. Whitehead, “Toward an understanding of bug fix patterns,” Empirical Software Engineering, vol. 14, no. 3, pp. 286–315, 2009.
- Jiho Shin
- Junjie Wang
- Song Wang
- Nachiappan Nagappan
- Nima Shiri Harzevili