A Tale of Two DL Cities: When Library Tests Meet Compiler (2407.16626v2)

Published 23 Jul 2024 in cs.SE

Abstract: Deep Learning (DL) compilers typically load a DL model and optimize it with an intermediate representation. Existing DL compiler testing techniques mainly focus on the model optimization stages, but rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries, which shares a common objective with DL library testing, indicating that the knowledge embedded in DL library tests is beneficial for testing the model loading stage of DL compilers. In this work, we propose OPERA to extract such domain knowledge from the test inputs for DL libraries. OPERA constructs diverse tests from the various test inputs for DL libraries (including the test inputs documented in DL libraries and those generated by recent fuzzers). In addition, it incorporates a diversity-based test prioritization strategy to migrate and execute earlier those test inputs that are more likely to detect diverse bugs. We considered three sources of tests in DL libraries for migration and used eight frontends from three DL compilers (i.e., TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed or fixed by developers, demonstrating the effectiveness of this migration-based idea. The test prioritization strategy in OPERA improves testing efficiency with migrated tests by 11.9%–47.4% on average compared to general test prioritization strategies.
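
To make the prioritization idea concrete, below is a minimal, self-contained sketch of a diversity-based ordering over migrated tests: a greedy "farthest-first" selection on numeric encodings of test inputs, so that tests exercising dissimilar operator usages run earlier. The feature encoding, the distance function, and all names in this sketch are illustrative assumptions, not the authors' OPERA implementation.

```python
# Sketch of diversity-based test prioritization (illustrative, not OPERA's code).
from typing import Dict, List

Test = Dict[str, float]  # hypothetical encoding: feature name -> numeric value


def distance(a: Test, b: Test) -> float:
    """Chebyshev-style distance over the union of feature keys (an assumption)."""
    keys = set(a) | set(b)
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def prioritize(tests: List[Test]) -> List[int]:
    """Greedy ordering: repeatedly pick the test farthest from those already
    selected, so dissimilar operator usages are executed earlier."""
    remaining = list(range(len(tests)))
    order: List[int] = [remaining.pop(0)]  # seed with the first test
    while remaining:
        # A candidate's score is its distance to the closest already-selected test.
        best = max(
            remaining,
            key=lambda i: min(distance(tests[i], tests[j]) for j in order),
        )
        remaining.remove(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Toy migrated tests for a hypothetical Conv2d operator, encoded as features.
    migrated = [
        {"kernel": 3, "stride": 1, "padding": 0},
        {"kernel": 3, "stride": 1, "padding": 1},
        {"kernel": 7, "stride": 2, "padding": 3},
    ]
    print(prioritize(migrated))  # -> [0, 2, 1]: the most dissimilar test runs second
```

A greedy diversity ordering like this is a common baseline for surfacing distinct bugs sooner; the paper's concrete strategy may differ in how it extracts features from migrated library tests and how it measures their dissimilarity.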

Authors (8)
  1. Qingchao Shen (6 papers)
  2. Yongqiang Tian (20 papers)
  3. Haoyang Ma (8 papers)
  4. Junjie Chen (89 papers)
  5. Lili Huang (8 papers)
  6. Ruifeng Fu (1 paper)
  7. Shing-Chi Cheung (54 papers)
  8. Zan Wang (21 papers)
Citations (1)