Evolutionary Generative Fuzzing for Differential Testing of the Kotlin Compiler (2401.06653v1)
Abstract: Compiler correctness is a cornerstone of reliable software development. However, systematic testing of compilers is infeasible, given the vast space of possible programs and the complexity of modern programming languages. In this context, differential testing offers a practical methodology, as it addresses the oracle problem by comparing the output of alternative compilers given the same set of programs as input. In this paper, we investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains. We propose a black-box generative approach that creates input programs for the K1 and K2 compilers. First, we build workable models of Kotlin's semantic language features (a semantic interface) and syntactic language features (an enriched context-free grammar), which are subsequently exploited to generate random code snippets. Second, we extend random sampling by introducing two genetic algorithms (GAs) that aim to generate more diverse input programs. Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed by JetBrains developers, and some have been fixed. While we do not observe a significant difference in the number of defects uncovered by the different search algorithms, random search and GAs are complementary, as they find different categories of bugs. Finally, we provide insights into the relationships between the size, complexity, and fault detection capability of the generated input programs.
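To make the differential-testing idea in the abstract concrete, here is a minimal sketch. It is not the paper's fuzzer: the names `differential_test`, `Discrepancy`, `toy_a`, and `toy_b` are hypothetical, and the two "compilers" are toy stand-ins rather than real K1/K2 invocations. It shows only the oracle: feed the same programs to two implementations and flag any disagreement.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "compiler" here is any callable mapping source text to a verdict string.
# Real K1/K2 invocations are not modelled; these types are assumptions.
Compiler = Callable[[str], str]


@dataclass
class Discrepancy:
    program: str
    verdict_a: str
    verdict_b: str


def differential_test(
    programs: List[str], compiler_a: Compiler, compiler_b: Compiler
) -> List[Discrepancy]:
    """Return every program on which the two compilers disagree."""
    found = []
    for src in programs:
        a, b = compiler_a(src), compiler_b(src)
        if a != b:  # disagreement is the oracle: at least one side is wrong
            found.append(Discrepancy(src, a, b))
    return found


# Hypothetical stand-in: accepts any program whose braces are balanced in count.
def toy_a(src: str) -> str:
    return "ok" if src.count("{") == src.count("}") else "error"


# Hypothetical stand-in with a seeded bug: rejects anything containing "when".
def toy_b(src: str) -> str:
    if "when" in src:
        return "error"
    return toy_a(src)


programs = ["fun f() {}", "fun g() { when (1) {} }", "fun h() {"]
for d in differential_test(programs, toy_a, toy_b):
    print(d.program, d.verdict_a, d.verdict_b)
```

Only the second program is reported, because it is the only one on which the two stand-ins disagree. The third program is rejected by both, so the oracle stays silent. This illustrates a known limitation of differential testing: a bug shared by both implementations goes undetected.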
Authors: Calin Georgescu, Mitchell Olsthoorn, Pouria Derakhshanfar, Marat Akhin, Annibale Panichella