Learning How to Mutate Source Code from Bug-Fixes
The paper "Learning How to Mutate Source Code from Bug-Fixes" addresses the challenge of developing effective mutation operators by leveraging deep learning techniques. Traditional mutation testing involves injecting artificial faults into the source code to simulate defects, serving purposes such as guiding test suite generation and assessing test suite effectiveness. While empirical studies have demonstrated that certain mutants can effectively mimic real faults, the creation of tailored mutation operators remains a challenging and error-prone task. This work proposes a novel approach to automatically learn mutation operators from historical bug-fixing activities in software repositories.
Methodology
The approach involves several key steps:
- Data Collection: The approach mines 787,178 bug-fixing commits from GitHub, restricted to Java projects. From these commits, method-level pairs of buggy and fixed code are extracted, referred to as Transformation Pairs (TPs).
- Abstraction and Clustering: Abstracted representations of the TPs are generated to shrink the vocabulary and make learning tractable: a Java lexer and parser identify identifiers and literals and replace them with abstract tokens. The TPs are then clustered by the similarity of their abstract syntax tree (AST) edit actions, using doc2vec embeddings and k-means, so that each cluster captures a recurring transformation pattern.
- Model Training: A Recurrent Neural Network (RNN) Encoder-Decoder architecture with an attention mechanism is trained on the abstracted code representations to learn the mutations. Several configurations were evaluated, and the best-performing architecture was selected for training.
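The abstraction step above can be sketched in a few lines. This is a toy stand-in, not the paper's tooling: it uses a regex tokenizer instead of a real Java lexer/parser, and the placeholder names (`VAR_#`, `METHOD_#`, `INT_#`, `STRING_#`) and the keyword list are illustrative.

```python
import re

# Illustrative subset of Java keywords kept verbatim (the real tool uses a full lexer).
JAVA_KEYWORDS = {"if", "else", "return", "int", "for", "while", "void", "public",
                 "private", "static", "new", "null", "true", "false", "class"}

# Order matters: string literals, then integer literals, then identifiers, then any symbol.
TOKEN_RE = re.compile(r'"[^"]*"|\d+|[A-Za-z_]\w*|\S')

def abstract_method(source):
    """Replace identifiers and literals with typed placeholder IDs, keeping
    keywords and punctuation intact. Returns the abstract token stream and
    the mapping needed to re-concretize a generated mutant later."""
    ids = {}  # concrete token -> abstract token
    counters = {"VAR": 0, "METHOD": 0, "INT": 0, "STRING": 0}

    def fresh(kind, tok):
        if tok not in ids:
            counters[kind] += 1
            ids[tok] = f"{kind}_{counters[kind]}"
        return ids[tok]

    tokens = TOKEN_RE.findall(source)
    out = []
    for i, tok in enumerate(tokens):
        if tok in JAVA_KEYWORDS or not (tok[0].isalpha() or tok[0] in '_"' or tok[0].isdigit()):
            out.append(tok)  # keyword or punctuation: keep as-is
        elif tok.startswith('"'):
            out.append(fresh("STRING", tok))
        elif tok[0].isdigit():
            out.append(fresh("INT", tok))
        elif i + 1 < len(tokens) and tokens[i + 1] == "(":
            out.append(fresh("METHOD", tok))  # identifier followed by '(' -> method name
        else:
            out.append(fresh("VAR", tok))
    return out, ids

buggy = 'if (count > 0) return compute(count); else return 0;'
print(abstract_method(buggy)[0])
```

Note how repeated occurrences of `count` map to the same `VAR_1` token, which is what lets a model learn transformations that generalize across concrete identifier names.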
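The encoder-decoder training step can be illustrated with a minimal PyTorch sketch. This is not the authors' selected configuration: the GRU cells, dot-product (Luong-style) attention, and all sizes here are assumptions chosen for brevity. The model reads an abstracted fixed-code sequence and is trained to emit the corresponding abstracted buggy sequence.

```python
import torch
import torch.nn as nn

class Seq2SeqMutator(nn.Module):
    """Toy RNN encoder-decoder with dot-product attention, in the spirit of
    the paper's architecture; hyperparameters are illustrative only."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)

    def forward(self, src, tgt):
        enc_out, h = self.encoder(self.embed(src))      # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt), h)   # (B, T, H)
        # Attention: each decoder state attends over all encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))  # (B, T, S)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)  # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, V)

# One training step on random token IDs standing in for abstracted code.
model = Seq2SeqMutator(vocab_size=200)
src = torch.randint(0, 200, (4, 12))   # abstracted fixed-code tokens (input)
tgt = torch.randint(0, 200, (4, 10))   # abstracted buggy-code tokens (target)
logits = model(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, 200), tgt.reshape(-1))
loss.backward()
print(logits.shape)  # torch.Size([4, 10, 200])
```

At inference time one would decode token by token from a start symbol instead of feeding the target sequence; the teacher-forced setup above is only the training step.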
Results
The models trained on this data show promising results:
- Performance Metrics: The models achieved BLEU scores significantly higher than a baseline, indicating that the generated mutants closely resembled actual buggy code.
- Prediction Quality: Between 9% and 45% of the generated mutants perfectly matched the original buggy code, depending on the specific mutation model cluster. Additionally, more than 98% of predictions were syntactically correct.
- Variation in Mutants: The different mutation models demonstrated varied abilities to generate unique and meaningful mutants; the clustering step helped by tailoring each model to a specific transformation pattern.
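The BLEU comparison used in the first bullet can be sketched with a small stdlib-only implementation on token lists. This is a generic sentence-level BLEU with uniform n-gram weights and naive smoothing, not the exact evaluation script from the paper.

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU over token lists: geometric mean of clipped
    n-gram precisions times a brevity penalty. Naively smoothed so that
    zero-overlap n-grams do not collapse the score to exactly zero."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        overlap = sum((cand_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

# A prediction identical to the real buggy code scores 1.0.
buggy = "return METHOD_1 ( VAR_1 )".split()
predicted = "return METHOD_1 ( VAR_1 )".split()
print(round(bleu(buggy, predicted), 2))  # 1.0
```

A perfect prediction (a mutant identical to the real buggy code) scores 1.0, which is why BLEU against the actual buggy version is a sensible proxy for how realistic the generated mutants are.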
Implications and Future Work
The potential implications of these findings are multi-faceted. Practically, a tool implementing these learned mutation operators could substantially improve automated software testing by providing a reusable pipeline for seeding realistic faults. Theoretically, this research contributes to the understanding of how machine learning can be applied to software engineering tasks traditionally driven by manual processes or heuristic-based methods.
Future developments could focus on fine-tuning the RNN architecture further, expanding the methodology to other programming languages, and integrating this approach into a comprehensive, automated mutation testing framework. By doing so, researchers and practitioners can leverage a deeper understanding of buggy code to drive improved software quality assurance activities.
Overall, this research reflects an innovative step toward automating the traditionally manual and heuristic-driven task of mutation testing, highlighting the synergy between machine learning applications and software engineering practices.