Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Published 4 Nov 2018 in cs.SE | (1811.02429v1)

Abstract: Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J comes with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic test-suite based repair on Defects4J. The result of our experiment shows that the considered state-of-the-art repair methods can generate patches for 47 out of 224 bugs. However, those patches are only test-suite adequate, which means that they pass the test suite and may potentially be incorrect beyond the test-suite satisfaction correctness criterion. We have manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly repaired with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial or incorrect patches still pass the test suite. With respect to practical applicability, it takes on average 14.8 minutes to find a patch. The experiment was done on a scientific grid, totaling 17.6 days of computation time. All the repair systems and experimental results are publicly available on Github in order to facilitate future research on automatic repair.


Citations (241)


Summary

The paper demonstrates that test-suite-based repair systems achieved a 21% patch generation rate across 224 bugs, with Nopol outperforming other methods.
The study reveals that only about 13% of the 84 manually analyzed patches are genuinely correct, highlighting a critical overfitting issue in current techniques.
The research underscores the need for enhanced test suites to improve repair accuracy and guide future advancements in automated bug repair.

Overview of Automatic Repair of Real Bugs in Java: An Experiment with Defects4J

This paper presents an empirical study of how well test-suite based automated repair methods work on real-world Java bugs from the Defects4J dataset. It investigates whether contemporary repair algorithms can generate patches for a significant number of real, non-trivial bugs drawn from four Java projects within the dataset, and it offers insights relevant both to practical software engineering and to the theoretical understanding of automated repair.

Key Findings and Methodology

Researchers employed three state-of-the-art automated repair systems: jGenProg, jKali, and Nopol, to evaluate their effectiveness on Defects4J, a dataset encompassing 224 real-world Java bugs spread over 231K lines of code. Each bug is accompanied by a corresponding test suite consisting of passing and failing test cases. The experiment aimed at answering four specific research questions (RQs):

Patch synthesis:
- Across all systems, patches were successfully generated for 47 out of the 224 bugs (21%).
- Nopol showed the highest individual success, generating patches for 35 bugs.
Patch correctness:
- A manual analysis of 84 generated patches found only 11 (approximately 13%) to be genuinely correct, corresponding to 9 correctly repaired bugs. This confirms a significant overfitting tendency: many patches satisfy the provided test cases without fixing the bug beyond that scope.
Under-specified bugs identification:
- The study highlighted about 21 bugs as under-specified in the test-suite context, meaning trivial patches still pass due to insufficient testing scope.
- These bugs represent critical challenges for future repair techniques that wish to surpass the limitations of current methodologies.
Execution time measurement:
- Finding a patch took 14.8 minutes on average. The full experiment ran on a scientific grid and consumed 17.6 days of computation time in total, which suggests that the per-patch cost is acceptable for practical use.
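The percentages quoted in these findings follow directly from the reported counts. The sketch below reproduces that arithmetic, using the counts given in this summary and in the abstract (47 of 224 bugs patched, 11 of 84 analyzed patches correct, 9 bugs correctly repaired). The class and method names are illustrative only, and the share of correctly repaired bugs is a derived figure that the source does not state explicitly.

```java
public class RepairRates {
    /** Returns part/whole as a percentage, rounded to the nearest integer. */
    public static long percent(int part, int whole) {
        return Math.round(100.0 * part / whole);
    }

    public static void main(String[] args) {
        // 47 of the 224 bugs received at least one test-suite adequate patch.
        System.out.println("Bugs with a patch:       " + percent(47, 224) + "%");
        // 11 of the 84 manually analyzed patches were judged correct.
        System.out.println("Correct patches:         " + percent(11, 84) + "%");
        // Derived here, not reported in the source: 9 of 224 bugs correctly repaired.
        System.out.println("Correctly repaired bugs: " + percent(9, 224) + "%");
    }
}
```

Read this way, the 21% patch generation rate overstates practical repair ability: once correctness is taken into account, only a small fraction of the 224 bugs are actually fixed.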

Implications for Future Research

The study shows that current automatic repair systems have potential but are held back by inadequate test suites, which allow many incorrect patches to be accepted. These observations have several implications for advancing automatic repair:

Enhancing Test Suites:

Effective repair largely depends on test suite quality. Strengthening test suites with more comprehensive and rigorous test cases can directly impact the correctness and reliability of generated patches.
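To make the link between test suite quality and patch correctness concrete, here is a minimal, hand-written sketch. It is not taken from the paper, from Defects4J, or from the output of any of the evaluated tools: the methods `buggyAbs`, `overfittedAbs`, and `correctAbs` and both test suites are hypothetical. It shows a patch that is test-suite adequate for a weak suite but is rejected once a single extra test case is added.

```java
import java.util.function.IntUnaryOperator;

public class OverfittingDemo {
    // Hypothetical buggy method: meant to return |x|, but returns negatives unchanged.
    public static int buggyAbs(int x) {
        return x;
    }

    // Hypothetical overfitted patch: special-cases the one failing test input.
    public static int overfittedAbs(int x) {
        return x == -3 ? 3 : x;
    }

    // Hypothetical intended fix.
    public static int correctAbs(int x) {
        return x < 0 ? -x : x;
    }

    // Each row is {input, expectedOutput}. The original suite has one failing test (-3).
    public static final int[][] ORIGINAL_SUITE = { {5, 5}, {0, 0}, {-3, 3} };
    // The strengthened suite adds a second negative input.
    public static final int[][] STRENGTHENED_SUITE = { {5, 5}, {0, 0}, {-3, 3}, {-7, 7} };

    /** Returns true if impl produces the expected output for every test case. */
    public static boolean passes(IntUnaryOperator impl, int[][] suite) {
        for (int[] testCase : suite) {
            if (impl.applyAsInt(testCase[0]) != testCase[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("buggy, original suite:          "
                + passes(OverfittingDemo::buggyAbs, ORIGINAL_SUITE));
        System.out.println("overfitted, original suite:     "
                + passes(OverfittingDemo::overfittedAbs, ORIGINAL_SUITE));
        System.out.println("overfitted, strengthened suite: "
                + passes(OverfittingDemo::overfittedAbs, STRENGTHENED_SUITE));
        System.out.println("correct, strengthened suite:    "
                + passes(OverfittingDemo::correctAbs, STRENGTHENED_SUITE));
    }
}
```

With the original suite, a repair tool has no way to distinguish the overfitted patch from the intended fix, since both pass every test. The added test case is what removes the incorrect patch from the set of acceptable candidates.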

Reducing Overfitting:

There's a need for developing advanced algorithms capable of synthesizing patches that generalize beyond the narrow test data provided, reducing the dependence on specific test inputs and consequently mitigating overfitting.

Broader Reasoning Capabilities:

Future work should focus on empowering repair algorithms with reasoning capabilities that interpret and infer desired program behavior even when it's not explicitly specified, potentially through integrating additional sources of information or heuristics.

Exploring and Comparing Efficiency:

This study does not cover all possible techniques, so extending the comparison to other repair systems, including ones that have not been publicly released, could yield additional insights. Developing benchmarks and models to rank and prioritize candidate patches would also help focus research effort and computing resources.

Conclusion

In conclusion, this paper documents the present capabilities and limitations of automated repair systems through a substantial experiment on real-world data. The research underscores the crucial role of test suites in automated bug fixing and lays the groundwork for several directions in improving automated software repair. As these areas advance, automation promises to reduce the human effort spent on patching software during maintenance and evolution.
