Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies (1901.06024v1)

Published 17 Jan 2019 in cs.SE

Abstract: Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present Bears, a project for collecting and storing bugs into an extensible bug benchmark for automatic repair studies in Java. The collection of bugs relies on commit building state from Continuous Integration (CI) to find potential pairs of buggy and patched program versions from open-source projects hosted on GitHub. Each pair of program versions passes through a pipeline where an attempt of reproducing a bug and its patch is performed. The core step of the reproduction pipeline is the execution of the test suite of the program on both program versions. If a test failure is found in the buggy program version candidate and no test failure is found in its patched program version candidate, a bug and its patch were successfully reproduced. The uniqueness of Bears is the usage of CI (builds) to identify buggy and patched program version candidates, which has been widely adopted in the last years in open-source projects. This approach allows us to collect bugs from a diversity of projects beyond mature projects that use bug tracking systems. Moreover, Bears was designed to be publicly available and to be easily extensible by the research community through automatic creation of branches with bugs in a given GitHub repository, which can be used for pull requests in the Bears repository. We present in this paper the approach employed by Bears, and we deliver the version 1.0 of Bears, which contains 251 reproducible bugs collected from 72 projects that use the Travis CI and Maven build environment.

Citations (117)

Summary

  • The paper introduces Bears, an innovative benchmark that leverages CI pipelines to automatically collect reproducible Java bugs for APR research.
  • It delivers 251 reproducible bugs from 72 diverse projects, broadening coverage beyond benchmarks mined from the bug-tracking systems of mature projects.
  • The paper identifies challenges like flaky tests and non-standard builds, paving the way for future extensions and broader CI support in APR studies.

An Analysis of the Bears Java Bug Benchmark

The paper "Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies" presents an innovative approach to the creation and use of benchmarks for evaluating automatic program repair (APR) tools. The authors introduce Bears, a benchmark uniquely leveraging Continuous Integration (CI) systems to identify and collect bugs in Java programs, offering not only a novel methodology for bug collection but also an extensible structure for the research community.

Overview and Methodology

Bears stands out for using CI pipeline statuses to identify buggy and patched versions of programs. The traditional methodology in APR research, as used in benchmarks like Defects4J and Bugs.jar, relies heavily on mining past commits and bug trackers, which constrains the collected bugs to mature projects with comprehensive bug-tracking processes. In contrast, Bears uses a CI service (Travis CI) to inspect the compilation and test-execution status of each commit's build; by focusing on commit building states rather than bug reports, it can draw bugs from a broader and more diverse set of projects than the well-established ones.
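To make the pairing criterion concrete, the sketch below (not code from the paper; the Build record and its fields are hypothetical simplifications of Travis CI build metadata) scans builds in chronological order and pairs a failing build with the next passing build on the same branch, yielding a buggy/patched program version candidate pair:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of a CI build record; the actual Bears-collector
// works with richer Travis CI build information.
record Build(String commitSha, String branch, boolean passed) {}

record CandidatePair(Build buggy, Build patched) {}

public class BuildPairScanner {

    // Pair each failing build with the immediately following passing build on the
    // same branch: the failing build points at the buggy program version candidate,
    // the passing one at its patched candidate.
    static List<CandidatePair> findCandidates(List<Build> buildsInOrder) {
        List<CandidatePair> candidates = new ArrayList<>();
        for (int i = 0; i + 1 < buildsInOrder.size(); i++) {
            Build current = buildsInOrder.get(i);
            Build next = buildsInOrder.get(i + 1);
            if (!current.passed() && next.passed()
                    && current.branch().equals(next.branch())) {
                candidates.add(new CandidatePair(current, next));
            }
        }
        return candidates;
    }
}
```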

The benchmark is structured around the concept of reproducibility. It identifies pairs of builds—buggy and corrected—where the buggy build fails due to test failures that are absent in the subsequent patched build. This process involves validating builds obtained from public GitHub repositories that use Maven and Travis CI, so that each reproduced bug corresponds to a genuine defect fixed by human developers. The Bears-collector automates this pipeline, reducing manual errors and making the inherently challenging task of bug collection repeatable.
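As an illustration of the core reproduction step, the following sketch (a deliberate simplification, not the Bears-collector itself) runs the Maven test suite in the checked-out buggy and patched working directories and applies the acceptance rule: the pair is kept only if tests fail on the buggy version and pass on the patched one. Bears analyzes the actual test results rather than relying on Maven exit codes alone.

```java
import java.io.File;
import java.io.IOException;

public class ReproductionCheck {

    // Run "mvn test" in the given working directory and report whether the test
    // suite passed. Using the Maven exit code is a simplification; the real
    // pipeline inspects the test reports (e.g., names of failing tests).
    static boolean testSuitePasses(File workingDir) throws IOException, InterruptedException {
        Process mvn = new ProcessBuilder("mvn", "test")
                .directory(workingDir)
                .inheritIO()
                .start();
        return mvn.waitFor() == 0;
    }

    // Acceptance rule for a buggy/patched candidate pair: at least one test
    // failure on the buggy version and no test failure on the patched version.
    static boolean bugReproduced(File buggyCheckout, File patchedCheckout)
            throws IOException, InterruptedException {
        boolean buggyFails = !testSuitePasses(buggyCheckout);
        boolean patchedPasses = testSuitePasses(patchedCheckout);
        return buggyFails && patchedPasses;
    }
}
```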

Implications and Contributions

The Bears benchmark makes significant contributions to the APR domain:

  • Bugs from Diverse Projects: By not restricting the bug collection process to mature projects, Bears includes bugs from 72 different open-source projects, providing diverse domains and increasing the ecological validity of empirical evaluations.
  • Extensibility: Bears was designed for extensibility, allowing researchers to contribute additional bugs easily. This addresses a major limitation of existing benchmarks, which are rarely updated post-release.
  • Public Accessibility and Community Contribution: By employing a public GitHub repository for storing reproduced bugs and their patches, Bears fosters community participation in expanding the benchmark.

However, it is important to acknowledge potential challenges. The automation in the Bears-collector, while reducing the need for direct human intervention, can struggle with corner cases such as flaky tests and environment-dependent builds, and more intricate scenarios such as non-standard multi-module projects add further complexity. Furthermore, the restriction to Travis CI and Maven limits the range of projects that can currently be included, until the pipeline is extended to other build tools and CI systems.

Numerical Results and Future Directions

Version 1.0 of Bears consists of 251 reproducible bugs across 72 projects. The authors underscore the importance of maintaining a benchmark's relevancy over time, hinting at a future where Bears could be a community-driven asset expanding dynamically with new bugs and insights.

Looking forward, analyzing repair patterns across a more diversified bug dataset opens opportunities for designing more robust APR strategies. Collecting bugs continuously, close to the time they are fixed, could also create a valuable feedback loop with developers and improve automated understanding of bug types and resolution strategies. Moreover, supporting additional CI systems and build tools such as Gradle is a prospective enhancement that would further broaden Bears' applicability to real-world projects.

In conclusion, the Bears Java Bug Benchmark provides a substantive advancement for APR research by introducing an innovative, scalable, and collaborative framework for bug collection and analysis. It not only paves the way for extensive and diverse empirical studies but also sets a transformative precedent in constructing adaptive benchmarks tailored to evolve with the fast-paced software development ecosystem.
