Tests4Py: A Benchmark for System Testing (2307.05147v2)

Published 11 Jul 2023 in cs.SE

Abstract: Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy benchmark, addresses these limitations. It includes 73 bugs from seven real-world Python applications and six bugs from example programs. Each subject in Tests4Py is equipped with an oracle for verifying functional correctness and supports both system and unit test generation. This allows for comprehensive qualitative studies and extensive evaluations, making Tests4Py a cutting-edge benchmark for research in test generation, debugging, and automatic program repair.
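The abstract says each subject ships with an oracle that checks functional correctness and supports system-level testing. The Python sketch below illustrates that idea. It does not use Tests4Py's actual interface: `TestResult`, the toy subject `buggy_middle`, `reference_middle`, and `system_oracle` are all hypothetical names. A system test feeds the whole textual input to the program and gets back a verdict of passing, failing, or undefined (for inputs outside the program's valid domain). A unit test, by contrast, would call `buggy_middle` directly with already-parsed arguments.

```python
from enum import Enum
from typing import Callable


class TestResult(Enum):
    """Hypothetical verdicts a system-level oracle might return."""
    PASSING = "passing"
    FAILING = "failing"
    UNDEFINED = "undefined"  # input lies outside the program's valid domain


def buggy_middle(x: int, y: int, z: int) -> int:
    """Toy subject under test: should return the median of three ints (seeded bug)."""
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y  # BUG: should return x
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z


def reference_middle(x: int, y: int, z: int) -> int:
    """Ground truth used by the oracle: the median of the three values."""
    return sorted([x, y, z])[1]


def system_oracle(program: Callable[[int, int, int], int], raw_input: str) -> TestResult:
    """Judge one end-to-end run: parse the textual input, execute, compare."""
    parts = raw_input.split()
    if len(parts) != 3:
        return TestResult.UNDEFINED  # wrong arity: not a valid system input
    try:
        x, y, z = (int(p) for p in parts)
    except ValueError:
        return TestResult.UNDEFINED  # non-integer tokens: not a valid system input
    expected = reference_middle(x, y, z)
    return TestResult.PASSING if program(x, y, z) == expected else TestResult.FAILING


if __name__ == "__main__":
    for raw in ["1 2 3", "2 1 3", "3 2 1", "1 2", "a b c"]:
        print(f"{raw!r}: {system_oracle(buggy_middle, raw).name}")
```

For example, `system_oracle(buggy_middle, "2 1 3")` reports FAILING because the seeded bug returns 1 instead of the median 2, while "1 2" is UNDEFINED. Keeping UNDEFINED separate from FAILING stops a test generator from counting malformed inputs as bug-revealing.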
