BUGSPHP: A dataset for Automated Program Repair in PHP (2401.07356v2)

Published 14 Jan 2024 in cs.SE

Abstract: Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of a benchmark PHP bug dataset. This is surprising given that PHP has been one of the most widely used server-side languages for over two decades, being used in a variety of contexts such as e-commerce, social networking, and content management. This paper presents a benchmark dataset of PHP bugs from real-world applications, called BUGSPHP, which can enable research on analysis, testing, and repair for PHP programs. The dataset consists of training and test datasets, separately curated from GitHub and processed locally. The training dataset includes more than 600,000 bug-fixing commits. The test dataset contains 513 manually validated bug-fixing commits equipped with developer-provided test cases to assess patch correctness.
