Enhancing Large Language Models for Text-to-Testcase Generation (2402.11910v1)
Abstract: Context: Test-driven development (TDD) is a widely employed software development practice in which test cases are written from requirements before the code itself. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements rather than code serve as the input. Objective: In this paper, we introduce a text-to-testcase generation approach based on an LLM (GPT-3.5) that is fine-tuned on our curated dataset and combined with an effective prompt design. Method: Our approach enhances the basic GPT-3.5 model for the text-to-testcase generation task by fine-tuning it on our curated dataset and applying an effective prompting design. We evaluated the effectiveness of our approach on five large-scale open-source software projects. Results: Our approach generated 7k test cases for the open-source projects, achieving 78.5% syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage, substantially outperforming all other LLMs (basic GPT-3.5, Bloom, and CodeT5). In addition, our ablation study demonstrates the substantial performance improvements contributed by the fine-tuning and prompting components of the GPT-3.5 model. Conclusions: These findings lead us to conclude that fine-tuning and prompting should be considered when building an LLM for the text-to-testcase generation task.
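As a rough illustration of the kind of pipeline the abstract describes, the sketch below sends a natural-language requirement to a fine-tuned GPT-3.5 model through the OpenAI chat completions API and asks for a JUnit test case. The model identifier, prompt wording, and helper function are hypothetical placeholders for illustration only; they are not the paper's actual fine-tuned model or prompt design.

```python
# Minimal sketch of prompting a fine-tuned GPT-3.5 model to generate a JUnit
# test case from a textual requirement. The model id and prompt wording are
# illustrative assumptions, not artifacts released by the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a test engineer practicing TDD. Given a software requirement, "
    "write a compilable JUnit test case that verifies the requirement. "
    "Return only Java code."
)

def generate_test_case(requirement: str, class_under_test: str) -> str:
    """Ask the (hypothetical) fine-tuned model for a JUnit test case."""
    response = client.chat.completions.create(
        model="ft:gpt-3.5-turbo:your-org:text2testcase:abc123",  # placeholder id
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": f"Class under test: {class_under_test}\n"
                           f"Requirement: {requirement}",
            },
        ],
        temperature=0.0,  # deterministic output for reproducible evaluation
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_test_case(
        requirement="Parsing '-v' should enable the verbose flag.",
        class_under_test="org.apache.commons.cli.DefaultParser",
    ))
```

In a full pipeline, each generated test would then be compiled and executed against the target project (for example with Maven, Surefire, and JaCoCo) to measure the syntactic correctness and code coverage reported in the abstract.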
Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Chetan Arora, Aldeida Aleti