Automated Unit Test Improvement using Large Language Models at Meta (2402.09171v1)

Published 14 Feb 2024 in cs.SE

Abstract: This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta's Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers. We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.

Citations (35)

Summary

  • The paper demonstrates that TestGen-LLM measurably improves unit test coverage by automatically generating high-quality test cases, improving 11.5% of the classes to which it was applied.
  • The methodology employs a rigorous filtration process and ensemble learning to ensure only reliable and novel test cases are integrated.
  • Deployment at Meta’s test-a-thons on platforms like Instagram and Facebook validates the tool’s practical impact in enhancing software quality.

Automated Unit Test Improvement at Meta through TestGen-LLM

Introduction to TestGen-LLM

The advent of LLMs has given new impetus to efforts to automate more of software development and testing. This paper introduces TestGen-LLM, a tool developed by Meta Platforms Inc. that applies LLMs to the specific task of improving existing, human-written unit tests. TestGen-LLM extends unit tests, particularly for Android applications written in Kotlin, by generating additional test cases that target improved code coverage through previously overlooked edge cases. Notably, TestGen-LLM is an instance of Assured Offline LLM-Based Software Engineering (Assured Offline LLMSE), a methodology that differs from typical LLM applications by guaranteeing that generated test classes measurably improve the original test suite rather than merely matching it.
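
To make the task concrete, here is a minimal, hypothetical illustration: an existing Kotlin test class to which a TestGen-LLM-style tool adds one edge-case test. The class PlaylistFormatterTest and the function formatCount are invented for this summary and do not appear in the paper.

```kotlin
// Hypothetical illustration: PlaylistFormatterTest and formatCount are
// invented for this summary, not taken from the paper.
import org.junit.Assert.assertEquals
import org.junit.Test

// Toy function under test, defined here so the example is self-contained.
fun formatCount(n: Int): String = if (n == 1) "1 reel" else "$n reels"

class PlaylistFormatterTest {
    // Existing, human-written test.
    @Test
    fun formatsSingleItem() {
        assertEquals("1 reel", formatCount(1))
    }

    // The kind of additional test a TestGen-LLM-style tool might generate:
    // an overlooked edge case (zero items) that extends coverage.
    @Test
    fun formatsZeroItems() {
        assertEquals("0 reels", formatCount(0))
    }
}
```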

TestGen-LLM System Architecture

TestGen-LLM supports two use cases: evaluation and deployment. The tool applies a rigorous filtration process to each candidate test case it generates, discarding any that fail to build, do not pass reliably, show signs of flakiness, or add no new code coverage. This ensures that any recommendation the tool makes genuinely strengthens the test suite. Through telemetry, the system logs the outcome of each candidate at every filtration stage; these records support an ensemble approach, in which candidates produced by different LLMs and prompting strategies are pooled, and give detailed insight into the improvement process.
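
A minimal sketch of this filtration pipeline in Kotlin follows, assuming stand-in predicates in place of Meta's internal build, test-execution, and coverage infrastructure; all names below are illustrative rather than taken from the paper.

```kotlin
// Sketch of TestGen-LLM-style candidate filtering. All names are
// illustrative; the real tool drives Meta's internal build and CI systems.

data class CandidateTest(val className: String, val source: String)

// Stand-in checks for the real infrastructure.
fun buildsSuccessfully(t: CandidateTest): Boolean =
    t.source.isNotBlank()                          // placeholder for a real compile step

fun passesReliably(t: CandidateTest, runs: Int = 5): Boolean =
    (1..runs).all { t.source.contains("assert") }  // placeholder for N reruns to catch flakiness

fun addsNovelCoverage(t: CandidateTest): Boolean =
    t.source.contains("edgeCase")                  // placeholder for a line-coverage diff

// Candidates that fail any filter are discarded; survivors are
// recommended to engineers for review.
fun filterCandidates(candidates: List<CandidateTest>): List<CandidateTest> =
    candidates
        .filter(::buildsSuccessfully)
        .filter { passesReliably(it) }
        .filter(::addsNovelCoverage)

fun main() {
    // A pooled "ensemble" of candidates, e.g. from several model/prompt configurations.
    val pool = listOf(
        CandidateTest("ReelsTest", "fun edgeCaseEmptyFeed() { assertTrue(true) }"),
        CandidateTest("StoriesTest", "")           // fails the build filter
    )
    println(filterCandidates(pool).map { it.className })  // prints [ReelsTest]
}
```

One natural design choice, reflected in the sketch, is to order the filters from cheapest (building) to most expensive (repeated runs and coverage measurement), so that most hallucinated candidates are rejected early.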

Deployment and Results

Conceived in spring 2023, TestGen-LLM evolved through several stages before being deployed at Meta test-a-thons for the Instagram and Facebook platforms, where it delivered substantial improvements to unit test coverage across the code base. In an evaluation on Instagram's Reels and Stories products, 75% of the test cases TestGen-LLM generated built correctly, 57% passed reliably, and 25% increased coverage. This evaluation substantiates TestGen-LLM's efficacy in a real-world, large-scale deployment.

Quantitative Outcomes

Across the test-a-thons, TestGen-LLM improved 11.5% of all classes to which it was applied, and 73% of its recommendations were accepted for production deployment by Meta's software engineers. These figures reflect a promising advance in automated test improvement, validating the tool's utility and effectiveness in a high-stakes industrial setting.

Theoretical and Practical Implications

From a theoretical perspective, TestGen-LLM's development and application underscore the practical potential of LLMs in automated test generation and improvement. The research highlights the tool's innovative aspects, such as its stringent filtration process and its use of an ensemble of LLM configurations. Practically, TestGen-LLM represents a significant step towards automated test improvement, demonstrating a working operational model that combines LLMs and traditional software engineering workflows with human review of the final recommendations.

Future Developments in LLM and AI in Software Engineering

The paper's insights into TestGen-LLM's deployment provide a promising outlook for future applications of LLMs and generative AI in software engineering. Its methodology and results open paths for exploring more nuanced applications of LLMs in various facets of software development and testing. Additionally, the implications of TestGen-LLM's filtration process and ensemble learning approach may foster further innovative uses of AI in enhancing code quality and reliability.

Conclusion

TestGen-LLM marks a notable achievement in the quest to integrate AI and machine learning with traditional software engineering to automate and improve the software test development process. Its success in improving unit test coverage and quality within a real-world, large-scale industrial context at Meta highlights the potential of LLM-based tools in software testing and quality assurance domains.
