Data Augmentation by Fuzzing for Neural Test Generation (2406.08665v2)

Published 12 Jun 2024 in cs.SE and cs.AI

Abstract: Testing is essential to modern software engineering for building reliable software. Given the high costs of manually creating test cases, automated test case generation, particularly methods utilizing LLMs, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable compared with traditional automatic testing methods like fuzzing. However, the diversity and volume of unit tests in current datasets are limited. In this paper, we introduce a novel data augmentation technique, FuzzAug, that introduces the benefits of fuzzing to LLMs to preserve valid program semantics and provide diverse inputs. This enhances the model's ability to embed correct inputs that can explore more branches of the function under test. Our evaluations show that models trained with dataset augmented by FuzzAug increase assertion accuracy by 5%, improve compilation rate by more than 10%, and generate unit test functions with 5% more branch coverage. This technique demonstrates the potential of using dynamic software testing to improve neural test generation, offering significant enhancements in neural test generation.

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/ComputerPapers/status/1801524597835153507

Data Augmentation by Fuzzing for Neural Test Generation (2406.08665v2)

Summary

Related Papers

Tweets