Papers
Topics
Authors
Recent
2000 character limit reached

Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework (2510.05365v1)

Published 6 Oct 2025 in cs.SE

Abstract: LLMs are increasingly applied to automated software testing, yet their ability to generalize beyond memorized patterns and reason about natural language bug reports remains unclear. We present a systematic evaluation of LLM reasoning in test case generation, structured around the cognitive layers of Bloom's taxonomy: \textit{Remember}, \textit{Understand}, \textit{Apply}, \textit{Analyze}, \textit{Evaluate}, and \textit{Create}, which progressively assess higher levels of cognitive and reasoning capabilities. Building on the LIBRO framework, we evaluate StarCoder and GPT-4o on Defects4J, GHRB, and mutated variants that introduce linguistic and semantic challenges. Our findings show that both models largely reproduce prior results with minor deviations (\textit{Remember}), exhibit partial robustness to linguistic rephrasings and translations while uncovering unique reproducible bugs (\textit{Understand}), but suffer severe performance drops exceeding 60\% under identifier mutations (\textit{Apply}). Conversely, providing near-identical few-shot examples in an open-book setting improves success rates by up to three times, and component-level analysis reveals that structured technical elements, such as test code and method names, are far more impactful than narrative descriptions for successful test generation (\textit{Analyze}). These insights illuminate the cognitive processes underlying LLM-generated tests, suggest concrete directions for improving performance, and establish a robust and realistic evaluation paradigm for this task.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.