- The paper finds that requirements smells, especially semantic ones, have a small but statistically significant negative effect on binary tracing accuracy in LLM-guided traceability tasks.
- The study employs GPT-4o and Llama 3.1 across five projects to compare the impact of smelly versus clear requirements on trace link generation.
- Findings imply that improving requirement quality is essential for enhancing the performance of automated traceability in software engineering.
Examining the Effects of Requirements Smells on Automated Traceability with LLMs
In software engineering (SE), large language models (LLMs) are increasingly employed to generate crucial software artifacts, including source code and trace links. This paper explores how the formulation of requirements, specifically the presence of requirements smells, influences LLMs in automated traceability tasks. Requirements smells are indicators of potential quality defects in requirements documents, such as ambiguity or inconsistency, that can negatively affect subsequent development activities.
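To make the notion of a smell concrete, here is a minimal sketch of a detector for one simple smell family: vague or subjective terms in a requirement sentence. The term list and function name are illustrative assumptions, not the paper's actual smell catalog or detection tooling.

```python
import re

# Hypothetical list of vague terms often treated as smells;
# the paper's actual smell definitions are not reproduced here.
VAGUE_TERMS = {"appropriate", "approximately", "etc", "user-friendly", "fast", "some"}

def find_lexical_smells(requirement: str) -> list[str]:
    """Return the vague terms found in a requirement sentence."""
    tokens = re.findall(r"[a-z\-]+", requirement.lower())
    return sorted(set(tokens) & VAGUE_TERMS)

# Example: flags "fast" and "appropriate" as vague.
find_lexical_smells("The system shall respond fast and log errors as appropriate.")
```

A real detector would also need syntactic and semantic checks (e.g., parsing for passive voice or contradictions), which simple word matching cannot capture.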
Methodology and Experiments
The paper targets the specific task of automated trace link generation between requirements and source code using two prominent LLMs: GPT-4o and Llama 3.1. The authors conducted experiments on five software projects, comprising 94 requirements and 70 implemented trace links, to assess the impact of smelly versus non-smelly requirements on link generation efficacy. The smelly requirements were categorized into three types, following a recognized classification of defects in requirements: lexical, syntactic, and semantic smells.
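A trace-link query of this kind typically reduces to assembling a prompt that pairs a requirement with candidate code and asks for a yes/no judgment. The sketch below illustrates that shape only; the prompt wording and function names are assumptions, not the paper's actual prompts, and the LLM call itself is omitted.

```python
def build_binary_trace_prompt(requirement: str, source_code: str) -> str:
    """Assemble an illustrative yes/no trace-link prompt for an LLM."""
    return (
        "You are a software traceability assistant.\n"
        f"Requirement: {requirement}\n"
        "Source code:\n"
        f"{source_code}\n"
        "Question: Is this requirement implemented in the code above? "
        "Answer strictly with 'yes' or 'no'."
    )

def parse_binary_answer(answer: str) -> bool:
    """Map a raw model reply to a boolean trace decision."""
    return answer.strip().lower().startswith("yes")
```

LOC tracing would extend the same idea by asking the model to return the specific line ranges implementing the requirement rather than a single yes/no verdict.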
Key Findings
The results reveal a nuanced influence of requirements smells on LLM performance. For binary tracing accuracy—determining whether a requirement is implemented in the code—the presence of smells had a small but statistically significant negative effect. However, when the models were tasked with identifying the specific lines of code associated with a requirement (LOC tracing), the impact of smells was not statistically significant. Notably, semantic smells (e.g., ambiguities or logical inconsistencies) were found to have a more detrimental effect on tracing performance than syntactic or lexical smells.
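One common way to test whether an accuracy gap like this is statistically significant is a two-proportion z-test. The sketch below uses invented counts purely for illustration; the paper's actual statistical procedure and numbers are not reproduced here.

```python
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Z statistic comparing two accuracy proportions (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical example: 85/100 correct binary decisions on clean
# requirements vs. 68/100 on smelly ones.
z = two_proportion_z(85, 100, 68, 100)
```

With these made-up counts, |z| exceeds 1.96, the usual threshold for significance at the 0.05 level; a "small but significant" effect like the one reported would correspond to a gap closer to that boundary.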
Discussions and Implications
The findings indicate that while LLMs handle trace link generation well for simple systems, low-quality requirements containing smells can degrade performance, albeit only slightly in this setting. These insights suggest that LLMs can manage such tasks reliably provided the complexity of the system remains bounded. The paper advocates extended research into how various SE tasks are differently impacted by the smell categories, the influence of project scale, and domain-specific complexities.
Future Directions
The authors propose pathways for future research aimed at expanding the understanding of requirements smells in a broader scope of SE tasks. Potential future avenues include:
- Effect of Smells in Other Tasks: Expanding the scope beyond trace link recovery to include tasks such as code generation and model synthesis, aiming to correlate requirements smells with artifact defects such as code or test smells.
- Impact of Scale and Domain: Testing effects on larger and more complex systems, wherein contextual complexity might amplify the impact of requirements smells on LLM performance.
- Development of Mitigation Strategies: Investigating techniques for identifying and correcting requirements smells to mitigate their adverse impacts, including LLM self-correction and human-assisted correction strategies.
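The mitigation idea above could begin with a correction step as simple as asking an LLM to rewrite a flagged requirement. The sketch below is a hypothetical prompt builder under that assumption; the function name and wording are illustrative, and the actual model call is omitted.

```python
def build_smell_correction_prompt(requirement: str, smells: list[str]) -> str:
    """Illustrative prompt asking an LLM to rewrite a smelly requirement."""
    smell_list = ", ".join(smells) if smells else "none detected"
    return (
        "Rewrite the following requirement to remove quality defects "
        f"(detected smells: {smell_list}) while preserving its meaning.\n"
        f"Requirement: {requirement}"
    )
```

A human-assisted variant would show the proposed rewrite to an analyst for approval before it replaces the original requirement.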
Conclusion
This paper underscores the importance of requirement quality in LLM-guided SE processes. By identifying the smell types that most affect task performance, the research informs best practices for ensuring high-quality SE outputs when employing LLMs. Ensuring clarity and consistency in requirements can be instrumental in harnessing the full potential of LLMs in automated traceability and beyond.