- The paper finds that requirements smells, especially semantic ones, have a small but statistically significant negative effect on binary tracing accuracy in LLM-guided traceability tasks.
- The study employs GPT-4o and Llama 3.1 across five projects to compare the impact of smelly versus clear requirements on trace link generation.
- Findings imply that improving requirement quality is essential for enhancing the performance of automated traceability in software engineering.
Examining the Effects of Requirements Smells on Automated Traceability with LLMs
In software engineering (SE), large language models (LLMs) are increasingly employed to generate crucial software artifacts, including source code and trace links. This paper explores how the formulation of requirements, specifically the presence of requirements smells, influences LLMs in automated traceability tasks. Requirements smells are indicators of potential quality defects in requirements documents, such as ambiguity or inconsistency, that can negatively affect subsequent development activities.
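To make the notion of a smell concrete, here is a minimal sketch of a detector for one simple smell family: vague or subjective terms in a requirement sentence. The term list and function name are illustrative assumptions, not the paper's actual smell catalog or detection tooling.

```python
import re

# Hypothetical list of vague terms often treated as smells;
# the paper's actual smell definitions are not reproduced here.
VAGUE_TERMS = {"appropriate", "approximately", "etc", "user-friendly", "fast", "some"}

def find_lexical_smells(requirement: str) -> list[str]:
    """Return the vague terms found in a requirement sentence."""
    tokens = re.findall(r"[a-z\-]+", requirement.lower())
    return sorted(set(tokens) & VAGUE_TERMS)

# Example: flags "fast" and "appropriate" as vague.
find_lexical_smells("The system shall respond fast and log errors as appropriate.")
```

A real detector would also need syntactic and semantic checks (e.g., parsing for passive voice or contradictions), which simple word matching cannot capture.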
Methodology and Experiments
The paper targets the specific task of automated trace link generation between requirements and source code using two prominent LLMs: GPT-4o and Llama 3.1. The authors conducted experiments on five software projects, comprising 94 requirements and 70 implemented trace links, to assess the impact of smelly versus non-smelly requirements on link generation efficacy. The smelly requirements were categorized into three types, following a recognized classification of defects in requirements: lexical, syntactic, and semantic smells.
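A trace-link query of this kind typically reduces to assembling a prompt that pairs a requirement with candidate code and asks for a yes/no judgment. The sketch below illustrates that shape only; the prompt wording and function names are assumptions, not the paper's actual prompts, and the LLM call itself is omitted.

```python
def build_binary_trace_prompt(requirement: str, source_code: str) -> str:
    """Assemble an illustrative yes/no trace-link prompt for an LLM."""
    return (
        "You are a software traceability assistant.\n"
        f"Requirement: {requirement}\n"
        "Source code:\n"
        f"{source_code}\n"
        "Question: Is this requirement implemented in the code above? "
        "Answer strictly with 'yes' or 'no'."
    )

def parse_binary_answer(answer: str) -> bool:
    """Map a raw model reply to a boolean trace decision."""
    return answer.strip().lower().startswith("yes")
```

LOC tracing would extend the same idea by asking the model to return the specific line ranges implementing the requirement rather than a single yes/no verdict.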
Key Findings
The results reveal a nuanced influence of requirements smells on LLM performance. For binary tracing accuracy—determining whether a requirement is implemented in the code—the presence of smells had a small but statistically significant negative effect. However, when the models were tasked with identifying the specific lines of code associated with a requirement (LOC tracing), the impact of smells was not statistically significant. Notably, semantic smells (e.g., ambiguities or logical inconsistencies) were found to have a more detrimental effect on tracing performance than syntactic or lexical smells.
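One common way to test whether an accuracy gap like this is statistically significant is a two-proportion z-test. The sketch below uses invented counts purely for illustration; the paper's actual statistical procedure and numbers are not reproduced here.

```python
import math

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Z statistic comparing two accuracy proportions (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical example: 85/100 correct binary decisions on clean
# requirements vs. 68/100 on smelly ones.
z = two_proportion_z(85, 100, 68, 100)
```

With these made-up counts, |z| exceeds 1.96, the usual threshold for significance at the 0.05 level; a "small but significant" effect like the one reported would correspond to a gap closer to that boundary.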
Discussions and Implications
The findings indicate that while LLMs handle trace link generation well for simple systems, low-quality requirements containing smells can degrade performance, albeit only slightly in this setting. These insights suggest that LLMs can manage such tasks reliably provided the complexity of the system remains bounded. The paper advocates extended research into how various SE tasks are differently impacted by the smell categories, the influence of project scale, and domain-specific complexities.
Future Directions
The authors propose pathways for future research aimed at expanding the understanding of requirements smells in a broader scope of SE tasks. Potential future avenues include:
- Effect of Smells in Other Tasks: Expanding the scope beyond trace link recovery to include tasks such as code generation and model synthesis, aiming to correlate requirements smells with artifact defects such as code or test smells.
- Impact of Scale and Domain: Testing effects on larger and more complex systems, wherein contextual complexity might amplify the impact of requirements smells on LLM performance.
- Development of Mitigation Strategies: Investigating techniques for identifying and correcting requirements smells to mitigate their adverse impacts, including LLM self-correction and human-assisted correction strategies.
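The mitigation idea above could begin with a correction step as simple as asking an LLM to rewrite a flagged requirement. The sketch below is a hypothetical prompt builder under that assumption; the function name and wording are illustrative, and the actual model call is omitted.

```python
def build_smell_correction_prompt(requirement: str, smells: list[str]) -> str:
    """Illustrative prompt asking an LLM to rewrite a smelly requirement."""
    smell_list = ", ".join(smells) if smells else "none detected"
    return (
        "Rewrite the following requirement to remove quality defects "
        f"(detected smells: {smell_list}) while preserving its meaning.\n"
        f"Requirement: {requirement}"
    )
```

A human-assisted variant would show the proposed rewrite to an analyst for approval before it replaces the original requirement.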
Conclusion
This paper underscores the importance of requirement quality in LLM-guided SE processes. By identifying the smell types that most affect task performance, the research informs best practices for ensuring high-quality SE outputs when employing LLMs. Ensuring clarity and consistency in requirements can be instrumental in harnessing the full potential of LLMs in automated traceability and beyond.