
Weak Word Smell in Software Requirements

Updated 12 January 2026
  • Weak Word Smell is defined by ambiguous lexical items—such as generic pronouns and indefinite determiners—that hinder clear mapping to actors, actions, or attributes.
  • Empirical studies reveal that up to 25% of defects in requirements stem from weak words, highlighting significant impacts on clarity and downstream validation.
  • Rule-based detection methods using POS tagging and morphological analysis achieve high accuracy across diverse documentation, guiding best practices for remediation.

The Weak Word Smell is a linguistic artifact encountered in natural-language requirements, use-case specifications, and manual test instructions that signals ambiguity, vagueness, or under-specification at the word level. Sentences that harbor weak words—generic pronouns, indefinite determiners, placeholder actor names, and imprecise adjectives or adverbs—force readers or testers to guess intent, thereby increasing the risk of misinterpretation, defective downstream artifacts, and unreliable validation outcomes. Recent research has systematically categorized this phenomenon and developed rule-based automated detection engines for requirements engineering and software testing contexts (Seki et al., 2020; Soares et al., 2023).

1. Types and Formal Definition

The Weak Word Smell, as codified in "Detecting Bad Smells in Use Case Descriptions" (Seki et al., 2020), divides into six word-level ambiguity subcategories:

| Smell Type | Key Symptom | Example |
| --- | --- | --- |
| Pronoun (C₁,₅.1) | Unclear referent for a pronoun | "If the password is incorrect, it returns..." |
| Omitted Word (C₁,₅.2) | Missing critical noun or verb | (no main actor/action present) |
| "Actor" Actor (C₁,₅.3) | Placeholder actor name | "Actor selects Withdraw Money." |
| Unexplained Main Actor (C₁,₅.4) | Actor label not defined in schema | "Administrator cancels order." (no definition in schema) |
| Different Concepts by Same Word (C₁,₅.5) | Overloaded noun across contexts | "File" meaning configuration vs. log |
| Omitted Attribute (C₁,₅.6) | Missing qualifiers for nouns | "System displays the file." |

Manual test descriptions broaden this catalog, capturing "verb + indefinite determiner," indefinite pronouns, comparative/superlative adjectives, and adverbs of manner/comparison (Soares et al., 2023):

  • "Open any application and suspend machine."
  • "Verify that everything is right."
  • "Performance similar or better with no graphical issues?"
  • "Work quickly?"

All subtypes share the core property: the lexical item cannot be mapped unambiguously to an actor, object, action, or attribute without external inference or domain knowledge.

2. Rationale and Empirical Frequency

The rationale for explicit classification is empirical. Analysis of 248 actual defects in 30 use-case documents revealed that word-level ambiguity accounted for nearly 25% of identifiable issues (Seki et al., 2020). In manually authored test instructions, 13,169 occurrences of such patterns were detected across the sampled systems (Soares et al., 2023). Weak-word artifacts degrade specification quality by undermining understandability and maintainability, especially where feedback loops are long or documentation is reused across system variants.

3. Automated Detection: Rule-Based Metrics

Detection methodologies are uniformly rule-based, rooted in part-of-speech (POS) and morphological analysis:

  • Use-case categories apply Goal–Question–Metric (GQM) predicates, for example:

$$\mathrm{NOP}(s) = \sum_{w \in s} \begin{cases} 1 & \text{if } \mathrm{POS}(w) \text{ is a pronoun} \\ 0 & \text{otherwise} \end{cases}$$

Predicate: $\mathrm{NOP}(s) > 0$ signals the pronoun smell.

  • Python/spaCy NLP rules for test smells (Soares et al., 2023):
    • Indefinite determiners (DET) in imperative clauses ("any," "some").
    • Indefinite pronouns (PRON): "everything," "something."
    • Comparative (POS=ADJ, Degree=Cmp) and superlative adjectives (POS=ADJ, Degree=Sup).
    • Adverbs of manner (ADV, manner subtype); comparative adverbs (Degree=Cmp).

$$\exists t \; \big(t.\mathrm{pos} = \mathrm{ADJ} \wedge t.\mathrm{morph.Degree} = \mathrm{Cmp}\big)$$

  • Detection triggers on the first match; no aggregate thresholding is imposed.

For placeholder actors, detection counts literal matches of "actor" in the subject position ($\mathrm{NOW}_{\text{actor}}(s) > 0$).
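
These rules can be combined in a short detector. The following is a minimal sketch assuming spaCy's en_core_web_sm model; the rule set is a simplified illustration of the published patterns, not the exact implementation of either cited tool:

```python
# Minimal sketch of the rule-based weak-word checks described above, assuming
# spaCy's en_core_web_sm model. A simplified illustration of the published
# patterns, not the exact rule set of Seki et al. (2020) or Soares et al. (2023).
import spacy

nlp = spacy.load("en_core_web_sm")

INDEFINITE_DETERMINERS = {"any", "some"}  # e.g., "Open any application"

def weak_word_findings(sentence: str) -> list[str]:
    findings = []
    for tok in nlp(sentence):
        # NOP(s) > 0: any pronoun signals a potential pronoun smell
        # (this also covers indefinite pronouns such as "everything").
        if tok.pos_ == "PRON":
            findings.append(f"pronoun: {tok.text!r}")
        # Indefinite determiners ("any", "some").
        if tok.pos_ == "DET" and tok.lower_ in INDEFINITE_DETERMINERS:
            findings.append(f"indefinite determiner: {tok.text!r}")
        # Comparative/superlative adjectives and adverbs via morphological Degree.
        degree = tok.morph.get("Degree")
        if tok.pos_ in {"ADJ", "ADV"} and degree and degree[0] in {"Cmp", "Sup"}:
            findings.append(f"graded {tok.pos_.lower()}: {tok.text!r}")
        # NOW_"actor"(s) > 0: literal placeholder actor as grammatical subject.
        if tok.lower_ == "actor" and tok.dep_ == "nsubj":
            findings.append(f"placeholder actor: {tok.text!r}")
    return findings

# The cited detectors trigger on the first match; this sketch collects every
# finding so each rule can be inspected.
print(weak_word_findings("If the password is incorrect, it returns to the login screen."))
print(weak_word_findings("Open any application and suspend machine."))
print(weak_word_findings("Performance similar or better with no graphical issues?"))
```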

Not all weak-word categories (e.g., omitted attribute, concept overloading) can be automatically detected without deeper domain reasoning or schema cross-referencing.

4. Representative Examples and Impact

Several specific exemplars illustrate the deleterious effects of weak wording:

  • Ambiguous pronoun: “If the password is incorrect, it returns to the login screen.” (Who or what is “it”?)
  • Generic actor: “Actor selects ‘Withdraw Money.’” (Actual actor unspecified.)
  • Indefinite determiner: “Open any application…” (Which application?)
  • Comparative adjective: “Performance is similar or better…” (Relative to what baseline?)

Improved versions substitute explicit, contextual nouns and attributes:

  • “If the password is incorrect, the system returns the user to the login screen.”
  • “Customer selects the ‘Withdraw Money’ option.”
  • “System displays the transaction log file.”

This practice improves reproducibility and clarity and reduces the risk of faulty test verdicts and erroneous system behavior.

5. Detection Performance Metrics

Detection effectiveness is empirically validated:

| Study Context | Precision | Recall | F₁-Score |
| --- | --- | --- | --- |
| Use Case Descriptions (Seki et al., 2020) | 0.522 | 1.00 | ≈ 0.686 |
| Manual Test Steps, all systems (Soares et al., 2023) | 0.92 | 0.95 | 0.935 |
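
Here F₁ is the harmonic mean of precision (P) and recall (R); the use-case figure above follows directly from the reported values:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 0.522 \cdot 1.00}{0.522 + 1.00} \approx 0.686$$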

In the per-system breakdown reported by Soares et al. (2023):

  • Ubuntu OS: precision 0.90, recall 0.96 (F₁ ≈ 0.926)
  • Brazilian Voting Machine: precision 1.00, recall 1.00 (F₁ = 1.00)
  • Smartphone UI: precision 0.94, recall 0.95 (F₁ ≈ 0.9446)

In use-case detection, every actual weak-word instance was flagged (recall = 1.00), but precision was constrained by false positives from generic pattern matching (Seki et al., 2020). The test-smell detector, driven by spaCy's syntactic and morphological features, achieved uniformly high scores across multiple application domains.

6. Guidelines for Requirements and Test Authors

Best-practice recommendations converge on explicitness and measurability:

  • Replace pronouns ("it") and placeholders with concrete nouns and entities.
  • Use specialized actor names, not generic tokens like "Actor."
  • Introduce and maintain glossaries; ensure every actor label has a schema definition (see the sketch after this list).
  • Qualify nouns ("transaction log file" vs. "file").
  • Replace indefinite determiners with specific countable references.
  • Substitute comparative/superlative adjectives/adverbs with quantifiable thresholds ("response time under 500 ms" instead of "fast").
  • Read each sentence for dual interpretation risk, then reword for clarity.
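
As an illustration of the glossary guideline, a minimal sketch of a check that flags actor labels with no definition, as in the Unexplained Main Actor smell; the glossary contents, step format, and naive subject extraction are hypothetical simplifications, not taken from the cited tools:

```python
# Minimal sketch of a glossary cross-check for the Unexplained Main Actor
# smell. The GLOSSARY contents, step format, and subject extraction
# (first word of the step) are hypothetical simplifications.
GLOSSARY = {
    "Customer": "A registered account holder using the ATM.",
    "System": "The ATM software under specification.",
}

def undefined_actors(steps: list[str]) -> list[str]:
    issues = []
    for step in steps:
        subject = step.split()[0].strip(".,")
        if subject == "Actor":
            issues.append(f"placeholder actor: {step!r}")
        elif subject not in GLOSSARY:
            issues.append(f"undefined actor {subject!r}: {step!r}")
    return issues

print(undefined_actors([
    "Customer selects the 'Withdraw Money' option.",
    "Administrator cancels order.",   # no glossary entry -> flagged
    "Actor selects Withdraw Money.",  # literal placeholder -> flagged
]))
```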

Such textual remediation supports both automated detection and human readability, improving downstream specification reliability and the determinism of test verdicts.

7. Relation to Other Smells and Future Directions

The Weak Word Smell frequently interacts with broader ambiguity and tacit-knowledge artifacts. Although rule-based engines can robustly identify surface-level weak words, deeper forms—such as domain-specific concept overloading and runtime context-dependent references—remain outside the reach of automated detection unless supported by ontology mapping or domain modeling frameworks. Expanding existing catalogs to cover omitted words and attributes in traceable schemas is cited as a direction for further tooling and protocol development (Seki et al., 2020). The high effectiveness of syntactic/morphological patterns in generalizing over varied test idioms suggests potential for language-agnostic implementations.

In summary, the Weak Word Smell encompasses under-specified lexical constructs that compromise clarity, determinism, and verifiability in requirements and tests. Automated rule-based detection methods are mature for the explicit weak-word categories, and best practices are established for both avoidance and remediation (Seki et al., 2020; Soares et al., 2023).
