Automated Repair of Ambiguous Natural Language Requirements

Published 12 May 2025 in cs.SE | (2505.07270v2)

Abstract: The widespread adoption of LLMs in software engineering has amplified the role of natural language (NL). The inherent ambiguity of NL threatens software quality, because ambiguous requirements may lead to faulty program generation. The complexity of ambiguity detection and resolution motivates us to introduce automated repair of ambiguous NL requirements, which we approach by reducing code generation uncertainty and aligning NL with input-output examples. Repairing ambiguity in requirements is a difficult challenge for LLMs, as it demands metacognition - the model must understand how its own interpretation changes when the text is altered. Our experiments show that directly prompting an LLM to detect and resolve ambiguities results in irrelevant or inconsistent clarifications. Our key insight is to decompose this problem into simpler sub-problems that do not require metacognitive reasoning. First, we analyze and repair the LLM's interpretation of requirements embodied by the distribution of programs they induce by using traditional testing and program repair. Second, we repair requirements based on the changes to the distribution via contrastive specification inference. We implemented this proposal, dubbed as SpecFix, and evaluated it by using three state-of-the-art LLMs (GPT-4o, DeepSeek-V3 and Qwen2.5-Coder-32b) across two widely used code generation benchmarks, namely HumanEval+ and MBPP+. Our results show that SpecFix, operating autonomously without human intervention or external information, modifies 23.93% of the requirements, leading to a 33.66% improvement in model Pass@1 on the modified requirements. Across the entire benchmark, this corresponds to an 4.3% increase in overall Pass@1. Importantly, SpecFix's repairs generalize across models: requirements repaired by one model boost the performance of other models by 9.6%.