Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt

Published 25 Apr 2026 in cs.CL and cs.AI | (2604.23263v1)

Abstract: LLMs are increasingly utilized in various complex reasoning tasks due to their excellent instruction following capability. However, the model's performance is highly dependent on the open-ended characteristics of the users' input prompt. Natural prompts often do not follow proper syntactic rules, which creates ambiguous queries that yield multiple interpretations. Such ambiguous prompts confuse the model in choosing the correct reasoning paths to answer questions. Prior works address this challenge by applying query editing during the LLM inference process without explicitly solving the root cause of the ambiguity. To address this limitation, we propose a pre-inference prompt optimization mechanism via explicit prompt disambiguation. Particularly, we identify semantic risks in the prompt, check their multi-perspective consistency, and resolve any semantic conflicts that arise. Finally, we organize the resolved ambiguities in a logically structured manner as a clean input to the LLM. By explicitly resolving semantic ambiguity, our method can produce a more focused attention distribution to the semantically essential tokens. We also leverage small language models (SLMs) as the main executor of prompt disambiguation to benefit from their efficient computation. Through comprehensive experiments on multiple benchmarks, we demonstrate that our method improves reasoning performance by 2.5 points at a cost of only $0.02. Our study promotes explicit prompt disambiguation as an effective prompt optimization method without disturbing the internal mechanism of LLM inference.

Summary

  • The paper presents DisambiguSLM, a modular framework that preprocesses and clarifies ambiguous prompts using small language models.
  • The methodology employs dual-path verification and conflict resolution to reduce attention dispersion and improve reasoning accuracy by up to 8 points.
  • Empirical results demonstrate significant attention reallocation from non-informative tokens to key semantic anchors, enhancing LLM stability and robustness.

Semantic Disambiguation of LLM Prompts with Small LLMs

Motivation and Problem Statement

Semantic ambiguity in user-supplied prompts fundamentally impairs the reliability and stability of LLMs in reasoning-intensive tasks. In naturalistic usage, prompts are replete with underspecified references, implicit assumptions, and logical gaps that LLMs must implicitly resolve, often leading to high attention dispersion, stochastic reasoning paths, and variable output quality. Prior prompt optimization methods—including Chain-of-Thought (CoT), self-consistency, agent-based search, and automatic prompt engineering—offer gains but treat ambiguity as a downstream issue, relying on the LLM’s internal mechanisms to resolve risks at inference time. This approach does not explicitly detect, verify, or rectify semantic uncertainty at the prompt level, leaving LLM behavior sensitive to prompt formulation idiosyncrasies.

The paper "Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt" (2604.23263) addresses this deficiency by proposing an upstream, cost-efficient prompt optimization mechanism, DisambiguSLM, that explicitly resolves semantic ambiguities before LLM inference. Its distinguishing element is the use of small language models (SLMs) not to solve target tasks directly, but to proactively identify, verify, and reconcile semantic risk points in prompts. The result is a semantically clarified input that reduces the entropy of the downstream model's reasoning paths.

DisambiguSLM Framework and Design

DisambiguSLM is organized as a modular, multi-layered pipeline that leverages SLMs for prompt preprocessing across distinct semantic harmonization stages.

Figure 1: Framework of DisambiguSLM: identification of semantic risks, dual-path verification and conflict resolution, and semantic integration to produce a non-ambiguous prompt.

1. Semantic Risk Identification

An SLM is used as a global semantic scanner to perform fine-grained analysis of the input prompt, identifying spans with ambiguity, missing assumptions, or temporal uncertainty. Each risk point is formalized with its position and risk type, ensuring the upstream localization of instability sources without committing to specific interpretations.
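As a minimal sketch of how a risk point with a position and a risk type might be represented, consider the record below. The names `RiskType`, `RiskPoint`, `start`, and `end` are hypothetical, and the example risk point is hand-written rather than produced by an SLM; the paper's actual schema is not given here.

```python
from dataclasses import dataclass
from enum import Enum


class RiskType(Enum):
    # The three categories named in the text; the enum itself is an assumption.
    AMBIGUITY = "ambiguity"
    MISSING_ASSUMPTION = "missing_assumption"
    TEMPORAL_UNCERTAINTY = "temporal_uncertainty"


@dataclass(frozen=True)
class RiskPoint:
    start: int           # character offset where the risky span begins
    end: int             # character offset one past the end of the span
    risk_type: RiskType  # kind of risk, with no interpretation attached

    def span(self, prompt: str) -> str:
        """Return the text of the flagged span within the prompt."""
        return prompt[self.start:self.end]


prompt = "The trophy didn't fit in the suitcase because it was too big."

# Hand-written stand-in for what an SLM scanner might return for this prompt.
idx = prompt.index(" it ") + 1
risk = RiskPoint(start=idx, end=idx + 2, risk_type=RiskType.AMBIGUITY)
print(risk.span(prompt), risk.risk_type.value)
```

The record stores only where the risk is and what kind it is, mirroring the point that this stage localizes instability without committing to an interpretation.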

2. Dual-Path Consistency Verification and Conflict Resolution

Each risk point is independently passed to two SLM instances, yielding two distinct semantic interpretations. The similarity of the two interpretations is then computed in embedding space. If the similarity is above a threshold, the interpretations are treated as consistent and their explanations are fused; otherwise, a further SLM instance synthesizes a logically unified, self-consistent interpretation from both variants and the broader context. This dual-path design reduces hallucination probability through redundancy and cross-verification, suppressing singleton errors and expanding semantic coverage.
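The branching logic of this stage can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: cosine similarity is assumed as the similarity measure, the threshold value 0.8 is arbitrary, and `toy_embed`, `fuse`, and `resolve` are hypothetical stubs standing in for a real embedding model and for the SLM calls.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_risk_point(
    interp_a: str,
    interp_b: str,
    embed: Callable[[str], Sequence[float]],
    fuse: Callable[[str, str], tuple],
    resolve: Callable[[str, str], tuple],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> tuple:
    """Fuse consistent interpretations, otherwise hand both to a resolver."""
    similarity = cosine_similarity(embed(interp_a), embed(interp_b))
    if similarity >= threshold:
        return fuse(interp_a, interp_b)
    return resolve(interp_a, interp_b)


def toy_embed(text: str) -> list:
    # Toy bag-of-words embedding over a two-word vocabulary (assumption).
    words = text.lower().split()
    return [float(words.count(w)) for w in ("trophy", "suitcase")]


def fuse(a: str, b: str) -> tuple:
    # Stub for the fusion step; a real system would merge the explanations.
    return ("fused", a, b)


def resolve(a: str, b: str) -> tuple:
    # Stub for the third SLM instance that reconciles conflicting readings.
    return ("resolved", a, b)


agree = verify_risk_point(
    "it refers to the trophy", "the trophy is meant", toy_embed, fuse, resolve
)
conflict = verify_risk_point(
    "it refers to the trophy", "it refers to the suitcase", toy_embed, fuse, resolve
)
print(agree[0], conflict[0])
```

Passing the embedder and the two handlers as arguments keeps the control flow separate from whichever models are plugged in.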

3. Semantic Integration and Enhancement

All resolved risk explanations are sent to another SLM, which aggregates and structurally enhances them into a concise semantic representation. This enhanced context, concatenated with the original prompt, serves as a clarified, non-ambiguous input to the downstream LLM.
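The final concatenation step might look like the sketch below. It substitutes a plain bulleted list for the SLM-based aggregation, and the function name `build_clarified_prompt`, the section labels, and the ordering (clarifications before the original prompt) are all assumptions made for illustration.

```python
def build_clarified_prompt(original_prompt: str, resolved_explanations: list) -> str:
    """Attach resolved clarifications to the original prompt.

    A real pipeline would first have an SLM condense the explanations;
    here they are simply listed one per line.
    """
    if not resolved_explanations:
        # Nothing was flagged, so the prompt is passed through unchanged.
        return original_prompt
    notes = "\n".join(f"- {text}" for text in resolved_explanations)
    return f"Clarifications:\n{notes}\n\nOriginal prompt:\n{original_prompt}"


clarified = build_clarified_prompt(
    "The trophy didn't fit in the suitcase because it was too big.",
    ["'it' refers to the trophy"],
)
print(clarified)
```

Keeping the original prompt verbatim in the output means the downstream LLM still sees the user's wording alongside the added context.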

Attention Dynamics and Entropy Analysis

A central claim is that DisambiguSLM improves LLM attention allocation by concentrating it on semantically critical tokens, suppressing both spurious inference paths and entropy-driven attention diffusion.

Figure 2: Layer-wise focus ratio comparison between $Q$ (ambiguous) and $Q'$ (disambiguated); $Q'$ yields higher attention focus in key reasoning layers.

Relative to ambiguous prompts, DisambiguSLM increases the attention focus ratio in LLMs considerably, by up to 8–10× in early reasoning layers, which indicates that attention is allocated more efficiently to crucial semantic anchors.

Figure 3: Comparison of entropy-focus ratio joint distributions: $Q'$ shifts the high-focus region to lower-entropy regimes, confirming reduced reasoning uncertainty.

The joint entropy-focus analysis demonstrates that DisambiguSLM systematically shifts attention to low-entropy, high-focus allocations, minimizing competitive reasoning branches and mitigating uncertainty propagation.

Figure 4: Token-wise attention reallocation; DisambiguSLM moves attention from sink/stopword tokens to semantically meaningful anchors needed for correct task resolution.

Token-level analysis confirms that attention is withdrawn from non-informative tokens and redistributed to target and supporting tokens critical for task success—validating the premise of upstream semantic risk rectification.
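To make the two quantities in this section concrete, here is a small sketch of one plausible way to compute them for a single attention row. The definitions are assumptions, since this summary does not give the paper's formulas: focus ratio is taken as the share of attention mass on a chosen set of key tokens, and entropy as the Shannon entropy of the normalized attention weights. The attention values are made up for illustration.

```python
import math


def focus_ratio(attention: list, key_indices: list) -> float:
    """Share of total attention mass that falls on the key token positions."""
    total = sum(attention)
    if total == 0:
        return 0.0
    return sum(attention[i] for i in key_indices) / total


def attention_entropy(attention: list) -> float:
    """Shannon entropy (in nats) of the normalized attention weights."""
    total = sum(attention)
    if total == 0:
        return 0.0
    probs = [a / total for a in attention if a > 0]
    return -sum(p * math.log(p) for p in probs)


# Made-up attention rows over four tokens; index 2 is the semantic anchor.
ambiguous = [0.25, 0.25, 0.25, 0.25]
disambiguated = [0.05, 0.05, 0.85, 0.05]
key = [2]

print(focus_ratio(ambiguous, key), focus_ratio(disambiguated, key))
print(attention_entropy(ambiguous), attention_entropy(disambiguated))
```

In this toy case the concentrated row has both a higher focus ratio and a lower entropy than the uniform row, which is the direction of change the figures describe.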

Quantitative Results

Empirical evaluations span diverse LLM architectures (GPT-4o-mini, LLaMa-3-70B, DeepSeek-V3) and multiple reasoning-centric benchmarks:

  • DisambiguSLM achieves up to +8 accuracy points over naïve prompting and +2.5 points over the best prior prompt optimization methods (e.g., OPRO, SPO, TextGrad) at a negligible computational cost ($0.02 per optimized prompt).
  • Largest improvements are observed in tasks sensitive to reference resolution and ambiguity (e.g., Winograd Schema Challenge, LIAR), indicating effective reduction of semantic risk prior to inference.
  • DisambiguSLM systematically reduces disagreement rates and output instability, with disagreement rates on ambiguous inputs falling by over 50% compared to all baselines.
  • In robustness tests on systematically ambiguity-augmented variants, DisambiguSLM delivers the highest accuracy and the least degradation, confirming the method's resilience to escalated semantic uncertainty.
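The disagreement rate mentioned above can be illustrated with a short sketch. The definition used here is an assumption, not necessarily the paper's: the fraction of repeated runs on the same prompt whose answer differs from the majority answer. The sample answers are invented.

```python
from collections import Counter


def disagreement_rate(answers: list) -> float:
    """Fraction of runs whose answer differs from the most common answer."""
    if not answers:
        return 0.0
    majority_count = Counter(answers).most_common(1)[0][1]
    return 1 - majority_count / len(answers)


# Invented answers from four repeated runs on the same prompt.
before = ["A", "B", "A", "B"]  # ambiguous prompt: runs split evenly
after = ["A", "A", "B", "A"]   # clarified prompt: one dissenting run

print(disagreement_rate(before), disagreement_rate(after))
```

Under this definition, a rate of 0.0 means every run agreed, and lower values correspond to more stable outputs.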

Ablation and Sensitivity Analyses

Ablations show each pipeline stage—risk identification, dual-path verification, conflict resolution, and structured aggregation—contributes nontrivially to performance. Eliminating dual-path verification or conflict resolution causes marked drops, especially on ambiguity-heavy benchmarks.

Increasing SLM size past 1B parameters yields diminishing returns, suggesting that prompt-level semantic harmonization depends more on structured method design than on model scale. Performance is also insensitive to the semantic similarity threshold within reasonable bounds, which supports the robustness of the pipeline.

Theoretical and Practical Implications

DisambiguSLM redefines prompt optimization as an input-level semantic filtering and compression problem, rather than a mere surface-level rephrasing or model-internal reasoning issue. This externalizes ambiguity management, constrains inference path entropy, and improves interpretability and convergence of LLM behavior. The architecture is cost-efficient and parallelizable; SLMs add negligible latency and deployment cost and can run on commodity hardware.

From a systems perspective, DisambiguSLM offers a scalable path to robust LLM deployment in safety- and consistency-critical applications, facilitating reliable reasoning under naturalistic, ill-formed prompt conditions. Theoretically, the work aligns with research on semantic risk modeling, attention entropy reduction, and multi-agent collaborative reasoning in LM ecosystems.

Future Directions

Potential extensions include adaptation to multimodal inputs (audio, vision-text), richer taxonomy of semantic risk types, and integration with retrieval or verification pipelines to address factual uncertainty (as opposed to mere semantic ambiguity). DisambiguSLM could also interoperate with modular agent-based LLM systems to partition upstream (semantic) and downstream (factual) risk management, further advancing robust and trustworthy AI reasoning.

Conclusion

DisambiguSLM establishes explicit, structured semantic ambiguity resolution—executed by small, efficient LLMs—as a foundational advance in prompt optimization for LLM-based reasoning. The approach achieves superior reasoning accuracy, stability, and robustness across ambiguous and complex scenarios, cost-effectively externalizing ambiguity management without modifying LLM internals (2604.23263). The work charts a pathway for upstream semantic preprocessing as an essential component of future LLM systems.
