
Ambiguity Taxonomy: Concepts & Applications

Updated 10 February 2026
  • Ambiguity taxonomy is a systematic classification that organizes diverse sources of ambiguity—from lexical to pragmatic—in language and strategic decision-making.
  • It underpins methodologies in NLP and computational linguistics by employing empirical metrics such as inter-annotator agreement, NLI tests, and LLM-based evaluations.
  • The taxonomy informs practical applications in code generation, query disambiguation, sports analytics, and organizational strategy, enhancing overall system performance.

Ambiguity is the property of an expression, utterance, or observation to support multiple plausible interpretations. In scientific analysis, computational linguistics, strategic decision-making, and user interaction, a proliferation of ambiguity taxonomies has emerged to systematize the sources, manifestations, and consequences of ambiguity for robust modeling, effective annotation, and principled resolution. These taxonomies are typically hierarchical or compositional, reflecting whether the primary locus of ambiguity is semantic, syntactic, pragmatic, task-instructional, or organizational. They are empirically validated by inter-annotator agreement, targeted classification tasks, and metric-driven evaluations of downstream system performance.

1. Foundational Linguistic and Discourse Ambiguity Taxonomies

Linguistic ambiguity taxonomies in computational linguistics and NLP capture the loci and mechanisms by which utterances express multiple meanings. The taxonomy of Li et al. (Li et al., 2024), grounded in AmbiEnt and classical semantic theory, enumerates eleven primary types:

  • Lexical: Polysemy or homonymy at the word level (e.g., "bank" as riverbank vs. institution).
  • Syntactic: Multiple parses of the same string (e.g., scope of PPs or coordination).
  • Scopal: Ambiguous relative scope of quantifiers (e.g., "Every student read two poems").
  • Elliptical: Variable antecedent assignment in ellipsis.
  • Collective/Distributive: Group vs. individual action readings.
  • Implicative: Scalar and conversational implicature.
  • Presuppositional: Alternative background projections from presupposition triggers.
  • Idiomatic: Literal vs. idiomatic phrase interpretations.
  • Coreferential: Multiple referents for a pronoun or NP.
  • Generic/Non-Generic: Generalization vs. unique event construals.
  • Type/Token: Referring to kind vs. instance.

The AmbiEnt benchmark (Liu et al., 2023) operationalizes these distinctions through the natural language inference (NLI) test: a sentence is ambiguous if it admits two rewrites yielding different entailment patterns (e.g., A₁ ⊨ H but A₂ ⊭ H). Ambiguity is also subdivided by category prevalence (pragmatic: 45%; lexical: 20%; others individually <20%).
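The NLI-based ambiguity test can be sketched in a few lines: a sentence counts as ambiguous if its disambiguated rewrites yield different entailment labels against the same hypothesis. The `nli` function below is a toy word-overlap stand-in for a real NLI model (the benchmark itself relies on trained models and human annotation), so only the decision logic is illustrative:

```python
# Sketch of the AmbiEnt-style NLI ambiguity test: a premise is ambiguous
# if two disambiguated rewrites yield different entailment patterns
# against the same hypothesis (A1 entails H but A2 does not).
# `nli` is a hypothetical toy oracle, not a real NLI model.

def nli(premise: str, hypothesis: str) -> str:
    """Toy NLI oracle: 'entailment' iff every hypothesis word occurs in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return "entailment" if h <= p else "neutral"

def is_ambiguous(rewrites: list[str], hypothesis: str) -> bool:
    """Ambiguous if the disambiguations disagree on the entailment label."""
    labels = {nli(r, hypothesis) for r in rewrites}
    return len(labels) > 1

# Two rewrites of "Every student read two poems" (scopal ambiguity):
rewrites = [
    "each student read the same two poems",          # wide-scope reading
    "each student read a possibly different pair of poems",  # narrow-scope reading
]
print(is_ambiguous(rewrites, "the same two poems"))  # labels disagree -> True
```

In practice the disagreement check would run over the full NLI label space (entailment, neutral, contradiction) with a trained model in place of the toy oracle.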

The NLI-focused taxonomy of Jiang & de Marneffe (adopted in (Jayaweera et al., 20 Jul 2025)) centers specifically on ambiguity in sentence meaning: lexical, implicature, presupposition, probabilistic enrichment, and imperfection. This framework further integrates into four high-level classes (lexical, syntactic, semantic, pragmatic), harmonized across recent NLP ambiguity taxonomies.

2. Task/Instructional Ambiguity in NLG and User Interaction

Ambiguity taxonomy in NLG and code generation settings formalizes under-specification of user intent as a critical challenge. In AmbigNLG (Niwa et al., 2024), a flat taxonomy captures six independent axes:

  • Context: Missing background or assumed information (e.g., target audience).
  • Keywords: Absence of required terminology.
  • Length: No output size specified.
  • Planning: Absence of structure or ordering cues.
  • Style: Unspecified tone or rhetorical mode.
  • Theme: Omitted focus or perspective.

Ambiguous instructions are flagged per axis by comparing specified vs. required properties (e.g., C_spec(I) ⊂ C_req(x, y_ref) for Context). These categories are orthogonal but frequently co-occur; the operational annotation pipeline concatenates disambiguating instructions for each triggered category, validating the result with both model and human judges.
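This per-axis flagging rule reduces to a proper-subset check between specified and required constraint sets. A minimal sketch, with the constraint-extraction step stubbed out as explicit sets (the axis names follow the taxonomy; the constraint strings are invented for illustration):

```python
# Sketch of AmbigNLG-style per-axis ambiguity flagging: an instruction I is
# ambiguous along an axis when its specified constraints are a *proper*
# subset of what the reference output requires (C_spec ⊂ C_req).
# Constraint extraction is stubbed with hand-written sets.

AXES = ["context", "keywords", "length", "planning", "style", "theme"]

def flag_ambiguity(c_spec: dict[str, set], c_req: dict[str, set]) -> list[str]:
    """Return the axes on which the instruction under-specifies the task."""
    # `<` on sets is Python's proper-subset operator.
    return [ax for ax in AXES if c_spec.get(ax, set()) < c_req.get(ax, set())]

c_spec = {"length": {"<=100 words"}}                  # what the instruction says
c_req  = {"length": {"<=100 words"},                  # what the reference needs
          "style":  {"formal tone"},
          "theme":  {"focus on safety"}}
print(flag_ambiguity(c_spec, c_req))  # → ['style', 'theme']
```

The length axis is not flagged because its specified constraints already equal the required ones (equality is not a proper subset).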

In the data-to-visualization code generation domain, (İnan et al., 10 Oct 2025) identifies semantic, presuppositional, and underspecification ambiguity. Semantic ambiguity arises from words with multiple plausible parameter mappings ("full line"), presuppositional ambiguity reflects mismatched defaults, and underspecification emerges from omitted parameters. Ambiguity is quantified using metrics such as LLM-based ambiguity rating (LAR), optimal result gap (ORG), and sampling diversity, all correlating with human judgment. Pragmatics-guided dialogue (cooperative, discursive, inquisitive) is empirically shown to resolve such ambiguity, boosting code accuracy by 8–16 percentage points depending on category.
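Of these metrics, sampling diversity is the simplest to sketch: sample several generations for the same request and measure the fraction of distinct outputs, with higher diversity suggesting the request under-determines the answer. The normalization below is an assumption for illustration; the paper's exact LAR and ORG formulas are not reproduced:

```python
# Hedged sketch of a sampling-diversity ambiguity signal for code
# generation: many distinct outputs for one request suggest ambiguity.
# The (distinct - 1) / (n - 1) normalization is an assumption, chosen so
# the score is 0 when all samples agree and 1 when all differ.

from collections import Counter

def sampling_diversity(samples: list[str]) -> float:
    """Normalized count of distinct outputs among sampled generations."""
    counts = Counter(s.strip() for s in samples)
    return (len(counts) - 1) / max(len(samples) - 1, 1)

# Three sampled completions for an ambiguous "full line" plotting request:
samples = ["plt.plot(x, y)", "plt.plot(x, y, '-')", "plt.plot(x, y)"]
print(round(sampling_diversity(samples), 2))  # 2 distinct of 3 -> 0.5
```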

3. Ambiguity in Question-Answering, Discourse, and Response Clarity

Ambiguity taxonomies targeting dialogue, QA, and political interview analysis emphasize response strategy. The two-level taxonomy of (Thomas et al., 2024) for political interview QA pairs is both hierarchical and empirically validated:

High-Level Classification (function C_high: QA → {CR, AR, CNR}):

  • ClearReply (CR): Single definite answer.
  • ClearNonReply (CNR): Explicit refusal, ignorance claim, or clarificatory request.
  • AmbivalentReply (AR): Contains information but supports multiple readings or is evasive.

Fine-Grained (leaf) taxonomy (E_low):

  • ExplicitReply (CR)
  • DeclineToAnswer, ClaimIgnorance, Clarification (CNR)
  • ImplicitReply, GeneralReply, PartialReply, Dodging, Deflection (AR)

Each leaf is operationalized by decision rules, surface cues (e.g., rhetorical structure), and dataset-extracted examples. This framework is directly connected to classic political science typologies but focuses on surface ambiguity of responses without attempting to infer deeper intent.
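Because each leaf label rolls up deterministically to exactly one high-level class, C_high can be derived from the fine-grained label. A minimal lookup sketch of that mapping:

```python
# Sketch of the two-level roll-up: fine-grained leaf label -> high-level
# class, following the taxonomy's groupings (CR / CNR / AR).

LEAF_TO_HIGH = {
    "ExplicitReply": "CR",
    "DeclineToAnswer": "CNR", "ClaimIgnorance": "CNR", "Clarification": "CNR",
    "ImplicitReply": "AR", "GeneralReply": "AR", "PartialReply": "AR",
    "Dodging": "AR", "Deflection": "AR",
}

def c_high(leaf: str) -> str:
    """Derive the high-level class from a fine-grained leaf label."""
    return LEAF_TO_HIGH[leaf]

print(c_high("Dodging"))  # → AR
```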

4. Domain-Specific and Applied Ambiguity Taxonomies

Ambiguity arises in technical and applied domains with unique typological structures:

A. Event Annotation in Sports Analytics

(Biermann et al., 2021) organizes ambiguous event definitions for invasion games into a strictly hierarchical taxonomy (game status → possession → ball events → set pieces), with ambiguity increasing at lower levels. Human annotation experiments show inter-annotator disagreement rising from low-level motor events (reception: tIoU ≈ 0.92) to high-level, context-dependent events (pass subtypes: agreement drops to 60–80%). Ambiguity is formally measured via temporal IoU and sequence consistency matching.
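Temporal IoU is the standard interval overlap-over-union applied to event time spans; a minimal sketch for [start, end] intervals in seconds:

```python
# Temporal intersection-over-union between a predicted and a gold event
# interval, as used to quantify annotator (dis)agreement on event spans.

def tiou(pred: tuple[float, float], gold: tuple[float, float]) -> float:
    """Temporal IoU between two [start, end] intervals; 0.0 if disjoint."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0

# Two annotators marking the same reception, offset by 0.2 s:
print(round(tiou((10.0, 12.0), (10.2, 12.2)), 2))  # → 0.82
```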

B. Text-to-SQL and Query Disambiguation

AmbiSQL (Ding et al., 21 Aug 2025) partitions ambiguity into DB-related and LLM-related axes:

  • DB-related:
    • Unclear schema reference
    • Unclear value reference
    • Missing SQL-related keywords
  • LLM-related:
    • Unclear knowledge source (DB vs. background reasoning)
    • Insufficient reasoning context
    • Conflicting knowledge
    • Ambiguous temporal/spatial scope

Each subcategory is described by operational detection criteria (LLM-driven prompts), sample error cases, and clarification templates, with empirical detection F₁ ranging from 80% to 100%. This systematic approach yields up to 50 percentage point improvements in SQL exact-match accuracy.

5. Ambiguity in Strategic Decision-Making, Organizational, and Computational Frames

Wu et al. (2022) conceptualize ambiguity in strategic decision-making as interpretation-driven uncertainty, emphasizing decision frames beyond classical probabilistic or complexity-centric models. Ambiguity most naturally arises in six base frames—Perceiving, Problem-Solving, Culture, Cognition, Environment Context, Organizational Context—which decompose further into 20 “elementary” modes, e.g.:

  • Perceiving: Reason-based, data-based, passion-based (gut-feeling) frames for dealing with ambiguous signals.
  • Problem-Solving: “Dancing-floor”, “rugged”, “Mt Fuji” landscapes represent shifting, ill-structured or noisy objectives.
  • Cultural/Cognitive: Ambiguity filtered through belief spectrums, societies-of-mind, adaptive or multi-mindset reasoning.
  • Environmental/Organizational: Ambiguity from shifting social, technological, or policy signals; ambiguous reporting/organizational structures.

Distinctions are drawn between ambiguity, uncertainty (known probabilities), complexity (known structure, many variables), and chaos/ignorance (absence of modelability). AI/ML approaches to ambiguity are portfolio-based: Bayesian networks, heuristic and reinforcement learning algorithms, agent-based modeling, and scenario planning. The taxonomy thus enables modular, compositional strategies for handling ambiguity, in contrast to single-model paradigms.

6. Methodological Integration and Operationalization

Ambiguity taxonomies are operationalized through formal annotation schemas (NLI frame in (Li et al., 2024) and (Liu et al., 2023)), information-theoretic similarity (Resnik, 2011), metadata-driven instruction refinement (Niwa et al., 2024), pragmatic dialogue systems (İnan et al., 10 Oct 2025), and domain-specific metrics (tIoU, sequence consistency (Biermann et al., 2021); detection/clarification F₁ (Ding et al., 21 Aug 2025)). Evaluation is typically tied to inter-annotator agreement (e.g., Fleiss’ κ), macro-averaged F₁, precision, recall, and specific benchmarks reflecting ambiguity impact on downstream system performance.
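Fleiss' κ, the multi-annotator agreement statistic cited above, can be computed directly from an item × category matrix of annotator counts; a self-contained sketch (assumes a constant number of raters per item and non-degenerate expected agreement):

```python
# Fleiss' kappa from a count matrix: ratings[i][k] is the number of
# annotators who assigned item i to category k.

def fleiss_kappa(ratings: list[list[int]]) -> float:
    n_items = len(ratings)
    n_raters = sum(ratings[0])  # assumed constant across items
    # Per-item observed agreement.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_items
    # Expected agreement from marginal category proportions.
    totals = [sum(row[k] for row in ratings) for k in range(len(ratings[0]))]
    p_k = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_k)
    return (p_bar - p_e) / (1 - p_e)

# Three annotators, three items, three ambiguity subtypes:
print(fleiss_kappa([[3, 0, 0], [0, 3, 0], [1, 1, 1]]))  # → 0.4375
```

κ = 1 indicates perfect agreement, 0 chance-level agreement, and negative values systematic disagreement.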

Taxonomies increasingly serve not just as classification schemes, but as foundations for automatic ambiguity detection, interactive clarification, task decomposition, targeted system diagnostics, and the design of composite AI reasoning modules for robust operation in ambiguous environments.

7. Implications, Limitations, and Future Research Directions

Despite advances, all major ambiguity taxonomies highlight critical limitations: uneven category representation in benchmarks, difficulty of reliably annotating and distinguishing fine subtype distinctions (e.g., pragmatic vs. presuppositional vs. implicature), and the challenge of decoupling genuine ambiguity from annotator/guideline-induced noise (Liu et al., 2023, Jayaweera et al., 20 Jul 2025). The lack of gold-labeled ambiguity subtype annotation remains a bottleneck for supervised detection and explainable AI evaluation. Open problems include scalable annotation methods, robust automatic subtype detection, integration of interactive user clarification, and taxonomy refinement in new domains (e.g., code generation, multi-turn dialogue, strategic reasoning).

The unifying trend is the recognition that ambiguity is not noise to be ignored, but an inherent and often productive aspect of human language and decision-making. Systematic taxonomies, rooted in empirical data and formal criteria, enable rigorous study, comparative benchmarking, and principled algorithmic solutions to ambiguity in both human and machine reasoning.
