LiTEx Taxonomy: NLI Explanation Framework
- LiTEx Taxonomy is a hierarchical framework categorizing NLI free-text explanations into Text-Based and World Knowledge reasoning types.
- It was empirically developed using extensive annotation of the e-SNLI corpus, achieving high inter-annotator reliability with detailed guidelines.
- Practical insights include improved explanation generation and deeper analysis of within-label variation in Natural Language Inference tasks.
LiTEx Taxonomy refers to a linguistically informed taxonomy for categorizing free-text explanations in the Natural Language Inference (NLI) task. Its central focus is capturing the diversity and nuance in annotators’ reasoning strategies, even when their final NLI label (entailment, contradiction, or neutral) is the same. Developed in response to observed Human Label Variation (HLV) and, more specifically, within-label variation, LiTEx provides a structured, validated framework for decomposing and analyzing the rationales behind NLI decisions, facilitating both qualitative study and improvements in automated explanation generation (Hong et al., 28 May 2025, Hong et al., 18 Oct 2025).
1. Structural Definition of the LiTEx Taxonomy
LiTEx is defined as a hierarchical, linguistically guided categorization for free-text NLI explanations, partitioned into two principal superordinate classes: Text-Based (TB) and World Knowledge-Based (WK) reasoning. The taxonomy was empirically developed and validated via extensive annotation of the e-SNLI corpus, focusing exclusively on annotators’ free-text rationales rather than surface-level highlight spans or coarse NLI item categories.
LiTEx Category Structure:
| Superordinate Type | Subcategories |
|---|---|
| Text-Based (TB) | Coreference, Syntactic, Semantic, Pragmatic, Absence of Mention, Logical Conflict |
| World Knowledge (WK) | Factual Knowledge, Inferential Knowledge |
- Coreference: Focused on referent resolution within the sentence.
- Syntactic: Pertaining to structural transformations that preserve meaning (e.g., active/passive alternation).
- Semantic: Involving lexical-semantic relations (synonymy, antonymy, entailment).
- Pragmatic: Based on conversational implicatures or presuppositions.
- Absence of Mention: Attributing the decision to a missing element in the premise/hypothesis.
- Logical Conflict: Direct reference to incompatible or contradictory logical structures.
- Factual Knowledge: Invokes general, commonly held external knowledge.
- Inferential Knowledge: Relies on context-dependent or culturally conditioned inferences.
This fine-grained classification supports the explicit mapping of each explanation segment to a unique reasoning type, with categories operationalized via specific annotation guidelines and decision criteria.
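The two-level structure above can be written down as a small lookup table. The following is a minimal sketch, not code from the LiTEx release: the names `SuperType`, `LITEX_TAXONOMY`, and `super_type` are hypothetical, and only the category names come from the table above.

```python
from enum import Enum


class SuperType(Enum):
    """The two superordinate LiTEx classes (enum name is hypothetical)."""
    TEXT_BASED = "Text-Based"
    WORLD_KNOWLEDGE = "World Knowledge"


# Subcategories per superordinate type, copied from the category table.
LITEX_TAXONOMY = {
    SuperType.TEXT_BASED: (
        "Coreference",
        "Syntactic",
        "Semantic",
        "Pragmatic",
        "Absence of Mention",
        "Logical Conflict",
    ),
    SuperType.WORLD_KNOWLEDGE: (
        "Factual Knowledge",
        "Inferential Knowledge",
    ),
}

# Reverse index: subcategory name -> superordinate type.
SUPER_TYPE_OF = {
    sub: st for st, subs in LITEX_TAXONOMY.items() for sub in subs
}


def super_type(category: str) -> SuperType:
    """Return the superordinate type of a subcategory, or raise ValueError."""
    try:
        return SUPER_TYPE_OF[category]
    except KeyError:
        raise ValueError(f"unknown LiTEx category: {category!r}") from None
```

Because each explanation segment maps to exactly one subcategory, a flat reverse index is enough here; a multi-label variant would need sets of categories instead.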
2. Human Label Variation and the Motivation for LiTEx
HLV is pervasive in NLI: annotators given the same premise–hypothesis pair often assign different labels, reflecting both the inherent linguistic ambiguity and the pluralism of plausible reasoning. Crucially, LiTEx’s primary motivation is to address within-label variation—the phenomenon whereby annotators agree on the final label but provide explanations that differ fundamentally in reasoning type. Conventional resources (e.g., e-SNLI) provide highlight spans that may conflate or obscure these differences, as the identical span might serve distinct rationales for different annotators.
Empirical findings in the annotated e-SNLI subset reveal that 61.2% of items exhibited more than one LiTEx category across their explanations—directly evidencing significant within-label variation. This suggests that explanation diversity is not noise but an essential feature of natural language understanding in NLI annotation (Hong et al., 28 May 2025).
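A statistic of this kind (the share of items whose explanations span more than one category) can be computed as sketched below. The data layout, the function names, and the toy items are assumptions for illustration; they are not the released annotations.

```python
def has_within_label_variation(categories):
    """True if the explanations of one item use more than one category."""
    return len(set(categories)) > 1


def variation_rate(items):
    """Fraction of items with more than one category across explanations.

    `items` maps a (hypothetical) item id to the list of LiTEx categories
    assigned to that item's explanation segments.
    """
    if not items:
        return 0.0
    varied = sum(
        1 for cats in items.values() if has_within_label_variation(cats)
    )
    return varied / len(items)


# Toy data, invented for illustration only.
toy_items = {
    "item-1": ["Semantic", "Semantic", "Semantic"],
    "item-2": ["Semantic", "Factual Knowledge", "Semantic"],
    "item-3": ["Logical Conflict", "Absence of Mention", "Pragmatic"],
    "item-4": ["Coreference", "Coreference", "Coreference"],
}

rate = variation_rate(toy_items)  # two of the four toy items vary
```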
3. Annotation Protocols and Validation
The LiTEx taxonomy was built and validated through a methodical annotation campaign:
- Annotation Procedure:
- A curated set (1,002 NLI items, each with three free-text explanations) was sampled from e-SNLI.
- Each explanation was split as needed into minimal, reasoning-consistent segments, yielding 3,108 annotated explanation fragments.
- Trained annotators (with high inter-annotator reliability: Cohen’s κ = 0.862) categorized each segment using guiding questions (e.g., “Does the explanation resolve coreference?”).
- Automated Classification:
- Transformer-based models (BERT, RoBERTa) were fine-tuned to classify explanations by LiTEx category, reaching approximately 70% accuracy and ~57% macro F1, which indicates that the categories are learnable by standard classifiers.
The taxonomy’s operational definitions, guiding questions, and example-based criteria are detailed in tabular form in the primary documentation.
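The two kinds of figures reported in this section, Cohen's κ for annotator agreement and macro F1 for the classifiers, follow standard definitions. The sketch below implements both in plain Python on invented toy labels; it is not the authors' evaluation code, and the variable names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same segments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of segments with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
annotator_1 = ["Semantic", "Semantic", "Pragmatic", "Pragmatic"]
annotator_2 = ["Semantic", "Semantic", "Pragmatic", "Semantic"]
kappa = cohens_kappa(annotator_1, annotator_2)

gold = ["Semantic", "Semantic", "Semantic", "Pragmatic"]
pred = ["Semantic", "Semantic", "Semantic", "Semantic"]
score = macro_f1(gold, pred)
```

In the second toy case accuracy is 0.75 while macro F1 is lower, because the rare class is never predicted. The same effect is one plausible reason a classifier's macro F1 can sit well below its accuracy when category frequencies are skewed.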
4. Analytical Findings: Label, Highlight, and Explanation Alignment
- Label–Taxonomy Distributions:
- Distinct LiTEx categories co-occur with specific NLI labels: Logical Conflict is mainly associated with contradictions; Syntactic, Semantic, and Pragmatic with entailments.
- Span–Category Relationship:
- Syntactic explanations typically use longer highlights, reflecting span-level rephrasing, while WK explanations depend on minimal or no highlights, denoting reliance on non-local information.
- Explanation Similarity:
- Items with more diverse LiTEx category assignments exhibited lower pairwise similarity in explanation embeddings, directly quantifying within-label variation.
- Taxonomy Agreement vs. Label Agreement:
- Taxonomy (reasoning) agreement is reported to be a better predictor of semantic similarity between explanations than label agreement: annotators may provide highly similar explanations with different labels, and dissimilar explanations with the same label (Hong et al., 18 Oct 2025).
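The explanation-similarity analysis above rests on pairwise similarity between explanation embeddings. A minimal sketch of that computation follows, assuming the embeddings are already available as plain lists of floats; the embedding model is not specified here, and the toy vectors are invented.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-d "embeddings": one item whose explanations share a direction,
# and one item where a single explanation points elsewhere.
same_reasoning = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
mixed_reasoning = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

sim_same = mean_pairwise_similarity(same_reasoning)
sim_mixed = mean_pairwise_similarity(mixed_reasoning)
```

Comparing this per-item score between items with one category and items with several is one way to express the reported pattern; the toy vectors only illustrate the direction of the effect.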
5. Implications for NLI Annotation and Explanation Generation
LiTEx surfaces two key implications:
- For Annotation Practice:
- Explanation-based annotation, leveraging the LiTEx taxonomy, provides a richer and more stable account of semantic decision-making than label-only protocols. This advocates for integrating free-text rationales and reasoning typologies into future NLI annotation schemes.
- Conditional Cohen’s κ analyses (Hong et al., 18 Oct 2025) indicate that taxonomy agreement implies label agreement, but not the reverse, which suggests that the reasoning signal is conceptually prior.
- For Model Generation:
- Conditioning neural explanation generation on LiTEx categories yields outputs that are linguistically and semantically closer to human-produced explanations as compared to methods leveraging only labels or highlights (measured via POS n-gram overlap and embedding cosine similarity).
- This improvement in generation quality demonstrates the utility of reasoning-based taxonomy signals for building models that mimic the diversity and authenticity of human explanation.
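The POS n-gram overlap mentioned above can be formulated in several ways, and the exact formula used in the cited work is not given here. The sketch below shows one plausible variant as an assumption: the share of POS n-grams in a generated explanation that also occur in a human explanation, with counts clipped. It assumes POS tags have already been produced by some tagger; the tag sequences are invented.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Multiset of POS n-grams in a tag sequence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def pos_ngram_overlap(human_tags, generated_tags, n=2):
    """Share of generated POS n-grams that also appear in the human text.

    Counts are clipped, so a repeated n-gram is credited only as often
    as it occurs in the human explanation.
    """
    human = pos_ngrams(human_tags, n)
    generated = pos_ngrams(generated_tags, n)
    total = sum(generated.values())
    if total == 0:
        return 0.0
    shared = sum((human & generated).values())
    return shared / total


# Toy tag sequences, invented for illustration only.
human_tags = ["DET", "NOUN", "VERB", "ADJ"]
generated_tags = ["DET", "NOUN", "VERB", "NOUN"]

overlap = pos_ngram_overlap(human_tags, generated_tags, n=2)
```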
6. Methodological and Theoretical Advances
- Taxonomy Construction: LiTEx builds on prior typologies (e.g., inferential class labels) but is unique in targeting free-text rationales post hoc, not merely NLI items or highlight spans.
- Annotation Analytics: By decomposing the annotation process into reasoning (taxonomy) and labeling, the approach supports finer-grained analyses of both agreement (where similar reasoning may lead to the same or different labels) and disagreement (which may reflect distinct or similar rationale categories).
- Formulas and Analysis:
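- The Jaccard similarity for category alignment is used to quantify agreement between two sets of LiTEx categories $A$ and $B$: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ if $A \cup B \neq \emptyset$, 0 otherwise.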
- Conditional Cohen’s κ scores measure the degree to which agreement on taxonomy category implies agreement on labels and vice versa.
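Jaccard similarity in its standard form (intersection size over union size, with 0 for two empty sets) is short enough to sketch directly. The function name and the example category sets below are hypothetical, and how the cited work builds the sets being compared is not specified here.

```python
def jaccard(categories_a, categories_b):
    """Standard Jaccard similarity of two category collections."""
    a, b = set(categories_a), set(categories_b)
    union = a | b
    if not union:  # both empty: defined as 0 by convention
        return 0.0
    return len(a & b) / len(union)


# Toy category sets, invented for illustration only.
partial = jaccard({"Semantic", "Pragmatic"}, {"Semantic"})
identical = jaccard({"Coreference"}, {"Coreference"})
disjoint = jaccard({"Syntactic"}, {"Factual Knowledge"})
```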
7. Prospects and Extensions
- Generalization: The taxonomy, developed for e-SNLI, has been applied to diverse NLI datasets such as LiveNLI and VariErr, establishing its adaptability and coverage (Hong et al., 18 Oct 2025). The framework is readily extendable to new domains, including broader natural language understanding and plausible reasoning tasks.
- Dataset Release and Downstream Use: Release of e-SNLI with LiTEx annotations and accompanying codebase supports further research in explanation-aware models, variation quantification, and robust evaluation protocols.
- Future Research Directions:
- Extending to multi-label categorization to account for explanations combining multiple reasoning strategies.
- Incorporating annotator background metadata to correlate individual explanation strategies with label choices and selection biases.
- Applying taxonomy-guided generation and variation-aware modeling in new NLI benchmarks.
Summary Table: LiTEx Category Taxonomy
| Super-Type | Subcategory | Key Reasoning Focus |
|---|---|---|
| Text-Based | Coreference | Referent resolution (entities, pronouns) |
| Text-Based | Syntactic | Sentence structural rearrangement |
| Text-Based | Semantic | Lexical meaning: synonymy, antonymy, entailment |
| Text-Based | Pragmatic | Implicature, presupposition, speaker meaning |
| Text-Based | Absence of Mention | Missing/untested information in premise/hypothesis |
| Text-Based | Logical Conflict | Contradictory logical structure or quantifier inference |
| World Knowledge | Factual Knowledge | Consensus facts; encyclopedic or common-sense truth |
| World Knowledge | Inferential Knowledge | Circumstantial, culture-dependent or causal inferences |
LiTEx thus provides a principled, validated framework for decomposing, analyzing, and generating NLI explanations, capturing the subtleties of within-label variation, and supporting advanced annotation and modeling methodologies in NLI research (Hong et al., 28 May 2025, Hong et al., 18 Oct 2025).