Idiom Class Taxonomy Overview

Updated 14 June 2026

Idiom Class Taxonomy is a structured schema categorizing idiomatic and figurative multiword expressions based on semantic, syntactic, pragmatic, and cultural parameters.
Modern taxonomies integrate linguistically motivated classes, empirical annotation protocols, and computational features to automate idiom detection and classification.
The taxonomy underpins advanced NLP tasks including machine translation and word sense disambiguation by providing empirically validated benchmarks and multidimensional annotation.

An idiom class taxonomy provides a structured schema for categorizing idiomatic and figurative multiword expressions (MWEs) according to salient semantic, syntactic, pragmatic, and cultural parameters. Modern taxonomies combine linguistically motivated class systems, empirically motivated annotation protocols, and computationally tractable features. Recent work systematically identifies idiom classes for English and other languages; alternative taxonomies leverage formal quantitative axes, multidimensional annotation fields, and operationalized heuristics for automated detection.

1. Major Idiom Class Taxonomies

Idiomatic expressions resist compositional interpretation, often involving figurative meaning, syntactic fixedness, and cultural embedding. The PIE-English corpus articulates a 10-class, flat-structure taxonomy rooted in the traditions of Alm-Arvius (2003) and Lakoff & Johnson (2008) (Adewumi et al., 2021):

Class	Defining Feature	Example MWEs
Metaphor	Implicit cross-domain mapping; no marker	“ring a bell”, “see the light”
Simile	Explicit comparison; “like” / “as”	“as clear as a bell”
Euphemism	Mild/indirect term for taboo/harsh referent	“kick the bucket”
Parallelism	Repetition of structure, isomorphism	“day in, day out”
Personification	Human properties assigned to non-human	“wind whispered through trees”
Oxymoron	Juxtaposed antonyms	“deafening silence”
Paradox	Statement is self-contradictory or counter-logical	“I know one thing: I know nothing”
Hyperbole	Exaggeration	“a million times”
Irony	Utterance with the opposite intended meaning	“Great—another rainy day”
Literal	Straightforward compositional usage	“He placed the ring on her finger”

PIE-English uses strictly non-hierarchical labeling, constraining each instance to a single mutually exclusive class. The decision rules prioritize explicit markers (e.g., simile via “like/as”), semantic substitution (euphemism), human attributes to non-humans (personification), structural/formal features (parallelism), and contextual oppositionality (irony). Classes such as paradox and oxymoron are recognized by logical or antonymic contradiction. The “literal” class serves as a non-figurative control (Adewumi et al., 2021).

Bengali idioms, by contrast, are annotated with a semantically uniform schema comprising 19 fields, encompassing semantic, syntactic, cultural, and religious features. No subcategorization into metaphor vs. simile is present; all idioms are structurally parallel at the taxonomy level (Sakhawat et al., 13 Feb 2026).

2. Formal Quantitative Axes

Socolof et al. propose that idioms are best modeled as the intersection of two independent, continuous dimensions: conventionality and contingency (Socolof et al., 2021).

Conventionality (conv): Quantifies semantic deviation of constituent content words in phrase-internal contexts, operationalized as the negated mean standardized Euclidean distance between in-idiom BERT embeddings and the “off-idiom” embedding cloud for each word:

$\text{conv}(\text{phrase}) = -\frac{1}{m}\sum_{j=1}^m \left\| \frac{T_j - \mu_O}{\sigma_O} \right\|_2$

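Here $m$ is the number of content words in the phrase, $T_j$ is the in-idiom embedding of the $j$-th content word, and $\mu_O$ and $\sigma_O$ are the mean and standard deviation of that word's off-idiom embeddings. Because the distance is negated, lower (more negative) conv values signify greater semantic shift from canonical usage.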
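The conventionality formula can be illustrated with a minimal sketch. It assumes the embeddings have already been extracted (the toy 2-dimensional vectors below stand in for BERT embeddings), that the statistics of the off-idiom cloud are computed per word and per dimension, and that a zero standard deviation is replaced by 1.0 to avoid division by zero. The function and argument names are hypothetical and not taken from Socolof et al.

```python
import math
from statistics import mean, pstdev


def conventionality(in_idiom_vectors, off_idiom_clouds):
    """Negated mean standardized Euclidean distance (toy version).

    in_idiom_vectors: one vector T_j per content word of the phrase.
    off_idiom_clouds: for each word, a list of its off-idiom vectors.
    """
    norms = []
    for t_j, cloud in zip(in_idiom_vectors, off_idiom_clouds):
        norm_sq = 0.0
        for d in range(len(t_j)):
            column = [vec[d] for vec in cloud]
            mu = mean(column)
            # Assumption: fall back to 1.0 when a dimension has no spread.
            sigma = pstdev(column) or 1.0
            norm_sq += ((t_j[d] - mu) / sigma) ** 2
        norms.append(math.sqrt(norm_sq))
    return -mean(norms)


# Toy data: word 1 sits at the centre of its cloud, word 2 is far away.
cloud = [[0.0, 0.0], [2.0, 2.0]]
score = conventionality([[1.0, 1.0], [4.0, 5.0]], [cloud, cloud])
print(score)  # -2.5
```

The second word contributes a standardized distance of 5 and the first a distance of 0, so the phrase-level score is the negated mean of the two.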

Contingency (cont): Generalizes pointwise mutual information to multiword sequences, capturing the degree to which the observed co-occurrence probability exceeds chance. The joint probability is estimated from XLNet language model probabilities, factored with the chain rule:

$\text{cont}(x_1,\ldots,x_n) = \log \frac{p(x_1,\ldots,x_n)}{\prod_{i=1}^n p(x_i)}$

Higher cont scores indicate stronger lexical binding.
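As a worked illustration of the contingency formula, the sketch below computes the score in bits from probabilities supplied by the caller. The numbers are invented for the arithmetic and do not come from XLNet or from the paper. The function names are hypothetical.

```python
import math


def joint_logprob(conditional_probs):
    """Chain rule: log2 p(x_1..x_n) = sum_i log2 p(x_i | x_1..x_{i-1})."""
    return sum(math.log2(p) for p in conditional_probs)


def contingency(conditional_probs, unigram_probs):
    """log2 of the joint probability over the product of unigram probabilities."""
    independent = sum(math.log2(p) for p in unigram_probs)
    return joint_logprob(conditional_probs) - independent


# Toy two-word phrase: the second word is far more likely after the first
# (1/4) than it is on its own (1/512).
cont = contingency(
    conditional_probs=[1 / 1024, 1 / 4],
    unigram_probs=[1 / 1024, 1 / 512],
)
print(cont)  # 7.0
```

A positive score means the words co-occur more often than independence would predict. A score near zero means the phrase is no more cohesive than chance.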

Empirically, idioms cluster in the low-conventionality (semantically opaque), high-contingency (collocationally strong) quadrant, whereas ordinary collocations combine high contingency with high conventionality. The two measures are not significantly correlated (Pearson $r = -0.037$, $p = 0.518$). The tables and figures in Socolof et al. (2021) show where prototypical phrases fall.
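The independence claim rests on a Pearson correlation between the two per-phrase scores. A minimal sketch of that statistic follows, written in plain Python so the arithmetic is visible. The score lists are toy values, not data from the paper, and the p-value computation is omitted.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy scores for four phrases: the two axes vary independently here.
conv_scores = [1.0, 2.0, 3.0, 4.0]
cont_scores = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(conv_scores, cont_scores))  # 0.0
```

A coefficient close to zero, as in the reported result, indicates that knowing a phrase's conventionality says little about its contingency.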

3. Annotation Protocols and Dataset Construction

The PIE-English taxonomy is implemented in a corpus of 20,174 English multiword expressions, annotated by two independent raters using 10 mutually exclusive classes. Annotators follow an explicit decision tree: simile (if “like/as”), euphemism (taboo substitution), personification (humanization of non-human), oxymoron (antonym juxtaposition), paradox (self-contradiction), hyperbole (exaggeration), irony (contextual inversion), literal (compositional), metaphor (default cross-domain mapping), parallelism (structural repeat). Disagreements (11.11%) are resolved by deferring to the more specific Alm-Arvius class (Adewumi et al., 2021).
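The decision tree described above can be sketched as an ordered cascade. This is an illustration under stated assumptions, not the annotators' actual procedure. Only the simile test is automated, with a crude "like"/"as" match that will over-trigger on non-figurative uses. Every other test is a boolean judgment supplied by a human under hypothetical key names. Parallelism is checked before the metaphor fallback so that the default does not make it unreachable.

```python
import re

# Crude surface marker for similes; an assumption, not the corpus rule.
SIMILE_MARKER = re.compile(r"\b(like|as)\b", re.IGNORECASE)

# (class label, hypothetical judgment key), checked in this order.
CHECKS = [
    ("Euphemism", "taboo_substitution"),
    ("Personification", "humanizes_nonhuman"),
    ("Oxymoron", "antonym_juxtaposition"),
    ("Paradox", "self_contradiction"),
    ("Hyperbole", "exaggeration"),
    ("Irony", "contextual_inversion"),
    ("Literal", "compositional"),
    ("Parallelism", "structural_repeat"),
]


def assign_class(text, judgments):
    """Return exactly one class label for an expression."""
    if SIMILE_MARKER.search(text):
        return "Simile"
    for label, key in CHECKS:
        if judgments.get(key, False):
            return label
    return "Metaphor"  # default: implicit cross-domain mapping


print(assign_class("as clear as a bell", {}))
print(assign_class("deafening silence", {"antonym_juxtaposition": True}))
print(assign_class("ring a bell", {}))
```

Because the first matching test wins, each instance receives a single label, which mirrors the mutually exclusive labeling of the flat taxonomy.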

In the Bengali schema, annotation is performed by expert consensus using a 19-field grid: semantic (e.g., literal/figurative meanings in both Bangla and English, tags, frequency, sentiment), syntactic/pragmatic (example sentences, usage domains), cultural/diachronic (significance, provenance, spatial distribution), and religious dimensions. Several fields are conditionally non-empty (e.g., etymological notes only if historical_significance is true). No class subtypes are mandated; each idiom adopts the same metadata structure (Sakhawat et al., 13 Feb 2026).
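A uniform record with conditionally filled fields can be sketched as a small data class. The sketch covers only a handful of the 19 fields. All field names except `historical_significance` are hypothetical, chosen to echo the categories listed above, and the validation rule encodes the one condition the text states.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IdiomRecord:
    """Illustrative subset of a uniform idiom annotation record."""
    idiom: str
    literal_meaning_en: str
    figurative_meaning_en: str
    tags: List[str] = field(default_factory=list)
    sentiment: Optional[str] = None
    historical_significance: bool = False
    etymological_notes: Optional[str] = None  # conditional field

    def __post_init__(self):
        # Etymological notes are allowed only for historically significant idioms.
        if self.etymological_notes and not self.historical_significance:
            raise ValueError(
                "etymological_notes requires historical_significance=True"
            )


record = IdiomRecord(
    idiom="মাথা খারাপ",
    literal_meaning_en="the brain is bad",
    figurative_meaning_en="to go crazy",
    tags=["sanity", "mental-state"],
    sentiment="negative",
)
print(record.tags)
```

Every idiom shares the same structure. Only the presence of the conditional fields varies from one record to the next.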

4. Empirical Insights and Performance Benchmarks

For PIE-English, per-class inter-annotator agreement (IAA) varies widely, from high in major classes (Euphemism 76.94%, Literal, Metaphor 73.27%) to low for rare rhetorical types (Personification and Simile, below 6%) (Adewumi et al., 2021). BERT classifiers fine-tuned for 10-way idiom class prediction achieve an overall accuracy of 93.4% with a weighted-average F₁ of 0.948. F₁ exceeds 0.9 for Metaphor, Simile, and Euphemism but drops to near zero for low-frequency classes (Hyperbole, Oxymoron, Irony). The confusion matrix shows confusion among Metaphor, Literal, and Euphemism.
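The gap between a high weighted-average F₁ and near-zero F₁ on rare classes follows from how the average is weighted by class support. The sketch below shows the effect on invented labels. It uses a toy 20-item sample, not PIE-English data, and the function names are hypothetical.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """F1 per class; classes with no true or predicted items score 0.0."""
    scores = {}
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(gold, pred):
    """Average of per-class F1, weighted by each class's gold support."""
    support = Counter(gold)
    scores = per_class_f1(gold, pred)
    return sum(scores[c] * n for c, n in support.items()) / len(gold)


# Toy sample: a classifier that always predicts the majority class.
gold = ["Metaphor"] * 18 + ["Irony"] * 2
pred = ["Metaphor"] * 20
print(per_class_f1(gold, pred))
print(round(weighted_f1(gold, pred), 3))
```

Here the rare class contributes only 2 of 20 items to the weighted average, so its F₁ of zero barely lowers the headline figure. Reporting per-class or macro-averaged scores alongside the weighted average makes such failures visible.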

In Bengali, annotator consensus under the 19-field schema underpins a 10,361-idiom dataset. On figurative sense inference, no LLM surpasses 50% accuracy; human annotators score 83.4%. Graded LLM scoring uses a 6-point comprehension rubric, from 0 (total misunderstanding) to 5 (complete, precise figurative meaning). This result highlights model deficits in cultural/semantic mapping and low-resource settings (Sakhawat et al., 13 Feb 2026).

5. Illustrative Examples and Class Boundaries

Quantitative and categorical taxonomies both elucidate idioms’ functional range:

Idioms (low conv, high cont): “red tape” (cont ≈ 3.2 bits, conv ≈ –1.8), “bread and butter” (cont ≈ 3.5, conv ≈ –1.6)
Collocations (high cont, near-zero conv): “more or less” (cont ≈ 2.8, conv ≈ 0.1)
Literal/Compositional: “blue sky” (cont ≈ 0.5, conv ≈ 1.2)
Rhetorical figures in PIE-English: “I know one thing: that I know nothing” (Paradox); “A voice came from the back of beyond” (Hyperbole)
Culture-specific idioms (Bengali): “মাথা খারাপ” [matʰa kharap], literal—“the brain is bad”; figurative—“to go crazy” [“to lose one’s mind”]; tags: “sanity”, “mental-state”; usage: common, negative sentiment (Sakhawat et al., 13 Feb 2026).

Such examples demonstrate the gradient, intersecting nature of idiomaticity: idioms shade into collocations, metaphors, or rare compositional forms, resisting simple binary classification (Socolof et al., 2021).

6. Theoretical and Practical Ramifications

The existence of gradient, multidimensional idiom class taxonomies indicates that idiomaticity cannot be reduced to “idiom lists” or isolated exception handling. The empirical independence of conventionality and contingency suggests that idiom detection, metaphor recognition, and collocation analysis can be approached jointly with general-purpose language models and suitable abstractions, without additional idiom-specific machinery (Socolof et al., 2021). Flat class schemas, as in PIE-English, facilitate supervised classification and downstream NLP tasks; comprehensive multidimensional annotation (as in Bengali) supports benchmarking of culturally grounded figurative language (Sakhawat et al., 13 Feb 2026).

For natural language processing, idiom class taxonomies are now instrumental in machine translation, word sense disambiguation, and information retrieval, enabling new evaluation standards for both high- and low-resource languages. Directions for future schema extension include adding dialectal/generational features, fine-grained cultural metrics, and cross-lingual alignment fields to enable richer contrastive and transfer learning paradigms.

7. Cross-Linguistic Generalization and Extension Strategies

Structurally, the idiom class taxonomy can be expanded for resource-poor and typologically diverse languages by introducing:

Dialectal features: New fields for dialect or regional variants.
Generational/demographic metadata: Capturing usage evolution.
Community-sourced scaling: Semi-supervised or consensus-driven growth.
Gradated significance metrics: Replacing binary cultural/religious significance with continuous scores.
Benchmark integrations: Detection, interpretation, and generation tasks built around these schemas (Sakhawat et al., 13 Feb 2026).

This multidimensional, methodologically integrated approach enables the construction of idiom class taxonomies as robust foundations for computational and linguistic inquiry, supporting both theoretical generalization and practical NLP development.