Morphology-Sensitive Representations Overview

Updated 3 April 2026

Morphology-sensitive representations are models that encode morphological structure by mapping inflected and derived forms to related regions in a shared feature space.
They leverage techniques such as character n-gram compositionality, structured embedding synthesis, and cross-modal contrastive alignment to capture morpho-linguistic patterns.
Empirical studies show these representations boost generalization and interpretability, improving performance across language understanding, speech recognition, and bioinformatics applications.

Morphology-sensitive representations encode, amplify, or otherwise systematically reflect morphological structure—inflectional, derivational, or combinatory—in the learned feature space of a model. These representations are critical in diverse fields including spoken language processing, natural language understanding, image–text alignment, and bioinformatics, where capturing the signal of form–meaning mapping or morphological phenotype enables generalization, robustness, and interpretability that purely distributional or surface-level methods miss.

1. Foundations and Mathematical Formalism

Morphology-sensitive representations are those which encode morphological structure—such as inflectional or derivational relations—so that morphologically related forms are mapped to nearby or related regions in a feature space, and transformations between forms (e.g., “walk”→“walked”, “sing”→“singer”) correspond to consistent directions, subspaces, or partitions.

In self-supervised speech recognition, these representations emerge in word-level linear translation subspaces derived from frame-level acoustic states, e.g., $z_\ell(t) = W_z x_\ell(t)$, where $x_\ell(t)\in\mathbb{R}^{768}$ is a Wav2Vec hidden state and $W_z$ is a $32\times768$ word-probe matrix. Morphological relations are captured by difference vectors $\Delta_m = v(\mathrm{inflected}) - v(\mathrm{base})$ between the word-level embeddings $v(\cdot)$ of an inflected form and its base; these vectors form tightly clustered, highly linear translation clouds in the embedding space (Gauthier et al., 26 Sep 2025).
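The projection and the difference vectors can be illustrated with a small NumPy sketch. Everything here is synthetic and assumed for illustration: `W_z` is a random stand-in for a trained probe, the hidden states are random vectors rather than Wav2Vec outputs, and single vectors stand in for word-level embeddings. Only the dimensions (768 and 32) come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM, PROBE_DIM = 768, 32  # sizes stated in the text

# Random stand-in for a trained word-probe matrix (assumption, not a real probe)
W_z = rng.normal(size=(PROBE_DIM, HIDDEN_DIM)) / np.sqrt(HIDDEN_DIM)

def project(x):
    """Apply z = W_z x to a batch of hidden states stored as rows."""
    return x @ W_z.T

# Synthetic data: each inflected form is its base plus one shared offset and small noise
n_pairs = 50
shared_offset = rng.normal(size=HIDDEN_DIM)
x_base = rng.normal(size=(n_pairs, HIDDEN_DIM))
x_inflected = x_base + shared_offset + 0.1 * rng.normal(size=(n_pairs, HIDDEN_DIM))

# Difference vectors Delta_m in the probe space, one per (base, inflected) pair
deltas = project(x_inflected) - project(x_base)
mean_delta = deltas.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# How well each pair's translation agrees with the average translation
alignment = np.array([cosine(d, mean_delta) for d in deltas])
print(f"mean cosine to average translation: {alignment.mean():.3f}")
```

A mean cosine near 1 is what a tightly clustered translation cloud looks like in this toy setting; on real model states the value would have to be measured rather than assumed.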

In neural language models, morphology-sensitive structure manifests as convex regions partitioned by feature values (e.g., Number, Case) in BERT-style encoder spaces, linearly decodable transformation subspaces in decoder architectures, or explicit analogical translation matrices extracted by averaging Jacobians across subject–object relation pairs (Edmiston, 2020, Xia et al., 19 Jul 2025).

In spatial omics, representations are formalized at the level of cross-modal embedding functions, with translation ($\max_{\theta_M} I(h_G; h_M)$) and integration ($\min_{\theta_M} I(h_G; h_M)$) objectives quantifying mutual information between gene expression and morphology-derived image features (Chelebian et al., 2024).

2. Methodologies for Learning Morphology-Sensitive Representations

Multiple architectural and training paradigms induce morphology-aware structure:

A. Subword and Character-Level Compositionality: Character $n$-grams (especially trigrams) composed with bi-LSTMs consistently capture affixation and concatenative morphology, outperforming word-only models and even subword units such as BPE or Morfessor morphs under most typologies (Vania et al., 2017, Vylomova et al., 2016). Character-level transformers operating over fixed-length sequences have been shown to robustly encode morphological patterns, especially when trained with both character-level masked language modeling and auxiliary multi-label dialect or morph tag objectives (Aries, 1 Sep 2025).
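As a minimal sketch of the input side of this approach, the function below extracts character trigrams with word-boundary markers. The boundary symbols `^` and `$` and the function name are illustrative assumptions; the bi-LSTM or transformer that composes the n-gram embeddings is omitted.

```python
def char_ngrams(word, n=3, left="^", right="$"):
    """Return the character n-grams of a word, with boundary markers added."""
    padded = f"{left}{word}{right}"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

base = char_ngrams("walk")
inflected = char_ngrams("walked")

# N-grams shared by the two forms cover the stem; the rest cover the suffix
shared = sorted(set(base) & set(inflected))
suffix_only = sorted(set(inflected) - set(base))
print(shared)
print(suffix_only)
```

Because the stem n-grams are shared, a model that composes n-gram embeddings starts from overlapping inputs for related forms, which is one intuition for why such units help with concatenative morphology.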

B. Surface/Lemma/Tag Synthesis: Structured summation of embeddings for surface form, lemma, and morphological tag allows explicit control over the trade-off between semantic and morphological similarity in word representations (Avraham et al., 2017). Including only lemma boosts semantic similarity, while adding full tags maximizes morpho-similarity, sometimes yielding perfect agreement for rare words.
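The trade-off can be seen in a toy version of the summation. The sketch below uses random, untrained vectors and a hand-written analysis table (`ANALYSES`), both of which are assumptions for illustration; in the cited work the component embeddings are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

def make_table(keys):
    """Random stand-in embeddings, one vector per key."""
    return {k: rng.normal(size=DIM) for k in keys}

surface = make_table(["walked", "walks", "talked"])
lemma = make_table(["walk", "talk"])
tag = make_table(["VERB+PAST", "VERB+3SG"])

# Hypothetical morphological analyses: word -> (lemma, tag)
ANALYSES = {
    "walked": ("walk", "VERB+PAST"),
    "walks": ("walk", "VERB+3SG"),
    "talked": ("talk", "VERB+PAST"),
}

def compose(word, use_surface=True, use_lemma=True, use_tag=True):
    """Sum the selected component embeddings for one word."""
    lem, t = ANALYSES[word]
    vec = np.zeros(DIM)
    if use_surface:
        vec += surface[word]
    if use_lemma:
        vec += lemma[lem]
    if use_tag:
        vec += tag[t]
    return vec

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Lemma only: forms of the same lexeme collapse to the same vector
lemma_only = cosine(compose("walked", False, True, False),
                    compose("walks", False, True, False))
# Tag only: forms with the same tag collapse to the same vector
tag_only = cosine(compose("walked", False, False, True),
                  compose("talked", False, False, True))
print(lemma_only, tag_only)
```

Choosing which components enter the sum is the control knob: dropping the tag pushes same-lexeme forms together, while dropping the lemma pushes same-tag forms together.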

C. Probing, Linearization, and Analogy Structure: Probing layers or explicit analogy schemes reveal highly linear translation subspaces for regular inflections in both acoustic and text models (Gauthier et al., 26 Sep 2025, Xia et al., 19 Jul 2025). In LLMs, the mean Jacobian over subject–object pairs defines a relational matrix $W_r$ such that $W_r s \approx o$ (faithfulness $\sim$ 0.90 for morphology), providing a sparse, interpretable map for morphological transformations.
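The mean-Jacobian construction can be sketched on a toy function. Here `toy_model` is a hypothetical, nearly linear stand-in for the computation that maps a subject representation $s$ to an object representation $o$; it is not an actual language model, and the cosine agreement printed at the end is a simple proxy rather than the faithfulness metric reported in the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Hypothetical stand-in for the model's subject-to-object computation
A = rng.normal(size=(DIM, DIM))
B = rng.normal(size=(DIM, DIM))

def toy_model(s):
    return A @ s + 0.05 * np.tanh(B @ s)

def numerical_jacobian(f, s, eps=1e-5):
    """Central-difference Jacobian of f at s, shape (output_dim, input_dim)."""
    columns = []
    for i in range(s.size):
        step = np.zeros_like(s)
        step[i] = eps
        columns.append((f(s + step) - f(s - step)) / (2 * eps))
    return np.stack(columns, axis=1)

# Relational matrix W_r: the Jacobian averaged over a set of subject vectors
subjects = rng.normal(size=(20, DIM))
W_r = np.mean([numerical_jacobian(toy_model, s) for s in subjects], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Check W_r s against the toy model's output on subjects not used for averaging
held_out = rng.normal(size=(10, DIM))
agreement = np.mean([cosine(W_r @ s, toy_model(s)) for s in held_out])
print(f"mean cosine between W_r s and o: {agreement:.3f}")
```

The approximation is good here only because the toy map is close to linear by construction; how well a single averaged matrix works for a real model is an empirical question.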

D. Semi-Supervised and Rule-Based Specialization: Injecting morphological tags into supervision targets, or fine-tuning embeddings with “attract–repel” constraints based on inflectional synonymy and derivational antonymy, can force morphologically related forms together and antonymic forms apart, aiding low-frequency words and rare inflections (Cotterell et al., 2019, Vulić et al., 2017).
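A simplified margin-based cost in the spirit of the attract–repel constraints is sketched below. It assumes unit-normalized vectors, takes the negative examples `t_l` and `t_r` as given, and uses illustrative margin values; the negative sampling, the regularization toward the original vectors, and the optimizer used in the cited work are omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def attract_cost(x_l, x_r, t_l, t_r, margin=0.6):
    """Zero only if the pair is more similar to each other than each word
    is to its negative example, by at least `margin`."""
    pair_sim = x_l @ x_r
    return float(relu(margin + x_l @ t_l - pair_sim)
                 + relu(margin + x_r @ t_r - pair_sim))

def repel_cost(x_l, x_r, t_l, t_r, margin=0.0):
    """Zero only if the pair is less similar to each other than each word
    is to its negative example, by at least `margin`."""
    pair_sim = x_l @ x_r
    return float(relu(margin + pair_sim - x_l @ t_l)
                 + relu(margin + pair_sim - x_r @ t_r))

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Identical vectors with orthogonal negatives: attract satisfied, repel violated
attract_ok = attract_cost(e1, e1, e2, e2)
repel_bad = repel_cost(e1, e1, e2, e2)

# Orthogonal vectors whose negatives coincide with them: attract violated, repel satisfied
attract_bad = attract_cost(e1, e2, e1, e2)
repel_ok = repel_cost(e1, e2, e1, e2)
print(attract_ok, attract_bad, repel_ok, repel_bad)
```

Minimizing the attract cost over inflectional pairs and the repel cost over derivational antonym pairs is what moves related forms together and antonymic forms apart.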

E. Cross-Modal Morphology: In multi-modal settings, e.g., ECG/echo or spatial omics, morphology-sensitive embeddings are enforced by cross-modal contrastive alignment, forcing features in one modality (ECG, image) to be predictive of, or complementary to, morphological structure in another (echoes, gene expression) (Liman et al., 9 Mar 2026, Chelebian et al., 2024).
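A CLIP-style symmetric contrastive loss over paired embeddings can be sketched in NumPy as follows. The arrays named `ecg` and `echo` are random stand-ins for encoder outputs, and the temperature value is an illustrative assumption; the encoders and training loop are omitted.

```python
import numpy as np

def log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(a, b, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities; row i of `a` is paired with row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_a_to_b + loss_b_to_a))

rng = np.random.default_rng(0)
ecg = rng.normal(size=(8, 16))                      # stand-in ECG embeddings
echo = ecg + 0.05 * rng.normal(size=(8, 16))        # stand-in echo embeddings, nearly aligned

aligned_loss = clip_style_loss(ecg, echo)
mismatched_loss = clip_style_loss(ecg, echo[::-1])  # pairing deliberately broken
print(aligned_loss, mismatched_loss)
```

The loss is low when each embedding is most similar to its own partner in the other modality, which is the sense in which one modality is pushed to be predictive of morphological structure in the other.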

3. Morphology-Sensitivity in Deep Language and Speech Models

A key achievement in recent research is the empirical and formal demonstration that large self-supervised models for language and speech often learn a global, approximately linear structure encoding morphological relations:

In S3M-based speech models, the projection into a word-optimized subspace yields translation vectors $\Delta_m$ that cluster tightly and are largely insensitive to explicit affix, allomorph, or morphological category distinctions: high analogy accuracy persists even for cross-category and lexical “false friend” pairs (Gauthier et al., 26 Sep 2025).
BERT-style transformers carve embedding space into convex subregions for each morphological feature value. Linear probes regularly attain F1 > 0.95 for unambiguous forms; syncretic forms are effectively disambiguated in middle layers by contextualization (Edmiston, 2020).
In decoder LMs, a single cross-layer relational matrix learned from the model’s internal Jacobian can generate the hidden states of inflected forms with high faithfulness (∼90%) for morphological relations across languages (Xia et al., 19 Jul 2025).

This emergent geometry supports analogical inference and robust generalization beyond supervised exposure to specific inflections.
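A linear probe of the kind used to test for such structure can be sketched on synthetic data. The two classes below stand in for a binary morphological feature (for example, singular versus plural) and are separated along one random direction by construction; the least-squares classifier is a simple stand-in for whatever probe a given study trains, and no real model states are involved.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N, N_TRAIN = 20, 200, 150

# Synthetic "hidden states": two classes shifted apart along one direction
direction = rng.normal(size=DIM)
direction /= np.linalg.norm(direction)
labels = rng.integers(0, 2, size=N)  # 0 and 1 stand in for two feature values
states = rng.normal(size=(N, DIM)) + np.outer(2 * labels - 1, 3.0 * direction)

def add_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

# Least-squares linear probe fitted to +/-1 targets
targets = (2 * labels[:N_TRAIN] - 1).astype(float)
weights, *_ = np.linalg.lstsq(add_bias(states[:N_TRAIN]), targets, rcond=None)
predictions = (add_bias(states[N_TRAIN:]) @ weights > 0).astype(int)

def f1_score(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

probe_f1 = f1_score(labels[N_TRAIN:], predictions)
print(f"held-out probe F1: {probe_f1:.3f}")
```

A high held-out score for a purely linear readout is the evidence that a feature is linearly decodable; the score here is high only because the toy data were built that way.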

4. Evaluation, Quantitative Results, and Cross-Task Impact

Morphological sensitivity is established by both intrinsic and extrinsic metrics:

Analogy Tasks: Mean rank in analogy, i.e., the rank at which the query $v(\mathrm{base}') + \Delta_m$ retrieves the true $v(\mathrm{inflected}')$ of a different word pair, shows sharp clustering for morphologically regular pairs: mean rank ∼ 1–8 for nouns/verbs (Gauthier et al., 26 Sep 2025), and linear LRE achieves up to 90% matching in LM decoding (Xia et al., 19 Jul 2025).
Tag Probes and MorphoSim: k-NN morphological tagging accuracy, average cosine similarity (ACS) between root–derived pairs, and distance metrics over annotated tag features robustly establish model capacity to encode inflectional or derivational affinity (Aries, 1 Sep 2025, Cotterell et al., 2019, Avraham et al., 2017).
Downstream Tasks: Morph-sensitive embeddings improve practical applications, e.g., BLEU gains (+1–1.5) in machine translation for morph-rich languages when subword/char-based models are used (Vylomova et al., 2016), and higher dialogue state tracking accuracy when morph-fitted embeddings are substituted (German WOZ-2.0: 60.6→66.3%) (Vulić et al., 2017).
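The analogy mean-rank metric can be sketched as follows. The vocabulary is synthetic: each inflected vector is built as its base plus one shared offset and small noise, so low ranks are expected by construction. Excluding the three query words from the candidate set is a common convention in analogy evaluation and is assumed here, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LEMMAS, DIM = 10, 32

# Synthetic vocabulary: rows 0..9 are base forms, rows 10..19 their inflected forms
offset = rng.normal(size=DIM)
base = rng.normal(size=(N_LEMMAS, DIM))
inflected = base + offset + 0.05 * rng.normal(size=(N_LEMMAS, DIM))
vocab = np.vstack([base, inflected])

def rank_of_target(query, target, exclude):
    """1-based rank of `target` among vocabulary rows sorted by cosine similarity to `query`."""
    sims = vocab @ query / (np.linalg.norm(vocab, axis=1) * np.linalg.norm(query))
    sims[list(exclude)] = -np.inf  # drop the words used to build the query
    order = np.argsort(-sims)
    return int(np.where(order == target)[0][0]) + 1

ranks = []
for i in range(N_LEMMAS):          # pair that supplies the translation vector
    for j in range(N_LEMMAS):      # pair whose inflected form must be retrieved
        if i == j:
            continue
        query = vocab[j] + (vocab[N_LEMMAS + i] - vocab[i])
        ranks.append(rank_of_target(query, N_LEMMAS + j, {i, N_LEMMAS + i, j}))

mean_rank = float(np.mean(ranks))
print(f"mean analogy rank: {mean_rank:.2f}")
```

A mean rank close to 1 means the translated query lands nearest the correct inflected form; larger values indicate that the translation vectors are less consistent across pairs.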

A summary table of comparative morphology-aware tagging results from (Aries, 1 Sep 2025); each cell reports Arabic/English values:

Model	Tagging accuracy (%), Arabic/English	ACS (derivational), Arabic/English
DziriBERT_tok	83.6/94.4	0.785/0.725
CANINE-C_cls	92.9/94.2	0.754/0.740
chDzDT_5×4×128	95.3/95.5	0.922/0.751

5. Extensions Beyond Text and Speech

Morphology-sensitive representations have been generalized beyond classical text and speech contexts:

Bioinformatics/Spatial Omics: In tissue analysis, morphology representations are categorized as translation (predicting gene expression from image morphology) or integration (complementing gene expression to define spatial domains), exploiting mutual-information-based objectives and fusion architectures (Chelebian et al., 2024).
Cardiac Imaging: Echo2ECG aligns ECG representations to multi-view echo-derived embeddings via a CLIP-style contrastive objective, directly injecting cardiac morphological variation (e.g., LVEF, chamber sizes) into ECG features to enable phenotype classification and cross-modal retrieval (Precision@1 = 0.387–0.517 for various structural traits) (Liman et al., 9 Mar 2026).
Image–Language Models: Scenario Refiner probes whether vision-and-language models distinctively ground morphological contrasts (runner/running) in image–text matching, revealing a persistent grammatical (nominal) bias and a discrepancy relative to human morphological sensitivity (Tagliaferri et al., 2023).
Astronomy: In HI galaxy surveys, morphology is distilled into compact vectors ([C, A, S, G, M20, E]) capturing structural features that are robust to wavelength, directly supporting ML-based merger identification at high precision (Holwerda et al., 2011).

6. Theoretical Implications and Limitations

The empirical findings challenge classical psycholinguistic modularity, supporting the view that (i) large self-supervised models re-discover global translation-like morphological operators in the absence of explicit symbolic supervision (Gauthier et al., 26 Sep 2025), and (ii) such operators often cut across morphophonological categories, tracking distributional regularities rather than discrete symbolic units.

However, there is evidence of a persistent gap between the best unsupervised/subword-based models and models with access to gold morphological analyses; semantic and morphological similarity cannot be maximized simultaneously via undifferentiated composition (Avraham et al., 2017, Vania et al., 2017).

Further, misalignment remains a challenge in cross-modal or emerging-dialect domains: pure distributional or dual-encoder models (e.g., CLIP) fail to match human flexibility in morphological grounding, necessitating task-specific datasets and architectures (Tagliaferri et al., 2023).

7. Best Practices and Open Challenges

Empirically validated principles for constructing morphology-sensitive representations include:

Prefer character $n$-gram + contextual composition (bi-LSTM/CNN/transformer) for maximally sensitive, robust word representations, especially in morph-rich or noisy-dialect settings.
Inject morphological tags, either via explicit multi-task objectives or rule-based “morph-fitting,” whenever labeled data are available; even small amounts of annotation sharply enhance representation quality (Cotterell et al., 2019, Vulić et al., 2017).
For cross-modal applications, optimize mutual information-based objectives and design contrastive or fusion architectures appropriate to the translation–integration task (gene expression, cardiac phenotype).
Explicitly disentangle semantic and morphological similarity through careful composition of surface, lemma, and tag embeddings, selecting for the application’s primary objective (Avraham et al., 2017).
Validate morphology-sensitivity using analogical translation, tag-prediction probes, and intrinsic/extrinsic measures alongside standard accuracy or retrieval scores.

Open directions include bridging the remaining gap to human-level morphological generalization, developing architectures that seamlessly integrate concatenative and non-concatenative morphology, and deploying these insights in under-resourced languages and challenging multimodal environments. Task-specific, morphologically minimal-pair datasets and context-aware integration of morphological cues during model pretraining and finetuning are especially promising avenues for future research.