Morphological Addressing: Insights & Applications
- Morphological Addressing is the structuring of models around interpretable units like morphemes, affixes, or physical forms, enhancing both generalization and interpretability.
- It applies across diverse domains such as word embeddings (e.g., MSG), rule-based language generation for low-resource languages, and robust neuro-robotic control through domain randomization.
- The approach also enables fine-grained control in generative diffusion models via compositional prompts and learned latent directions, improving output diversity and stability.
Morphological addressing refers to the strategy of structuring, conditioning, or manipulating representations and learning pipelines around the discrete, interpretable building blocks of morphological structure, whether in language (morphemes, inflectional features), neuro-robotic controllers (physical body shapes), or generative models (visual or scriptive descriptors). Rather than treating surface forms or latent factors as atomic or opaque, it systematically encodes or responds to the underlying constitutive units that generate variation. This improves generalization, tractability, and interpretability across word representation, morphology generation, control under morphological variation, and prompt-based generative modeling.
1. Morphological Addressing in Word Representation
Santos et al. formalize morphological addressing within word embedding learning via the Morphological Skip-Gram (MSG) model (Santos et al., 2020), which replaces FastText’s character n-gram address space with a bag of linguistically valid morphemes per word. Each word is segmented into its set of morphemes (e.g., base, affix, gender/number ending, or thematic vowel) via unsupervised statistical segmentation (Morfessor 2.0). Instead of summing embeddings of all character n-grams (many of which are spurious), MSG constructs the word representation

$$\mathbf{v}_w = \mathbf{z}_w + \sum_{m \in \mathcal{M}_w} \mathbf{z}_m,$$

where $\mathbf{z}_w$ is the base word vector and $\mathbf{z}_m$ are the embeddings of the morphemes $m \in \mathcal{M}_w$ of word $w$. Training proceeds analogously to classical Skip-Gram with negative sampling, but the per-update cost is reduced due to the smaller, more meaningful morpheme vocabulary $\mathcal{M}$ ($|\mathcal{M}| \ll |\mathcal{G}|$, where $\mathcal{G}$ is the set of all character n-grams).
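The composition step described above (base word vector plus the embeddings of the word's morphemes) can be sketched in a few lines. This is a minimal illustration, not the MSG implementation: the segmentation is hard-coded rather than produced by Morfessor, the two-dimensional vectors are invented toy values, and the names `compose_word_vector` and `char_ngrams` are hypothetical. The n-gram helper assumes FastText's usual 3-to-6 character range with `<` and `>` boundary markers, and is included only to contrast the size of the two address spaces.

```python
from typing import Dict, List

Vector = List[float]


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """FastText-style character n-grams of the boundary-marked word."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(marked) - n + 1)
    ]


def compose_word_vector(
    word: str,
    segmentation: Dict[str, List[str]],
    word_vecs: Dict[str, Vector],
    morph_vecs: Dict[str, Vector],
) -> Vector:
    """Base word vector plus the sum of the word's morpheme embeddings."""
    vec = list(word_vecs[word])
    for morpheme in segmentation[word]:
        vec = [a + b for a, b in zip(vec, morph_vecs[morpheme])]
    return vec


# Toy data (assumed for illustration; a real system would learn these vectors
# and obtain the segmentation from an unsupervised segmenter).
segmentation = {"unhappiness": ["un", "happi", "ness"]}
word_vecs = {"unhappiness": [1.0, 0.0]}
morph_vecs = {"un": [0.5, 0.25], "happi": [0.25, -0.5], "ness": [0.0, 1.0]}

msg_vector = compose_word_vector("unhappiness", segmentation, word_vecs, morph_vecs)
print(msg_vector)
print(len(char_ngrams("unhappiness")), "n-grams vs",
      len(segmentation["unhappiness"]), "morphemes")
```

For this one word the n-gram address space holds dozens of entries while the morphological one holds three, which is the source of the reduced per-update cost.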
Intrinsic evaluation (analogy, similarity, categorization tasks) on benchmarks including SimLex-999 and Google Analogy shows MSG outperforms FastText on analogy and matches similarity/categorization, while reducing training time by approximately 40%. The morphological address base enables experts to inject linguistic knowledge and reduces noise from non-semantic character substrings, particularly benefiting morphologically rich languages.
| Model | Analogy accuracy (Google / MSR / SemEval) | Similarity (SimLex-999) | Training time |
|---|---|---|---|
| FastText | 0.128 / 0.082 / 0.168 | 0.34 | Baseline |
| MSG | 0.332 / 0.211 / 0.180 | 0.35 | ≈40% lower |
2. Rule-Based Morphological Addressing for Language Generation
Gebremariam et al. apply morphological addressing to the generation of surface forms in Ge’ez, a morphologically complex, low-resource language (Gebremariam et al., 24 Sep 2025). The synthesizer is based on a rule-based Two-Level Morphology architecture:
- Lexicon and stem-classifier modules select the appropriate stem (including regular/irregular class handling).
- Signature builder determines all valid affix paradigms for tense-aspect-mood (TAM), subject- and object-markers.
- Boundary-rule handler applies required orthographic and phonological alternations at morpheme concatenation.
- Two-level finite-state constraints map stem+affix sequences (“lexical” forms) to final surface representations.
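Formally, a surface form is generated as

$$w = \mathcal{B}\big(a_{\text{pre}}(\phi) \cdot s \cdot a_{\text{suf}}(\phi)\big),$$

where $s$ is the derived stem, $a_{\text{pre}}(\phi)$ and $a_{\text{suf}}(\phi)$ are affixations determined by feature specifications $\phi$, $\cdot$ denotes concatenation, and $\mathcal{B}$ applies the appropriate morphophonological boundary rules.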
This morphological addressing framework achieves 97.4% generation accuracy across 26,867 Ge’ez verb forms, including both regular and irregular verbs, by systematically encoding paradigm structure and alternation patterns. Resource efficiency is ensured by reliance on a compact lexicon and expert rule tables rather than large annotated corpora.
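The pipeline of lexicon lookup, affix selection, and boundary-rule application can be sketched as follows. Everything here is a hypothetical stand-in: the lemma `TOYVERB`, the stems, the affixes, and the single vowel-merging rule are invented placeholder strings, not Ge’ez data or the authors' rule tables. The sketch also applies rewrite rules in sequence, whereas a true two-level system enforces its constraints in parallel with finite-state machinery.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical lexicon: lemma -> stem per tense-aspect-mood (TAM) value.
LEXICON: Dict[str, Dict[str, str]] = {
    "TOYVERB": {"perfective": "bada", "imperfective": "bad"},
}

# Hypothetical affix signatures: (TAM, agreement) -> (prefix, suffix).
AFFIXES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("perfective", "3sg"): ("", "a"),
    ("imperfective", "3sg"): ("yi", ""),
    ("imperfective", "3pl"): ("yi", "u"),
}

# Ordered boundary rules over the lexical form, where "+" marks a morpheme
# boundary. Rule 1 merges identical vowels across a boundary; rule 2 erases
# any remaining boundary markers.
BOUNDARY_RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"a\+a"), "a"),
    (re.compile(r"\+"), ""),
]


def synthesize(lemma: str, tam: str, agreement: str) -> str:
    """Map a lemma plus feature specification to a surface form."""
    stem = LEXICON[lemma][tam]
    prefix, suffix = AFFIXES[(tam, agreement)]
    # Lexical level: morphemes joined with explicit boundary markers.
    lexical = "+".join(part for part in (prefix, stem, suffix) if part)
    # Surface level: apply each boundary rule in order.
    surface = lexical
    for pattern, replacement in BOUNDARY_RULES:
        surface = pattern.sub(replacement, surface)
    return surface


print(synthesize("TOYVERB", "perfective", "3sg"))
print(synthesize("TOYVERB", "imperfective", "3pl"))
```

Keeping the lexicon, affix table, and rule list as separate data structures mirrors the modular design described above: extending coverage means adding table entries rather than collecting annotated corpora.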
3. Morphological Addressing in Contextual Morphological Inflection
Vylomova et al. demonstrate a hybrid graphical model for sentence-level inflection (Vylomova et al., 2019). The model addresses the morphological space via an explicit latent tag sequence $\mathbf{m}$ (e.g., POS, number, tense) inferred from the lemmatized input $\boldsymbol{\ell}$:

$$p(\mathbf{w}, \mathbf{m} \mid \boldsymbol{\ell}) = \left(\prod_{i=1}^{n} p(w_i \mid \ell_i, m_i)\right) p(\mathbf{m} \mid \boldsymbol{\ell}),$$

where $p(\mathbf{m} \mid \boldsymbol{\ell})$ is parameterized as a BiLSTM–CRF and $p(w_i \mid \ell_i, m_i)$ as a hard-attention character-level encoder-decoder. Morphological features are “addressed” as explicit, contextually inferred variables, decoupling selection (“which features?”) from realization (“how to spell them?”).
On typologically diverse UD languages, this approach yields significant performance improvements over direct lemma→form models, especially when morphological features cannot be trivially determined from surface context. Explicit morphological addressing enables the system to model agreement, inherent features, and non-local dependencies with interpretable intermediate variables.
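The separation of feature selection from realization can be illustrated with lookup tables standing in for the two neural components. This is a sketch under stated assumptions: `TAG_MODEL` and `INFLECTOR` are hypothetical toy distributions replacing the BiLSTM–CRF tagger and the character-level inflector, the tag strings are invented, and decoding is done greedily (best tag sequence first, then best form per token), which is a simplification rather than the authors' inference procedure.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tagger: lemma sequence -> distribution over tag sequences.
TAG_MODEL: Dict[Tuple[str, ...], Dict[Tuple[str, ...], float]] = {
    ("the", "dog", "bark"): {
        ("DET", "N;PL", "V;PL"): 0.75,
        ("DET", "N;SG", "V;SG"): 0.25,
    },
}

# Hypothetical inflector: (lemma, tag) -> distribution over surface forms.
INFLECTOR: Dict[Tuple[str, str], Dict[str, float]] = {
    ("the", "DET"): {"the": 1.0},
    ("dog", "N;PL"): {"dogs": 1.0},
    ("dog", "N;SG"): {"dog": 1.0},
    ("bark", "V;PL"): {"bark": 1.0},
    ("bark", "V;SG"): {"barks": 1.0},
}


def joint_log_prob(forms: List[str], tags: List[str], lemmas: List[str]) -> float:
    """Log of the tag-sequence probability plus per-token inflection terms."""
    log_p = math.log(TAG_MODEL[tuple(lemmas)][tuple(tags)])
    for form, lemma, tag in zip(forms, lemmas, tags):
        log_p += math.log(INFLECTOR[(lemma, tag)][form])
    return log_p


def decode(lemmas: List[str]) -> Tuple[List[str], List[str]]:
    """Greedy pipeline: pick the best tag sequence, then the best form per token."""
    tag_dist = TAG_MODEL[tuple(lemmas)]
    tags = list(max(tag_dist, key=tag_dist.get))
    forms = []
    for lemma, tag in zip(lemmas, tags):
        form_dist = INFLECTOR[(lemma, tag)]
        forms.append(max(form_dist, key=form_dist.get))
    return forms, tags


lemmas = ["the", "dog", "bark"]
forms, tags = decode(lemmas)
print(forms, tags, joint_log_prob(forms, tags, lemmas))
```

Because number is chosen once at the tag level, subject and verb agree by construction; a direct lemma-to-form model would have to rediscover that dependency for each token.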
4. Morphological Addressing Under Morphological Variation in Control Systems
In neuro-evolutionary control, Triebold & Yaman introduce implicit morphological addressing via domain-randomized training over multiple morphologies without explicit access to shape parameters at test time (Triebold et al., 2023). Controllers (fully connected ANNs) are evaluated across a “learning set” of morphological variants, with optimization proceeding by natural evolution strategies and specialized evolutionary branching.
The resultant controller “addresses” the entire morphology space by encoding policies that generalize across the learning set $\mathcal{M}$, with fitness measured as average (or local minimum) performance over all morphologies. In the average case,

$$F(\theta) = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} f(\theta, m),$$

where $f(\theta, m)$ is the performance of the controller with weights $\theta$ on morphology $m$.

Controllers never observe explicit morphology at run-time; robustness emerges from weight-space adaptation during training. If no single generalist suffices, evolutionary branching splits $\mathcal{M}$ into locally coherent subsets, yielding a small ensemble of local experts. This approach operationalizes morphological addressing as “systematic exposure to morphology variation during optimization,” providing Pareto-like trade-offs between specialization and generalization.
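The fitness aggregation over a learning set of morphologies can be sketched as below. The sketch rests on several assumptions: a morphology is reduced to a single scalar (say, a limb length), the controller is a single gain, and `toy_evaluate` is an invented scoring function. The helper `split_by_threshold` is a hypothetical simplification that only illustrates the idea of partitioning the learning set when one generalist is not enough; it is not the branching procedure of the cited work.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Controller = Sequence[float]
Evaluate = Callable[[Controller, float], float]


def toy_evaluate(controller: Controller, morphology: float) -> float:
    """Invented score: the morphology shapes the environment's response, but
    it is never passed to the controller as an observation."""
    return -abs(controller[0] * morphology - 1.0)


def aggregate_fitness(
    controller: Controller,
    morphologies: Sequence[float],
    evaluate: Evaluate,
    mode: str = "mean",
) -> float:
    """Average or worst-case performance across the learning set."""
    scores = [evaluate(controller, m) for m in morphologies]
    return mean(scores) if mode == "mean" else min(scores)


def split_by_threshold(
    controller: Controller,
    morphologies: Sequence[float],
    evaluate: Evaluate,
    threshold: float,
) -> Tuple[List[float], List[float]]:
    """Partition morphologies into those the controller handles and the rest."""
    handled = [m for m in morphologies if evaluate(controller, m) >= threshold]
    remaining = [m for m in morphologies if evaluate(controller, m) < threshold]
    return handled, remaining


learning_set = [1.0, 2.0, 4.0]
generalist = [0.5]

print(aggregate_fitness(generalist, learning_set, toy_evaluate, mode="mean"))
print(aggregate_fitness(generalist, learning_set, toy_evaluate, mode="min"))
handled, remaining = split_by_threshold(generalist, learning_set, toy_evaluate, -0.5)
print(handled, remaining)
```

In this toy setting the morphologies left in `remaining` would seed a second controller, which is the specialization side of the trade-off between one generalist and several local experts.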
5. Morphological Addressing in Generative Diffusion Models
In text-to-image diffusion, morphological addressing is realized via systematic manipulation of prompt structure and LoRA-based latent direction learning (Fraser, 20 Feb 2026). Two forms are demonstrated:
- Training-level morphological addressing: Descriptor prompts composed of visual constituent features (e.g., “platinum blonde,” “beauty mark”) create a navigable gradient through latent space toward a specific identity basin (e.g., Marilyn Monroe), even without explicit naming or images. A self-distillation process refines LoRA adapters across rounds, yielding clusters with high ArcFace similarity and allowing systematic navigation along and away from the target identity axis.
- Prompt-level morphological addressing: Prompt construction based on phonestheme theory (e.g., using “cr-,” “sn-,” “-oid”) produces novel, visually coherent image clusters (Purity@1=1.0 for several candidates) with no lexical referent, exploiting sound-symbolic structures internalized by CLIP-based tokenizers.
Directional “morphological pressure” applied via compositional prompts or LoRA results in phase transitions between high-density basins of identity, and establishes local coordinate axes for deliberate navigation (target/anti-target) through generation space.
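One way to picture navigation toward and away from a target along a learned direction is through the standard LoRA weight update, in which a low-rank product is added to a frozen weight matrix with a scalar multiplier. The sketch below assumes, as an illustration only, that a positive multiplier moves toward the target and a negative one moves away from it; the source does not specify this mechanism. The matrices are tiny invented values and `apply_lora` is a hypothetical helper, not part of any diffusion library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w: Matrix, b: Matrix, a: Matrix, scale: float) -> Matrix:
    """Return w + scale * (b @ a), the low-rank-adapted weight matrix."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen base weight (2 x 2) and a rank-1 adapter: b is 2 x 1, a is 1 x 2.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [0.0]]
lora_a = [[0.5, 0.5]]

toward_target = apply_lora(base, lora_b, lora_a, scale=1.0)
away_from_target = apply_lora(base, lora_b, lora_a, scale=-1.0)
print(toward_target)
print(away_from_target)
```

Intermediate multipliers between the two extremes would trace a path along the same axis, which is one reading of the target/anti-target coordinate described above.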
6. Implications, Applications, and Future Directions
Morphological addressing facilitates:
- More interpretable, compositional, and efficient modeling in morphologically rich domains, benefiting both data-driven (MSG, hybrid neural) and rule-based (Ge’ez synthesis) systems.
- Generalization and robust adaptation in neuro-robotic controllers exposed to domain randomization without need for morphology sensing.
- Fine-grained, attribute-driven control in generative models, enhancing sample diversity and identity stability through carefully engineered prompt morphologies or learned latent directions.
Future extensions include joint segmentation/embedding learning, adaptation to architectures beyond Skip-Gram (e.g., GloVe, CBOW), multi-objective optimization balancing generalization/specialization, and richer integration of linguistic or geometric annotations. Morphological addressing, by aligning modeling pipelines with the true generative structure of their domains, offers a principle for scalable, robust, and interpretable representation learning across language, vision, and control.