Towards a theory of morphology-driven marking in the lexicon: The case of the state

Published 3 Apr 2026 in cs.CL | (2604.03422v1)

Abstract: All languages have a noun category, but its realisation varies considerably. Depending on the language, semantic and/or morphosyntactic differences may be more or less pronounced. This paper explores these variations, using Riffian as a reference point before extending the analysis to other languages. We propose a formal model termed morphology-driven marking. Nouns are organised into modular cognitive sets, each with its own morphological template and unmarked form. This approach helps explain differences in marking among noun types within and across languages. By situating these patterns within syntactic functions, we also reassess the notions of markedness and state. It is proposed that the concept of state be extended to all synthetic languages and analysed as a novel subcategory of syntax-based inflection like agreement and grammatical case.

Authors (1)

Mohamed El Idrissi

Summary

The paper proposes a formal, template-based model of morphology-driven marking that distinguishes state from case.
It employs bidirectional 'push' and 'pull' processes to optimize feature–form mappings and reduce lexical redundancy.
Cross-linguistic analysis, including Riffian Berber, demonstrates the model’s ability to account for variation in nominal marking.

A Formalist Account of Morphology-Driven Marking: Theoretical Integration and Cross-Linguistic Evidence

Introduction

The paper "Towards a theory of morphology-driven marking in the lexicon: The case of the state" (2604.03422) presents a formal, template-based model of morphological marking in the nominal domain, using Riffian Berber as its principal case study and extending the analysis to a broad cross-linguistic perspective. The core proposal is a morphology-driven marking mechanism that eschews purely semantic or language-specific explanations in favor of a modular, cognitive model in which noun subcategories instantiate distinct templates, each with its own unmarked form. This system allows the author to articulate a precise, generalized definition of "state" as a subcategory of syntax-driven inflection, separating it from traditional case-marking and offering a systematic account of variation in marking across and within languages.

Critique of Existing Morphological Theory

The study situates itself within debates on the representation and generation of morphological forms, critiquing both classical generative and Distributed Morphology (DM) accounts for redundancies and over-reliance on directed, unidirectional processing between cognitive components. Instead, the author advances a functional, graph-theoretic model with bidirectional ("push" and "pull") processes mediating between morpho-semantic and morpho-phonological layers. The core mechanism is bidirectionality in feature–form mapping: "pull" processes select unmarked forms in the absence of explicit selection, while "push" processes produce marked forms, whether due to conscious, speaker-driven intention or morphosyntactic constraint.

This architecture is argued to increase parsimony in the representation of lexical information, providing a computational framework for optimal morpheme selection via minimal symmetric difference between sets of semantic features and template slots. The author reconceptualizes conversion and derivation as item-cloning operations governed by set-based feature matching, thereby circumventing the need for extensive lists or rules as in DM.
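The selection criterion described above can be illustrated with a minimal sketch. It assumes that semantic features and template slots can both be represented as plain sets of labels. The template names, the slot labels, and the alphabetical tie-breaking rule are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, FrozenSet

# Hypothetical templates: each maps a template name to its set of slots.
TEMPLATES: Dict[str, FrozenSet[str]] = {
    "collective": frozenset({"radical", "gender"}),
    "singulative": frozenset({"radical", "gender", "number", "countability"}),
}


def select_template(features: FrozenSet[str],
                    templates: Dict[str, FrozenSet[str]]) -> str:
    """Return the template whose slot set has the smallest symmetric
    difference with the requested feature set.

    Ties are broken alphabetically by template name (an assumption of
    this sketch, not a claim about the paper's model).
    """
    return min(sorted(templates),
               key=lambda name: len(features ^ templates[name]))


# Features that match the singulative slots exactly (distance 0).
full = frozenset({"radical", "gender", "number", "countability"})
# Features that match the collective slots exactly (distance 0).
bare = frozenset({"radical", "gender"})

chosen_full = select_template(full, TEMPLATES)
chosen_bare = select_template(bare, TEMPLATES)
```

Here `features ^ templates[name]` is Python's set symmetric difference, so the size of that set counts both the requested features a template cannot host and the template slots the request leaves unfilled.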

Riffian Morphology in Typological Context

Riffian is analyzed as a paradigmatic synthetic and fusional language, with a canonical nominal structure incorporating countability, radical, gender, and number. The discussion highlights that countability is not universally encoded, and that in Riffian, it manifests primarily through the contrast between a null-marked collective and a formed singulative, both of which interact with other inflectional features such as gender and number. The paper draws explicit contrast here with languages like English and French, whose definiteness and collectivity markers function differently and may trigger in-situ or inter-word variations.

A key theoretical innovation is the formal dissociation of "state" from "case" as an inflectional dimension that modulates the form of countability markers—exemplified by the alternation between free state (FS) and annexation state (AS) in specific syntactic environments in Riffian. The state is thus defined as a morphosyntactic dependency, often realized as the alternation or suppression of certain markers in predefined syntactic contexts, and is formalized as a function from contexts and templates to output templates.
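As a rough illustration, the idea of state as a function from contexts and templates to output templates might be sketched as follows. The `Template` fields, the context labels, and the marker labels `CM.FS` and `CM.AS` are placeholders invented for this sketch. They are not actual Riffian affixes, and the list of annexing contexts is an assumption rather than the paper's inventory.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Template:
    radical: str
    countability_marker: str  # placeholder label, not a real affix
    state: str = "FS"         # free state by default


# Hypothetical syntactic contexts that trigger the annexation state.
ANNEXING_CONTEXTS = frozenset({"after_preposition", "postverbal_subject"})


def apply_state(context: str, template: Template) -> Template:
    """Map (context, template) to an output template.

    In an annexing context the countability marker alternates and the
    template is tagged AS; elsewhere the input is returned unchanged.
    """
    if context in ANNEXING_CONTEXTS:
        return replace(template, countability_marker="CM.AS", state="AS")
    return template


noun = Template(radical="RADICAL", countability_marker="CM.FS")
annexed = apply_state("after_preposition", noun)
unchanged = apply_state("citation_form", noun)
```

The radical is untouched by the function. Only the countability marker and the state tag vary with context, which is the sense in which state is kept apart from case in this sketch.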

Theoretical Synthesis: Markedness, Formedness, and the State

The author rigorously distinguishes between markedness (an item-based, pairwise form–process relation), formedness (presence of a segmentally expressed marker), and the cross-linguistically variable status of bare forms. Markedness is defined as an interaction between forms instantiated via specific formation processes (conversion, inflection, derivation), and the analysis is expanded to dissociate inflectional from derivational markedness—even within the same nominal paradigm.

A crucial claim is that lexical subcategories (cognitive sets) each instantiate their own morphological templates and unmarked forms, and that markedness must be determined at the level of these templates rather than surface semantics or frequency. This allows the theory to account for both the internal modularity of inflectional classes and the observed typological variation, such as languages where proper nouns (PN) and common nouns (CN) share or diverge in their use of definiteness or collectivity markers.
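The difference between formedness and template-relative markedness can be made concrete with a small sketch. It simplifies markedness to "deviation from the unmarked form of the item's cognitive set", which is narrower than the pairwise form–process relation described above. The cognitive-set names, the `DEF` label, and the choice of unmarked forms are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Form:
    cognitive_set: str
    markers: Tuple[str, ...]  # segmentally expressed markers; () means bare


# Hypothetical unmarked forms, one per cognitive set. Here proper nouns
# are assumed to carry a definite marker in their unmarked form.
UNMARKED: Dict[str, Tuple[str, ...]] = {
    "common_noun": (),
    "proper_noun": ("DEF",),
}


def is_formed(form: Form) -> bool:
    """A form is 'formed' if it carries at least one overt marker."""
    return len(form.markers) > 0


def is_marked(form: Form) -> bool:
    """A form is 'marked' if it deviates from its set's unmarked form."""
    return form.markers != UNMARKED[form.cognitive_set]


pn_with_article = Form("proper_noun", ("DEF",))
pn_bare = Form("proper_noun", ())
cn_bare = Form("common_noun", ())
```

Under these assumptions `pn_with_article` is formed but unmarked, while `pn_bare` is unformed but marked, so the two properties vary independently once markedness is computed per template.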

The paper introduces the interaction of intrinsic and extrinsic markers, showing that only the former (e.g., gender, number, countability, sometimes definiteness) are relevant for markedness as they are included in the nominal template. Extrinsic markers (some determiners) trigger surface alternations but are not encoded in template-based markedness computations. Borrowing and conversion are unified as processes duplicating forms together with their intrinsic markers, explaining why certain articles—when borrowed with radical forms—remain robustly attached in the lexicon.

Bidirectionality is also linked to lexicon–syntax interaction. The "pull" process handles unmarked inflection through cognitive automation, while the "push" process, which can override the pull, generates marked forms either by intention (speaker control) or by syntactic constraint. The interaction is formalized in terms of edge-weighted graphs over the proposed cognitive matrices.
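A toy version of the push–pull interaction is sketched below, assuming that each cognitive set is a node whose outgoing edges lead to candidate forms and carry a numeric cost. The graph contents, the cost values, and the rule that a syntactic constraint takes precedence over speaker intention are all assumptions of this sketch, not details confirmed by the paper.

```python
from typing import Dict, Optional

# Hypothetical edge-weighted graph: cognitive set -> {form: cost}.
# The lowest-cost edge stands for the unmarked (default) form.
EDGES: Dict[str, Dict[str, float]] = {
    "common_noun": {"bare": 0.0, "definite": 1.0},
    "proper_noun": {"definite": 0.0, "bare": 1.0},
}


def pull(cognitive_set: str) -> str:
    """Automatic selection: follow the cheapest edge (unmarked form)."""
    forms = EDGES[cognitive_set]
    return min(sorted(forms), key=forms.get)


def realize(cognitive_set: str,
            intention: Optional[str] = None,
            constraint: Optional[str] = None) -> str:
    """Return the realized form.

    A push (syntactic constraint first, then speaker intention, by
    assumption) overrides the pull; with no push the pull decides.
    """
    pushed = constraint if constraint is not None else intention
    if pushed is None:
        return pull(cognitive_set)
    if pushed not in EDGES[cognitive_set]:
        raise ValueError(f"no edge from {cognitive_set!r} to {pushed!r}")
    return pushed


default_cn = realize("common_noun")
default_pn = realize("proper_noun")
pushed_cn = realize("common_noun", intention="definite")
constrained_pn = realize("proper_noun", intention="definite", constraint="bare")
```

The sketch keeps the two processes as separate code paths so that the default can change per cognitive set without touching the override logic.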

Cross-Linguistic Application and Implications

The theory is extended typologically to languages such as English, French, Greek, Tagalog, and Wolof, demonstrating that in-situ, state-induced alternations in noun phrase marking (e.g., absence of definite articles in instrumentals and genitives: "by ∅ boat" vs. "the boat") are instantiated within the proposed model as push–pull interactions between templates and context-sensitive state functions.

The author claims that semantically-motivated traditions (i.e., treating proper nouns as inherently definite or collective) are insufficient to account for the irregularities observed and that the proposed morphology-driven model generalizes across diverse inflectional systems. A formal, set-theoretic definition of "state" is given, allowing for systematic prediction of forms generated by context–template interaction in any synthetic language, with concrete illustrations from Riffian and French. The modular approach to markedness, formedness, and state re-contextualizes the usage of bare nouns, collective markers, and definite articles as outputs of template competition mediated by state processes, not as side-effects of underlying semantic values.

Notably, the theory explains the prevalence—and limits—of inter-word and in-situ alternations, the cross-linguistic non-co-occurrence of certain intrinsic markers (e.g., definite and collective), and why some languages permit post-nominal stacking of articles and demonstratives while others prohibit it via morphological or semantic undeterminedness.

Implications and Future Directions

The model invites a systematic reanalysis of typological data regarding nominal marking. Its practical implications are significant for computational linguistics, particularly in the modeling of morphological alternations, template-based lexicon construction, and the automatic identification of markedness shifts (e.g., in conversion or borrowing). The theoretical ramifications are equally pronounced: it strengthens the argument for cognitive modularity, introduces a formal structure for cross-linguistic morphological templates, and treats conversion, derivation, and inflection as unified operations subject to computational optimization. The redefinition of "state" as a generalized inflectional mechanism, rather than as a case or article phenomenon, could streamline cross-linguistic grammatical descriptions.

Future developments in AI and computational models of language stand to benefit from the formal cognitive-morphological architecture proposed, especially in language processing systems that require principled, template-driven mappings from abstract features to realized forms. The separation of state and case marking, the formal bidirectionality of morphological processes, and the use of feature-set similarity as selection criteria, all offer avenues for robust, scalable implementation in morphological parsers and generators.

Conclusion

The paper provides a detailed and methodologically rigorous account of morphology-driven nominal marking, introducing formal distinctions between markedness, formedness, and state, all grounded in modular cognitive templates. Through typological and formal analysis, the theory addresses inconsistencies in semantically anchored treatments of noun categories, definiteness, and collectivity. The implications for morphological theory, linguistic typology, and computational modeling are substantive, establishing a foundation for future generalizations and systems development grounded in the cognitive modularity of nominal marking (2604.03422).