Mean Dependency Distance (MDD) in Linguistics
- Mean Dependency Distance (MDD) is a metric that quantifies the average linear distance between heads and dependents in dependency trees, highlighting memory constraints in language processing.
- Empirical studies show MDD typically ranges from 2 to 3 words across diverse languages, supporting theories of universal cognitive limitations on working memory.
- Algorithmic approaches integrate MDD in parsing models and text simplification strategies, promoting efficient syntactic analysis and improved comprehension.
Mean Dependency Distance (MDD) is a central quantitative metric in syntactic theory and computational linguistics, measuring the average linear distance between heads and dependents in a sentence’s dependency parse. Empirical studies across typologically diverse languages consistently report MDD values in the range of approximately 2–3 words, a finding attributed to universal cognitive constraints. The minimization of dependency distance has deep implications for linguistic structure, parsing algorithms, text simplification, and theoretical models of language processing.
1. Formal Definition and Calculation
Mean Dependency Distance (MDD) quantifies word-order locality in a dependency tree. Given a sentence composed of $n$ words $w_1, \dots, w_n$ with a dependency parse yielding arcs $(h_i, d_i)$, where $w_{h_i}$ is the head word at position $h_i$ and $w_{d_i}$ is the dependent at position $d_i$, the dependency distance for each arc is defined as $DD_i = |h_i - d_i|$.
For a sentence with $m$ dependencies (typically $m = n - 1$ for a single-rooted tree, excluding the root arc), the MDD is:

$$\mathrm{MDD} = \frac{1}{m} \sum_{i=1}^{m} |h_i - d_i|$$
Corpus-level MDD can be reported as the mean of sentence-level MDDs or via pooled dependency distances divided by the total number of arcs (Gómez-Rodríguez, 2017, Ferrer-i-Cancho, 2013, Wang et al., 30 Apr 2025).
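As a concrete sketch, the definition above can be computed directly from a head-index representation of a parse; the example sentence, the 1-based indexing convention, and the use of 0 to mark the root are illustrative choices, not prescribed by the source:

```python
def mean_dependency_distance(heads):
    """Compute sentence-level MDD from a head-index list.

    heads[i] is the 1-based position of the head of the word at
    1-based position i + 1; the root is marked with head 0 and its
    arc is excluded, so a single-rooted tree of n words has n - 1 arcs.
    """
    distances = [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)

# "The dog chased the cat": "dog" heads "The", "chased" (root) heads
# "dog" and "cat", "cat" heads "the".
heads = [2, 3, 0, 5, 3]
# distances: |2-1|, |3-2|, |5-4|, |3-5| = 1, 1, 1, 2 -> MDD = 5/4
print(mean_dependency_distance(heads))  # 1.25
```

Corpus-level pooling then amounts to summing the per-arc distances over all sentences before dividing by the total arc count, rather than averaging the per-sentence means.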
2. Empirical Distributions and Two-Regime Models
The distribution of dependency distances in natural languages is highly skewed, featuring a prominent peak at $d = 1$ and a heavy tail for larger values. Recent research demonstrates that these empirical distributions across 20 languages are best fit by a double-exponential (two-regime) decay model:

$$p(d) \propto \begin{cases} e^{-\alpha_1 d}, & d \le d^{*} \\ e^{-\alpha_2 d}, & d > d^{*} \end{cases}$$

with $d^{*}$ denoting a “chunk-size” break-point, empirically averaging 4–5 words. The steep “within-chunk” decay ($\alpha_1$) and gentler “cross-chunk” tail ($\alpha_2 < \alpha_1$) directly reflect short-term memory constraints. Observed mean dependency distances in these corpora range from the lowest values (Finnish, Indonesian) to the highest (Hindi) (Petrini et al., 2022). This cross-linguistic stability supports theories that memory limitations govern linear order, not language-specific syntax.
| Language | Within-chunk decay rate | Break-point (words) | Observed MDD |
|---|---|---|---|
| English | 0.60 | 6 | 2.53 |
| Japanese | 0.74 | 6 | 2.97 |
| German | 0.66 | 4 | 3.11 |
A plausible implication is that most dependencies occur within memory-accessible “chunks” and only a minority link across larger spans.
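The two-regime model above can be sketched as a piecewise exponential; the parameter names and the choice to enforce continuity at the break-point are illustrative assumptions, and the function is left unnormalized for simplicity:

```python
import math

def two_regime_pdf(d, alpha1, alpha2, d_star):
    """Unnormalized two-regime probability of dependency distance d:
    steep within-chunk decay (alpha1) up to the break-point d_star,
    gentler cross-chunk decay (alpha2 < alpha1) beyond it.
    The tail branch is rescaled so the two pieces meet at d_star."""
    if d <= d_star:
        return math.exp(-alpha1 * d)
    # scale factor that makes the tail continuous with the head at d_star
    c = math.exp((alpha2 - alpha1) * d_star)
    return c * math.exp(-alpha2 * d)

# With English-like parameters, probability falls off much faster
# within the chunk than across chunks.
within_ratio = two_regime_pdf(2, 0.60, 0.20, 6) / two_regime_pdf(1, 0.60, 0.20, 6)
tail_ratio = two_regime_pdf(8, 0.60, 0.20, 6) / two_regime_pdf(7, 0.60, 0.20, 6)
print(within_ratio < tail_ratio)  # True: the tail decays more slowly
```

Fitting $\alpha_1$, $\alpha_2$, and $d^{*}$ to an empirical distance histogram (e.g. by maximum likelihood) recovers the per-language values reported in the table.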
3. Structural Factors: Sentence Length, Predicate Valency, and Hubiness
Mean dependency distance increases monotonically with sentence length, but this growth is sublinear and governed by structural constraints. In Japanese, for instance, MDD grows from 1.00 (sentence length 2) to 2.97 (length 20), with no systematic change in the distribution’s shape as length increases (Wang et al., 30 Apr 2025). The threshold effect due to predicate valency (the maximum number of direct dependents a predicate can take) imposes a formal trade-off between linear (MDD) and hierarchical (MHD) distances. When sentence length exceeds the predicate’s valency plus one, new nodes must attach at deeper levels, so hierarchical depth grows faster than linear distance.
Hubiness, the variance of node degrees in a dependency tree, sets a lower bound on the minimal achievable MDD: the more unevenly degree is concentrated in a few hub nodes, the longer the dependencies of even the best possible linearization must be. High hubiness therefore entails higher memory costs. Linear trees (paths) achieve the minimal MDD of 1, while maximally “hubby” star trees have much larger minimal MDDs, scaling asymptotically linearly with sentence length, approximately $n/4$ for a centrally placed hub (Ferrer-i-Cancho, 2013). This suggests natural languages avoid strong hubiness to maintain short dependencies.
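The contrast between the two extreme tree shapes can be checked by brute force; the function below tries every linear position for the hub of a star tree and keeps the best average distance to its leaves (the implementation is a sketch, not from the cited work):

```python
def min_star_mdd(n):
    """Minimal MDD of a star tree on n words: place the single hub at
    each linear position 1..n and return the smallest mean distance
    from the hub to its n - 1 leaves."""
    best = float("inf")
    for hub in range(1, n + 1):
        dists = [abs(hub - pos) for pos in range(1, n + 1) if pos != hub]
        best = min(best, sum(dists) / (n - 1))
    return best

# A path tree always attains MDD = 1 (every arc links neighbors);
# the star's minimum grows roughly like n / 4 as n increases.
for n in (5, 9, 21):
    print(n, min_star_mdd(n))  # 1.5, 2.5, 5.5 respectively
```

The optimum always places the hub at the center of the linear order, which is where the linear growth in the minimal MDD comes from.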
4. Cognitive and Linguistic Implications
Low mean dependency distance is robustly associated with reduced working-memory load during syntactic processing. Futrell et al. (2015) report that dependency length minimization is a linguistic universal, with rare long or crossing dependencies being exceptions shaped by specific structures or fixed expressions. Lower MDD values in simplified texts (whether human-authored or generated by models) facilitate comprehension, particularly for less proficient readers (Lee et al., 2024).
The chunk-size break-point in double-exponential models aligns with classical short-term memory limits (the “magical number” of immediate memory span), indicating that dependency structures are fundamentally constrained by cognitive architecture rather than by local grammar (Petrini et al., 2022).
5. Algorithmic and Computational Perspectives
Dependency Distance Minimization (DDM) is a guiding principle in both natural language evolution and algorithmic parsing. Transition-based parsers (arc-standard, arc-eager, swap-based) inherently prefer short dependencies, yielding random parse trees with MDD values near or below those found in natural language corpora (Gómez-Rodríguez, 2017). Graph-based parsers can explicitly incorporate MDD, penalizing long arcs for both complexity reduction and accuracy gains.
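One common way a graph-based parser incorporates MDD is to subtract a distance-dependent term from each arc's model score before decoding; the function and weight below are an illustrative sketch, not an API from any specific parser:

```python
def penalized_arc_score(base_score, head_pos, dep_pos, lam=0.1):
    """Sketch of a distance penalty in graph-based parsing: an arc's
    effective score is its model score minus a term proportional to
    dependency distance, biasing the decoder toward short arcs.
    `lam` is an illustrative penalty weight, not a value from the source."""
    return base_score - lam * abs(head_pos - dep_pos)

# Choosing the head of the word at position 5 between two candidates
# whose base scores tie: the nearer head wins once the penalty applies.
candidates = {2: 1.0, 4: 1.0}
best_head = max(candidates, key=lambda h: penalized_arc_score(candidates[h], h, 5))
print(best_head)  # 4
```

In a full parser the same penalty would feed into a maximum-spanning-tree or projective decoder over all candidate arcs.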
Chunking further lowers MDD in simulated or random languages. Single-level linear chunking, when combined with projectivity, reduces MDD in random trees to match observed natural language levels. Optimal chunk sizes for minimizing MDD typically fall between 4 and 14 words; multi-level chunking is expected to lower MDD further, though it has not yet been fully explored algorithmically (Lu et al., 2015).
6. Effects of Text Simplification and Cross-Linguistic Analysis
Empirical studies comparing original, human-simplified, and model-simplified sentences demonstrate systematic reductions in MDD with syntactic simplification. Human experts achieve the lowest MDD in rewritten sentences, followed by models such as ChatGPT, both outperforming the original texts. For a sample of 220 English Wikipedia sentences:
| Sentence Type | MDD | SD |
|---|---|---|
| Original | 2.90 | 0.63 |
| ChatGPT-Simplified | 2.87 | 0.62 |
| Human-Simplified | 2.77 | 0.60 |
The reduction is greater for human simplification ($\Delta\mathrm{MDD} = 0.13$) than for model simplification ($\Delta\mathrm{MDD} = 0.03$). These patterns signal that syntactic operations can further optimize linear locality beyond what is found in naturally occurring texts (Lee et al., 2024).
7. Theoretical and Modeling Limitations
Existing models primarily apply single-layer chunking and rely on random tree simulations, omitting the full range of grammatical category effects and multilayer hierarchical nesting. Future research areas include empirical derivation of chunk-size distributions, development of multi-level chunking algorithms, and integration of human processing experiments linking reaction times to MDD.
A plausible implication is that more fine-grained models, factoring in non-uniform multi-tier chunking and cross-linguistic predicate valency distributions, will refine predictions for both MDD and the rarity of crossing dependencies (Lu et al., 2015, Wang et al., 30 Apr 2025).
Conclusion
Mean Dependency Distance is a fundamental metric connecting linguistic structure, algorithmic design, and cognitive processing. Its minimization in natural language and its robust empirical stability across major languages reflect universal pressures to economize working memory. Theoretical bounds, algorithmic convergence, and empirical modeling collectively demonstrate that MDD provides a precise lens on the linear organization of syntax, memory-efficient parsing strategies, and the cognitive architecture of language comprehension.