
Word length predicts word order: "Min-max"-ing drives language evolution

Published 20 May 2025 in cs.CL | (2505.13913v1)

Abstract: Current theories of language propose an innate (Baker 2001; Chomsky 1981) or a functional (Greenberg 1963; Dryer 2007; Hawkins 2014) origin for the surface structures (i.e. word order) that we observe in languages of the world, while evolutionary modeling (Dunn et al. 2011) suggests that descent is the primary factor influencing such patterns. Although there are hypotheses for word order change from both innate and usage-based perspectives for specific languages and families, there are key disagreements between the two major proposals for mechanisms that drive the evolution of language more broadly (Wasow 2002; Levy 2008). This paper proposes a universal underlying mechanism for word order change based on a large tagged parallel dataset of over 1,500 languages representing 133 language families and 111 isolates. Results indicate that word class length is significantly correlated with word order crosslinguistically, but not in a straightforward manner, partially supporting opposing theories of processing, while at the same time predicting historical word order change in two different phylogenetic lines and explaining more variance than descent or language area in regression models. Such findings suggest an integrated "Min-Max" theory of language evolution driven by competing pressures of processing and information structure, aligning with recent efficiency-oriented (Levshina 2023) and information-theoretic proposals (Zaslavsky 2020; Tucker et al. 2025).

Authors (1)

Summary

  • The paper introduces a Min-Max theory showing that word length significantly predicts word order change across over 1,500 languages.
  • It uses hierarchical regression models to reveal that noun and verb lengths explain more variance in word order than language family or area.
  • Findings suggest a balance between processing efficiency and communicative complexity with implications for historical language evolution.

Word Length as a Predictor of Word Order

This paper introduces a "Min-Max" theory of language evolution, positing that word order change is driven by the competing pressures of processing efficiency and information structure. The study uses a large, tagged parallel dataset of over 1,500 languages to investigate the relationship between word class length and word order, and finds that word length is significantly correlated with word order crosslinguistically. The paper further claims that word length predicts historical word order change in two different phylogenetic lines and explains more variance than descent or language area in regression models.

Complexity, Processing, and Word Order

The paper acknowledges the multifaceted nature of linguistic complexity and notes the difficulty of disentangling theoretical considerations from language processing. It treats word length as a measure of word complexity, which can affect word order through its influence on processing. The paper then addresses the debate between theories that emphasize minimizing complexity for efficient processing and those that emphasize maximizing complexity for richer information content. The performance-grammar correspondence hypothesis (PGCH) posits that speakers minimize complexity to expedite processing, in line with observations on constituent weight and dependency length minimization. Surprisal theory, by contrast, is taken to predict that speakers maximize complexity, favoring more complex forms to support information retrieval and reduce entropy. The two accounts therefore make conflicting predictions for word order: PGCH predicts that languages with shorter nouns than verbs will exhibit SV order, while surprisal theory predicts the opposite.
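Surprisal theory quantifies the information a word carries as its surprisal, -log2 P(w | context), and entropy as the expected surprisal. The sketch below is a simplified illustration, assuming a unigram model in which P(w) is estimated from raw relative frequency and context is ignored; surprisal-based processing accounts normally condition on the preceding context. The function names are hypothetical.

```python
import math
from collections import Counter


def unigram_surprisal(tokens: list[str]) -> dict[str, float]:
    """Surprisal in bits, -log2 P(w), with P(w) estimated by relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def entropy(tokens: list[str]) -> float:
    """Shannon entropy in bits: the expected surprisal under the same distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


tokens = "the dog saw the cat".split()
s = unigram_surprisal(tokens)
print(round(s["the"], 3))        # 1.322  (P = 2/5)
print(round(s["dog"], 3))        # 2.322  (P = 1/5)
print(round(entropy(tokens), 3)) # 1.922
```

Frequent words such as "the" have lower surprisal, which is the sense in which frequency and information content pull against each other.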

Main Hypotheses

The paper presents two competing hypotheses:

  • H1: SV languages have shorter "S" than "V", while VS languages have shorter "V" than "S", in line with the minimization of complexity.
  • H2: SV languages have longer "S" than "V", while VS languages have longer "V" than "S", in line with the maximization of complexity.
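Both hypotheses reduce to a question about the clause-initial element: under H1 it is the shorter one, under H2 the longer one. A minimal sketch of that check for a single language follows; the function name `consistent_hypotheses` is hypothetical, and treating ties as consistent with neither hypothesis is an assumption.

```python
from typing import Literal

Order = Literal["SV", "VS"]


def consistent_hypotheses(order: Order, mean_s_len: float, mean_v_len: float) -> set[str]:
    """Return which of H1/H2 one language's (order, mean lengths) is consistent with.

    H1 (minimization): the initial element is the shorter one.
    H2 (maximization): the initial element is the longer one.
    Ties are treated as consistent with neither (an assumption of this sketch).
    """
    if order == "SV":
        first, second = mean_s_len, mean_v_len
    else:
        first, second = mean_v_len, mean_s_len
    if first < second:
        return {"H1"}
    if first > second:
        return {"H2"}
    return set()


# Invented lengths, for illustration only.
print(consistent_hypotheses("SV", 4.2, 5.0))  # {'H1'}
print(consistent_hypotheses("VS", 4.2, 5.0))  # {'H2'}
```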

Investigating Word Order and Length

Using the tagged Parallel Bible Corpus (PBC), the paper identifies each language's basic word order via the N1 ratio and computes the average length of nouns/arguments and verbs/predicates. Statistical analysis shows a significant effect of word length in both SV and VS languages, with arguments generally shorter than predicates crosslinguistically. However, when frequency effects are taken into account, nouns are longer than verbs in SV languages but shorter than verbs in VS languages.
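The averaging step can be sketched as follows. This is an illustration, not the paper's pipeline: it assumes Universal Dependencies UPOS tags (`NOUN`, `PROPN`, `PRON`, `VERB`), measures length in characters, and contrasts a token-weighted mean (frequent words count more) with a type-based mean (each word form counts once) as one possible way of separating out frequency effects. The function `class_lengths` is hypothetical, and the N1 ratio is not reproduced here because the summary does not define it.

```python
from statistics import mean


def class_lengths(tagged: list[tuple[str, str]], tags: set[str]) -> dict[str, float]:
    """Mean length in characters of words whose tag is in `tags`.

    token_mean: every occurrence counts, so frequent forms (e.g. pronouns) weigh more.
    type_mean: each distinct lowercased form counts once, removing frequency weighting.
    """
    tokens = [w.lower() for w, t in tagged if t in tags]
    if not tokens:
        raise ValueError("no tokens with the requested tags")
    return {
        "token_mean": mean(len(w) for w in tokens),
        "type_mean": mean(len(w) for w in set(tokens)),
    }


# Toy tagged sentence (assumed UPOS tags), for illustration only.
tagged = [("She", "PRON"), ("saw", "VERB"), ("the", "DET"),
          ("elephant", "NOUN"), ("she", "PRON"), ("left", "VERB")]
args = class_lengths(tagged, {"NOUN", "PROPN", "PRON"})
verbs = class_lengths(tagged, {"VERB"})
print(args)   # token_mean ~4.667, type_mean 5.5
print(verbs)  # token_mean 3.5, type_mean 3.5
```

In this toy input the repeated short pronoun pulls the token-weighted argument mean below the type-based one, mirroring how frequent pronouns can make arguments look shorter overall.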

Testing the Predictive Validity of Word Length

The study assesses the predictive power of noun and verb lengths in determining word order. By analyzing corpora from historical and modern language pairs (Ancient Hebrew/Modern Hebrew, Classical Arabic/Egyptian Arabic), the paper demonstrates that the relative lengths of nouns and verbs can accurately classify word order changes over time.
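One simple way to turn relative length into a prediction for a historical pair is a threshold rule derived from the frequency-adjusted pattern described earlier (nouns longer than verbs in SV languages, shorter in VS languages). The sketch below assumes that rule; the paper's actual classifier may differ. `Stage`, `predict_order`, and `check_pair` are hypothetical names, and the numbers are invented placeholders, not values from the Hebrew or Arabic corpora.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One historical stage of a language (all values supplied by the user)."""
    name: str
    noun_len: float       # mean noun length at this stage
    verb_len: float       # mean verb length at this stage
    attested_order: str   # "SV" or "VS", as attested for this stage


def predict_order(noun_len: float, verb_len: float) -> str:
    """Assumed rule: nouns longer than verbs -> SV, otherwise VS (ties go to VS)."""
    return "SV" if noun_len > verb_len else "VS"


def check_pair(old: Stage, new: Stage) -> dict:
    """Compare the predicted order change between two stages with the attested one."""
    predicted = (predict_order(old.noun_len, old.verb_len),
                 predict_order(new.noun_len, new.verb_len))
    attested = (old.attested_order, new.attested_order)
    return {"predicted": predicted, "attested": attested, "correct": predicted == attested}


# Invented placeholder values for illustration only; not figures from the paper.
older = Stage("older stage", noun_len=4.0, verb_len=5.0, attested_order="VS")
newer = Stage("newer stage", noun_len=5.5, verb_len=5.0, attested_order="SV")
result = check_pair(older, newer)
print(result)
```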

Testing Variance Predicted by Word Length

Hierarchical linear regression models are used to evaluate the relative contributions of language area, family membership, and noun/verb lengths in predicting word order. The analysis reveals that noun and verb lengths account for more variance than family membership, suggesting that word length is a stronger predictor of word order than descent.
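In a hierarchical (block-wise) regression, predictor blocks are entered one at a time and the increase in R² at each step is attributed to the newly added block. The sketch below illustrates that procedure with ordinary least squares in NumPy. It assumes a continuous word-order outcome and dummy-coded area and family variables; the helper names are hypothetical and the data are random and synthetic, generated so that the outcome depends on relative length by construction. It does not reproduce the paper's models or results.

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def one_hot(labels: list[str]) -> np.ndarray:
    """Dummy-code a categorical variable, dropping the first level (absorbed by the intercept)."""
    levels = sorted(set(labels))
    return np.array([[lab == lev for lev in levels[1:]] for lab in labels], dtype=float)


def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return (name, cumulative R^2, delta R^2) per step."""
    steps, cols, prev = [], [], 0.0
    for name, block in blocks:
        cols.append(block)
        r2 = r_squared(np.column_stack(cols), y)
        steps.append((name, r2, r2 - prev))
        prev = r2
    return steps


# Synthetic demo data (random; not the paper's data).
rng = np.random.default_rng(0)
n = 200
area = rng.choice(["A1", "A2", "A3"], size=n).tolist()
family = rng.choice(["F1", "F2", "F3", "F4", "F5"], size=n).tolist()
noun_len = rng.normal(5.0, 1.0, size=n)
verb_len = rng.normal(5.0, 1.0, size=n)
# Toy outcome that depends on relative length by construction.
y = (noun_len - verb_len) + rng.normal(0.0, 0.3, size=n)

results = hierarchical_r2(
    [("area", one_hot(area)),
     ("family", one_hot(family)),
     ("noun/verb length", np.column_stack([noun_len, verb_len]))],
    y,
)
for name, r2, delta in results:
    print(f"{name:>16}: R^2 = {r2:.3f}  (delta = {delta:.3f})")
```

Because the increments depend on the order in which blocks are entered, comparisons of "variance explained" are usually checked under more than one entry order.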

Discussion

The paper discusses how the shorter length of arguments (influenced by pronouns) supports the hypothesis that lighter words occur earlier in sentences. However, the finding that nouns are longer than verbs in SV languages and shorter in VS languages (when frequency is considered) supports the opposing hypothesis that heavier constituents are placed first. These results suggest a balance between production efficiency (favoring shorter words) and communicative efficiency (favoring informative content).

A "Min-Max" Theory of Processing

The study proposes a "Min-Max" theory of processing, in which speakers minimize processing effort while maximizing information content. The theory refines Hawkins' PGCH and holds that speakers balance these competing pressures to optimize their utterances. Frequent use of short pronominal forms encourages a shift toward argument-initial order, while longer nouns, which carry more information, also tend toward initial position. Languages with longer verbs tend to maintain predicate-initial order. The paper cites support for the Min-Max theory from clinical psychology research, language modeling, and information theory, and suggests that the Min-Max principle applies at multiple hierarchical levels, influencing both word selection and structural choices.

Limitations

The limitations of the study include the use of automated POS tagging, which may introduce inaccuracies. The dataset, while large, represents only about a quarter of the world's languages, which limits the generalizability of the findings. In addition, the binary classification of word order (SV/VS) oversimplifies the range of word order patterns found across languages. The study concludes by advocating continued investigation of the interplay between processing efficiency and information structure in language evolution.
