- The paper introduces a Min-Max theory showing that word length significantly predicts word order change across over 1,500 languages.
- It uses hierarchical regression models to reveal that noun and verb lengths explain more variance in word order than language family or area.
- Findings suggest a balance between processing efficiency and communicative complexity with implications for historical language evolution.
Word Length as a Predictor of Word Order
This paper introduces a "Min-Max" theory of language evolution, positing that word order change is driven by the competing pressures of processing efficiency and information structure. Using a large, tagged parallel dataset of over 1,500 languages, the study investigates the relationship between word class length and word order and finds that the two are significantly correlated crosslinguistically. The paper further claims that word length predicts historical word order change in two different phylogenetic lines and that it explains more variance than descent or language area in regression models.
Complexity, Processing, and Word Order
The paper acknowledges the multifaceted nature of linguistic complexity, noting the challenge of disentangling theoretical considerations from language processing. It treats word length as a measure of word complexity, which affects word order through its influence on processing. The paper addresses the debate between theories that emphasize minimizing complexity for efficient processing and those that emphasize maximizing complexity for richer information content. The performance-grammar correspondence hypothesis (PGCH) posits that speakers minimize complexity to expedite processing, in line with observations on constituent weight and dependency length minimization. Conversely, surprisal theory proposes that speakers maximize complexity, favoring complex forms to support information retrieval and reduce entropy. The two views make conflicting predictions for word order: PGCH suggests that languages with shorter nouns than verbs will exhibit SV order, while surprisal theory predicts that SV order will appear in languages whose nouns are longer than their verbs.
Main Hypotheses
The paper presents two competing hypotheses:
- H1: SV languages have shorter "S" than "V", while VS languages have shorter "V" than "S", aligning with the minimization of complexity.
- H2: SV languages have longer "S" than "V", while VS languages have longer "V" than "S", aligning with the maximization of complexity.
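The two hypotheses can be read as a simple decision rule over one language's basic order and mean class lengths. The following is a minimal sketch of that reading, not the paper's procedure; the function name `consistent_hypothesis` and its parameters are hypothetical.

```python
def consistent_hypothesis(order: str, s_len: float, v_len: float) -> str:
    """Return 'H1', 'H2', or 'neither' for a single language.

    order -- 'SV' or 'VS' (basic word order)
    s_len -- mean length of the "S" class (unit is an assumption, e.g. characters)
    v_len -- mean length of the "V" class
    """
    if order not in ("SV", "VS"):
        raise ValueError("order must be 'SV' or 'VS'")
    if s_len == v_len:
        return "neither"  # equal lengths favor neither hypothesis
    # Length of the class that comes first in the basic order, then the other.
    first_len, second_len = (s_len, v_len) if order == "SV" else (v_len, s_len)
    # H1: the initial class is the shorter one (minimization).
    # H2: the initial class is the longer one (maximization).
    return "H1" if first_len < second_len else "H2"
```

Put this way, both hypotheses ask the same question: is the class that comes first the shorter or the longer of the two?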
Investigating Word Order and Length
Using the tagged Parallel Bible Corpus (PBC), basic word order is identified via the N1 ratio, and the average length of nouns/arguments and verbs/predicates is computed for each language. Statistical analysis reveals that both SV and VS languages exhibit a significant effect of word length, with arguments generally shorter than predicates crosslinguistically. However, when frequency effects are considered, nouns are longer than verbs in SV languages but shorter than verbs in VS languages.
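The per-language measurements described above might be computed along the following lines. This is a sketch under stated assumptions: the summary does not define the N1 ratio, so the code takes it to be the share of sentences in which the first noun precedes the first verb; it counts length in characters over tokens; and the tag names `NOUN` and `VERB` and the function name are hypothetical.

```python
from statistics import mean

def length_and_order_stats(sentences):
    """sentences: list of sentences, each a list of (token, tag) pairs."""
    noun_lengths, verb_lengths = [], []
    noun_first = 0   # sentences whose first noun precedes the first verb
    counted = 0      # sentences containing at least one noun and one verb
    for sent in sentences:
        first_noun = first_verb = None
        for i, (token, tag) in enumerate(sent):
            if tag == "NOUN":
                noun_lengths.append(len(token))
                if first_noun is None:
                    first_noun = i
            elif tag == "VERB":
                verb_lengths.append(len(token))
                if first_verb is None:
                    first_verb = i
        if first_noun is not None and first_verb is not None:
            counted += 1
            if first_noun < first_verb:
                noun_first += 1
    return {
        "mean_noun_len": mean(noun_lengths) if noun_lengths else None,
        "mean_verb_len": mean(verb_lengths) if verb_lengths else None,
        # Assumed definition of the N1 ratio; the paper's may differ.
        "noun_first_ratio": noun_first / counted if counted else None,
    }
```

Averaging over distinct word types instead of tokens would be one way to reduce the influence of very frequent short forms, which is the kind of frequency effect the text mentions.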
Testing the Predictive Validity of Word Length
The study assesses the predictive power of noun and verb lengths in determining word order. By analyzing corpora from historical and modern language pairs (Ancient Hebrew/Modern Hebrew, Classical Arabic/Egyptian Arabic), the paper demonstrates that the relative lengths of nouns and verbs can accurately classify word order changes over time.
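A length-based classification of a historical/modern pair could look like the sketch below. It assumes the frequency-adjusted pattern reported above (the longer class comes first), which may not be the classifier the paper uses; the function names are hypothetical, and any lengths passed in are for the reader to supply.

```python
def predicted_order(noun_len: float, verb_len: float) -> str:
    """Assumed rule: nouns longer than verbs -> 'SV', otherwise 'VS'.

    Ties fall to 'VS' here purely as a convention of this sketch.
    """
    return "SV" if noun_len > verb_len else "VS"

def predicted_change(old: tuple, new: tuple) -> tuple:
    """old, new: (mean_noun_len, mean_verb_len) for the two language stages.

    Returns (order_before, order_after, changed).
    """
    before = predicted_order(*old)
    after = predicted_order(*new)
    return before, after, before != after
```

Calling `predicted_change` with the measured noun and verb lengths of an older and a newer stage gives the predicted order at each stage, which can then be compared with the attested orders.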
Testing Variance Predicted by Word Length
Hierarchical linear regression models are used to evaluate the relative contributions of language area, family membership, and noun/verb lengths in predicting word order. The analysis reveals that noun and verb lengths account for more variance than family membership, suggesting that word length is a stronger predictor of word order than descent.
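The logic of a hierarchical regression is to enter predictor blocks one at a time and record how much R² each block adds. The sketch below illustrates that idea with ordinary least squares in plain Python. It assumes a numeric word-order outcome and dummy-coded family or area columns; the toy numbers are invented for illustration and are not from the paper, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y. Assumes no collinear columns.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def incremental_r2(blocks, y):
    """blocks: list of (name, rows), rows holding one list of values per language.

    Blocks are entered in order; returns the R^2 gained at each step.
    """
    gains, prev = [], 0.0
    X = [[] for _ in y]
    for name, rows in blocks:
        X = [xr + list(br) for xr, br in zip(X, rows)]
        r2 = r_squared(X, y)
        gains.append((name, r2 - prev))
        prev = r2
    return gains

# Invented toy data: six languages, one family dummy, noun and verb lengths.
family = [[0], [0], [0], [1], [1], [1]]
lengths = [[4, 6], [5, 5], [6, 7], [4, 5], [5, 6], [7, 4]]
order_score = [0.40, 0.55, 0.55, 0.45, 0.50, 0.80]
gains = incremental_r2([("family", family), ("noun/verb length", lengths)], order_score)
```

Because R² gains depend on entry order, entering the length block after family and area asks how much length explains beyond descent and geography, which is the comparison the text describes.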
Discussion
The paper discusses how the shorter length of arguments (influenced by pronouns) supports the hypothesis that lighter words occur earlier in sentences. However, the finding that nouns are longer than verbs in SV languages and shorter in VS languages (when frequency is considered) supports the opposing hypothesis that heavier constituents are placed first. These results suggest a balance between production efficiency (favoring shorter words) and communicative efficiency (favoring informative content).
A "Min-Max" Theory of Processing
The study proposes a "Min-Max" theory of processing, suggesting that speakers minimize processing effort while maximizing information content. This theory refines Hawkins' PGCH and posits that speakers balance these competing pressures to optimize their utterances. Frequent use of short pronominal forms encourages a shift toward argument-initial order, while longer nouns provide more information and move toward an initial position. Languages with longer verbs tend to maintain predicate-initial order. The Min-Max theory is supported by clinical psychology research, language modeling, and information theory. The paper suggests that the Min-Max principle applies at multiple hierarchical levels, influencing both word selection and structural choices.
Limitations
The limitations of the study include the use of automated POS tagging, which may introduce inaccuracies. The dataset, while large, represents only a quarter of the world's languages, limiting the generalizability of the findings. Additionally, the binary classification of word order (SV/VS) simplifies the complexities of word order patterns. The study concludes by advocating for continued investigation of the interplay between processing efficiency and information structure in language evolution.