Papers
Topics
Authors
Recent
2000 character limit reached

The Morphemic Origin of Zipf's Law: A Factorized Combinatorial Framework (2512.12394v1)

Published 13 Dec 2025 in stat.ME, cs.CL, physics.soc-ph, and stat.AP

Abstract: We present a simple structure based model of how words are formed from morphemes. The model explains two major empirical facts: the typical distribution of word lengths and the appearance of Zipf like rank frequency curves. In contrast to classical explanations based on random text or communication efficiency, our approach uses only the combinatorial organization of prefixes, roots, suffixes and inflections. In this Morphemic Combinatorial Word Model, a word is created by activating several positional slots. Each slot turns on with a certain probability and selects one morpheme from its inventory. Morphemes are treated as stable building blocks that regularly appear in word formation and have characteristic positions. This mechanism produces realistic word length patterns with a concentrated middle zone and a thin long tail, closely matching real languages. Simulations with synthetic morpheme inventories also generate rank frequency curves with Zipf like exponents around 1.1-1.4, similar to English, Russian and Romance languages. The key result is that Zipf like behavior can emerge without meaning, communication pressure or optimization principles. The internal structure of morphology alone, combined with probabilistic activation of slots, is sufficient to create the robust statistical patterns observed across languages.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.