From Boltzmann to Zipf through Shannon and Jaynes (1912.03570v1)

Published 7 Dec 2019 in physics.soc-ph and cond-mat.stat-mech

Abstract: The word-frequency distribution provides the fundamental building blocks that generate discourse in language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf's law, at least approximately. Following Stephens and Bialek [Phys. Rev. E 81, 066119, 2010], we interpret the frequency of any word as arising from the interaction potential between its constituent letters. Indeed, Jaynes' maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of all pairwise (two-letter) potentials. The improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. Applying this formalism to words with up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus, we find that the model is able to reproduce Zipf's law, but with some limitations: the general Zipf's power-law regime is obtained, but the probability of individual words shows considerable scattering. In this way, a pure statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
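
The construction described in the abstract can be illustrated with a minimal sketch: a Boltzmann distribution over words whose energy is a sum of pairwise letter potentials, fitted so that its two-letter marginals match the empirical ones. This is not the paper's code. It substitutes plain iterative proportional fitting for the improved iterative-scaling algorithm, uses a toy alphabet with fixed-length words and strictly positive marginals, and every function name (`pair_marginals`, `boltzmann`, `fit_pairwise_maxent`) is hypothetical.

```python
import itertools
import numpy as np

def pair_marginals(probs, words, n_letters, length):
    """Two-letter marginals m[(i, j)][a, b] for every pair of positions i < j."""
    marg = {}
    for i, j in itertools.combinations(range(length), 2):
        m = np.zeros((n_letters, n_letters))
        # Accumulate each word's probability into the cell of its letter pair.
        np.add.at(m, (words[:, i], words[:, j]), probs)
        marg[(i, j)] = m
    return marg

def boltzmann(potentials, words):
    """P(w) proportional to exp(-E(w)), with E(w) the sum of pairwise potentials."""
    energy = np.zeros(len(words))
    for (i, j), v in potentials.items():
        energy += v[words[:, i], words[:, j]]
    weights = np.exp(-(energy - energy.min()))  # shift energies for stability
    return weights / weights.sum()

def fit_pairwise_maxent(target, n_letters, length, sweeps=500):
    """Fit pairwise potentials by iterative proportional fitting.

    Assumes every target marginal entry is strictly positive (toy setting).
    """
    words = np.array(list(itertools.product(range(n_letters), repeat=length)))
    potentials = {pair: np.zeros((n_letters, n_letters)) for pair in target}
    for _ in range(sweeps):
        for pair in target:
            probs = boltzmann(potentials, words)
            model = pair_marginals(probs, words, n_letters, length)[pair]
            # Raising the potential where the model overshoots the target
            # makes this pair's marginal match exactly after the update.
            potentials[pair] += np.log(model / target[pair])
    return potentials, words

# Toy "corpus": a random distribution over all 3-letter words on 2 letters.
rng = np.random.default_rng(0)
n_letters, length = 2, 3
all_words = np.array(list(itertools.product(range(n_letters), repeat=length)))
empirical = rng.dirichlet(np.ones(len(all_words)))
target = pair_marginals(empirical, all_words, n_letters, length)

potentials, words = fit_pairwise_maxent(target, n_letters, length)
fitted_probs = boltzmann(potentials, words)
fitted = pair_marginals(fitted_probs, words, n_letters, length)
max_gap = max(np.abs(fitted[p] - target[p]).max() for p in target)
print("largest marginal mismatch:", max_gap)
```

The fitted word probabilities generally differ from the empirical ones even when all two-letter marginals agree, which is the kind of discrepancy the abstract reports as scattering in the probabilities of individual words.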
