Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
153 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Distinct word length frequencies: distributions and symbol entropies (1207.2334v2)

Published 10 Jul 2012 in cs.CL and physics.data-an

Abstract: The distribution of frequency counts of distinct words by length in a language's vocabulary will be analyzed using two methods. The first, will look at the empirical distributions of several languages and derive a distribution that reasonably explains the number of distinct words as a function of length. We will be able to derive the frequency count, mean word length, and variance of word length based on the marginal probability of letters and spaces. The second, based on information theory, will demonstrate that the conditional entropies can also be used to estimate the frequency of distinct words of a given length in a language. In addition, it will be shown how these techniques can also be applied to estimate higher order entropies using vocabulary word length.

Citations (38)

Summary

We haven't generated a summary for this paper yet.