
Revisiting Entropy Rate Constancy in Text (2305.12084v2)

Published 20 May 2023 in cs.CL

Abstract: The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel & Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.
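
The argument being revisited rests on a simple decomposition: the in-context entropy of the i-th sentence can be written as H(Y_i | C_i) = H(Y_i) − I(Y_i; C_i), so if the in-context rate is constant, the out-of-context per-sentence entropy H(Y_i) should increase with sentence position as each sentence shares more information with its growing context. Below is a minimal sketch of that measurement, assuming the Hugging Face transformers library, the off-the-shelf gpt2 checkpoint, and invented example sentences; it is an illustration of the general setup, not the paper's code.

```python
# Sketch: estimate out-of-context per-sentence entropy (mean token surprisal)
# as a function of sentence position, using a small neural language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_entropy(sentence: str) -> float:
    """Mean negative log-probability per token (nats), scored without context."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the average
        # cross-entropy over the sentence's tokens.
        loss = model(ids, labels=ids).loss
    return loss.item()

# Entropy rate constancy predicts this out-of-context estimate should rise
# with sentence position; these sentences are illustrative placeholders.
document = [
    "The committee met on Tuesday morning.",
    "It reviewed the proposed budget for the coming year.",
    "Several members raised concerns about rising costs.",
]
for position, sentence in enumerate(document, start=1):
    print(position, round(sentence_entropy(sentence), 3))
```

Over real corpora, a monotonic-trend test such as Mann–Kendall (cf. Mann, 1945, and Kendall, 1948, in the references below) applied to these position-indexed estimates is one natural way to check whether the predicted increase actually appears.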

References (50)
  1. AraGPT2: Pre-trained transformer for Arabic language generation. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 196–207, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
  2. Fatemeh Torabi Asr and Vera Demberg. 2015. Uniform surprisal at the level of discourse relations: Negation markers and discourse connective omission. In Proceedings of the 11th international conference on computational semantics, pages 118–128.
  3. Matthew Aylett and Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1):31–56. PMID: 15298329.
  4. Matthew P. Aylett. 1999. Stochastic suprasegmentals: Relationships between redundancy, prosodic structure and syllabic duration. Proceedings of ICPhS–99, San Francisco.
  5. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. The Journal of the Acoustical Society of America, 113(2):1001–1024.
  6. Modeling the noun phrase versus sentence coordination ambiguity in Dutch: Evidence from surprisal theory. In Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics, pages 72–80, Uppsala, Sweden. Association for Computational Linguistics.
  7. Thomas M Cover and Joy A Thomas. 2012. Elements of information theory. John Wiley & Sons.
  8. Ibrahim Abu El-Khair. 2016. 1.5 billion words Arabic corpus. arXiv preprint arXiv:1611.04033.
  9. Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing. Cognitive Science, 44(3):e12814.
  10. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33):10336–10341.
  11. Dmitriy Genzel and Eugene Charniak. 2002. Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 199–206, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  12. Dmitriy Genzel and Eugene Charniak. 2003. Variation of entropy and parse trees of sentences as a function of the sentence number. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 65–72.
  13. Color naming across languages reflects color use. Proceedings of the National Academy of Sciences, 114(40):10785–10790.
  14. How efficiency shapes human language. Trends in Cognitive Sciences, 23(5):389–407.
  15. Mario Giulianelli and Raquel Fernández. 2021. Analysing human strategies of information transmission as a function of discourse context. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 647–660, Online. Association for Computational Linguistics.
  16. Is information density uniform in task-oriented dialogues? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8271–8283, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  17. The PhotoBook dataset: Building common ground through visually-grounded dialogue. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1895–1910, Florence, Italy. Association for Computational Linguistics.
  18. An information-theoretic explanation of adjective ordering preferences. In CogSci.
  19. Universals of word order reflect optimization of grammars for efficient communication. Proceedings of the National Academy of Sciences, 117(5):2347–2353.
  20. John Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.
  21. news-please: A generic news crawler and extractor. In Proceedings of the 15th International Symposium of Information Science, pages 218–223.
  22. John A Hawkins. 2009. Language universals and the performance-grammar correspondence hypothesis. Language universals, pages 54–78.
  23. T Florian Jaeger. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1):23–62.
  24. T Florian Jaeger and Roger P Levy. 2007. Speakers optimize information density through syntactic reduction. In Advances in neural information processing systems, pages 849–856.
  25. Frank Keller. 2004. The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 317–324, Barcelona, Spain. Association for Computational Linguistics.
  26. Semantic typology and efficient communication. Annual Review of Linguistics, 4:109–128.
  27. Maurice George Kendall. 1948. Rank correlation methods.
  28. Sharp nearby, fuzzy far away: How neural language models use context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294, Melbourne, Australia. Association for Computational Linguistics.
  29. Mind the gap: Assessing temporal generalization in neural language models. In Advances in Neural Information Processing Systems, volume 34, pages 29348–29363. Curran Associates, Inc.
  30. Roger Levy. 2008. Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177.
  31. Henry B Mann. 1945. Nonparametric tests against trend. Econometrica: Journal of the Econometric Society, pages 245–259.
  32. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  33. Introduction: Compiling and analysing the Spoken British National Corpus 2014. International Journal of Corpus Linguistics, 22(3):311–318.
  34. If beam search is the answer, what was the question? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2173–2185, Online. Association for Computational Linguistics.
  35. Revisiting the Uniform Information Density hypothesis. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 963–980, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  36. Typical decoding for natural language generation. arXiv preprint arXiv:2202.00666.
  37. Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.
  38. Sabrina J. Mielke. 2019. Can you compare perplexity across different segmentations?
  39. Joe O’Connor and Jacob Andreas. 2021. What context features can transformer language models use? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 851–864, Online. Association for Computational Linguistics.
  40. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526–3529.
  41. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Technical Report.
  42. Evan Sandhaus. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia, 6(12):e26752.
  43. Large-scale evidence for logarithmic effects of word predictability on reading time.
  44. Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423.
  45. The HCRC map task corpus: Natural dialogue for speech recognition. In Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993.
  46. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
  47. A cognitive regularizer for language modeling. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5191–5202, Online. Association for Computational Linguistics.
  48. Yang Xu and David Reitter. 2018. Information density converges in dialogue: Towards an information-theoretic model. Cognition, 170:147–163.
  49. Efficient compression in color naming and its evolution. Proceedings of the National Academy of Sciences, 115(31):7937–7942.
  50. George K. Zipf. 1949. Human behavior and the principle of least effort. Addison-Wesley Press.