
Revisiting Entropy Rate Constancy in Text (2305.12084v2)

Published 20 May 2023 in cs.CL

Abstract: The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel & Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.
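
The argument being revisited rests on a simple decomposition: the in-context entropy of the i-th sentence can be written as H(Y_i | C_i) = H(Y_i) − I(Y_i; C_i), so if the in-context rate is constant, the out-of-context per-sentence entropy H(Y_i) should increase with sentence position as each sentence shares more information with its growing context. Below is a minimal sketch of that measurement, assuming the Hugging Face transformers library, the off-the-shelf gpt2 checkpoint, and invented example sentences; it is an illustration of the general setup, not the paper's code.

```python
# Sketch: estimate out-of-context per-sentence entropy (mean token surprisal)
# as a function of sentence position, using a small neural language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_entropy(sentence: str) -> float:
    """Mean negative log-probability per token (nats), scored without context."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the average
        # cross-entropy over the sentence's tokens.
        loss = model(ids, labels=ids).loss
    return loss.item()

# Entropy rate constancy predicts this out-of-context estimate should rise
# with sentence position; these sentences are illustrative placeholders.
document = [
    "The committee met on Tuesday morning.",
    "It reviewed the proposed budget for the coming year.",
    "Several members raised concerns about rising costs.",
]
for position, sentence in enumerate(document, start=1):
    print(position, round(sentence_entropy(sentence), 3))
```

Over real corpora, a monotonic-trend test such as Mann–Kendall (cf. Mann, 1945, and Kendall, 1948, in the references below) applied to these position-indexed estimates is one natural way to check whether the predicted increase actually appears.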

References (50)
  1. AraGPT2: Pre-trained transformer for Arabic language generation. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 196–207, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
  2. Fatemeh Torabi Asr and Vera Demberg. 2015. Uniform surprisal at the level of discourse relations: Negation markers and discourse connective omission. In Proceedings of the 11th international conference on computational semantics, pages 118–128.
  3. Matthew Aylett and Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47(1):31–56. PMID: 15298329.
  4. Matthew P. Aylett. 1999. Stochastic suprasegmentals: Relationships between redundancy, prosodic structure and syllabic duration. Proceedings of ICPhS–99, San Francisco.
  5. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. The Journal of the Acoustical Society of America, 113(2):1001–1024.
  6. Modeling the noun phrase versus sentence coordination ambiguity in Dutch: Evidence from surprisal theory. In Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics, pages 72–80, Uppsala, Sweden. Association for Computational Linguistics.
  7. Thomas M Cover and Joy A Thomas. 2012. Elements of information theory. John Wiley & Sons.
  8. Ibrahim Abu El-Khair. 2016. 1.5 billion words Arabic corpus. arXiv preprint arXiv:1611.04033.
  9. Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing. Cognitive Science, 44(3):e12814.
  10. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33):10336–10341.
  11. Dmitriy Genzel and Eugene Charniak. 2002. Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 199–206, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
  12. Dmitriy Genzel and Eugene Charniak. 2003. Variation of entropy and parse trees of sentences as a function of the sentence number. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 65–72.
  13. Color naming across languages reflects color use. Proceedings of the National Academy of Sciences, 114(40):10785–10790.
  14. How efficiency shapes human language. Trends in Cognitive Sciences, 23(5):389–407.
  15. Mario Giulianelli and Raquel Fernández. 2021. Analysing human strategies of information transmission as a function of discourse context. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 647–660, Online. Association for Computational Linguistics.
  16. Is information density uniform in task-oriented dialogues? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8271–8283, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  17. The PhotoBook dataset: Building common ground through visually-grounded dialogue. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1895–1910, Florence, Italy. Association for Computational Linguistics.
  18. An information-theoretic explanation of adjective ordering preferences. In CogSci.
  19. Universals of word order reflect optimization of grammars for efficient communication. Proceedings of the National Academy of Sciences, 117(5):2347–2353.
  20. John Hale. 2001. A probabilistic Earley parser as a psycholinguistic model. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.
  21. news-please: A generic news crawler and extractor. In Proceedings of the 15th International Symposium of Information Science, pages 218–223.
  22. John A Hawkins. 2009. Language universals and the performance-grammar correspondence hypothesis. Language universals, pages 54–78.
  23. T Florian Jaeger. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1):23–62.
  24. T Florian Jaeger and Roger P Levy. 2007. Speakers optimize information density through syntactic reduction. In Advances in neural information processing systems, pages 849–856.
  25. Frank Keller. 2004. The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 317–324, Barcelona, Spain. Association for Computational Linguistics.
  26. Semantic typology and efficient communication. Annual Review of Linguistics, 4:109–128.
  27. Maurice George Kendall. 1948. Rank correlation methods.
  28. Sharp nearby, fuzzy far away: How neural language models use context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 284–294, Melbourne, Australia. Association for Computational Linguistics.
  29. Mind the gap: Assessing temporal generalization in neural language models. In Advances in Neural Information Processing Systems, volume 34, pages 29348–29363. Curran Associates, Inc.
  30. Roger Levy. 2008. Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177.
  31. Henry B Mann. 1945. Nonparametric tests against trend. Econometrica: Journal of the Econometric Society, pages 245–259.
  32. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
  33. Introduction: Compiling and analysing the Spoken British National Corpus 2014. International Journal of Corpus Linguistics, 22(3):311–318.
  34. If beam search is the answer, what was the question? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2173–2185, Online. Association for Computational Linguistics.
  35. Revisiting the Uniform Information Density hypothesis. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 963–980, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  36. Typical decoding for natural language generation. arXiv preprint arXiv:2202.00666.
  37. Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.
  38. Sabrina J. Mielke. 2019. Can you compare perplexity across different segmentations?
  39. Joe O’Connor and Jacob Andreas. 2021. What context features can transformer language models use? In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 851–864, Online. Association for Computational Linguistics.
  40. Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526–3529.
  41. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Technical Report.
  42. Evan Sandhaus. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia, 6(12):e26752.
  43. Large-scale evidence for logarithmic effects of word predictability on reading time.
  44. Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423.
  45. The HCRC map task corpus: Natural dialogue for speech recognition. In Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993.
  46. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
  47. A cognitive regularizer for language modeling. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5191–5202, Online. Association for Computational Linguistics.
  48. Yang Xu and David Reitter. 2018. Information density converges in dialogue: Towards an information-theoretic model. Cognition, 170:147–163.
  49. Efficient compression in color naming and its evolution. Proceedings of the National Academy of Sciences, 115(31):7937–7942.
  50. George K. Zipf. 1949. Human behavior and the principle of least effort. Addison-Wesley Press.