Linguistic Structure from a Bottleneck on Sequential Information Processing (2405.12109v2)
Abstract: Human language is a unique form of communication in the natural world, distinguished by its structured nature. Most fundamentally, it is systematic, meaning that signals can be broken down into component parts that are individually meaningful -- roughly, words -- which are combined in a regular way to form sentences. Furthermore, the way in which these parts are combined maintains a kind of locality: words are usually concatenated together, and they form contiguous phrases, keeping related parts of sentences close to each other. We address the challenge of understanding how these basic properties of language arise from broader principles of efficient communication under information processing constraints. Here we show that natural-language-like systematicity arises in codes that are constrained by predictive information, a measure of the amount of information that must be extracted from the past of a sequence in order to predict its future. In simulations, we show that such codes approximately factorize their source distributions, and then express the resulting factors systematically and locally. Next, in a series of cross-linguistic corpus studies, we show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics. Our result suggests that human language performs a sequential, discrete form of Independent Components Analysis on the statistical distribution over meanings that need to be expressed. It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language is shaped by communication under cognitive constraints.
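The predictive information invoked in the abstract is, at its core, the mutual information between a sequence's past and its future. A minimal sketch of that quantity, using an invented two-state Markov chain and a one-symbol window on each side (the transition probabilities and state names are illustrative assumptions, not values from the paper):

```python
import itertools
import math

# Toy two-state Markov chain over symbols 'a' and 'b'.
# Transition probabilities are arbitrary illustrative choices.
P = {('a', 'a'): 0.9, ('a', 'b'): 0.1,
     ('b', 'a'): 0.2, ('b', 'b'): 0.8}

# Stationary distribution pi (solves pi = pi P); for a two-state chain
# it is proportional to the reverse transition rates.
pi = {'a': P[('b', 'a')] / (P[('a', 'b')] + P[('b', 'a')]),
      'b': P[('a', 'b')] / (P[('a', 'b')] + P[('b', 'a')])}

def predictive_information():
    """One-step predictive information I(X_t; X_{t+1}) in bits:
    how much observing the current symbol reduces uncertainty
    about the next one."""
    total = 0.0
    for x, y in itertools.product('ab', repeat=2):
        p_xy = pi[x] * P[(x, y)]                       # joint p(x, y)
        p_y = sum(pi[z] * P[(z, y)] for z in 'ab')     # marginal p(y)
        total += p_xy * math.log2(p_xy / (pi[x] * p_y))
    return total

print(predictive_information())
```

For this chain the result is a little over a third of a bit: the past is informative about the future, but far from fully determining it. The paper's claim, in these terms, is that codes penalized for carrying such information across time are pushed toward systematic, local structure; the full measure uses unbounded past and future windows rather than the single symbol shown here.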