Measuring Meaning Composition in the Human Brain with Composition Scores from Large Language Models (2403.04325v3)
Abstract: The process of meaning composition, wherein smaller units like morphemes or words combine to form the meaning of phrases and sentences, is essential for human sentence comprehension. Despite extensive neurolinguistic research into the brain regions involved in meaning composition, a computational metric to quantify the extent of composition is still lacking. Drawing on the key-value memory interpretation of transformer feed-forward network blocks, we introduce the Composition Score, a novel model-based metric designed to quantify the degree of meaning composition during sentence comprehension. Experimental findings show that this metric correlates with brain clusters associated with word frequency, structural processing, and general sensitivity to words, suggesting the multifaceted nature of meaning composition during human sentence comprehension.
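The key-value memory view the abstract builds on (Geva et al., 2021) treats a transformer FFN block as FFN(x) = f(x·W_K^T)·W_V, where each row of W_K is a "key" matched against the token's hidden state and the corresponding row of W_V is a "value" added to the output in proportion to the key's activation. A minimal NumPy sketch of that view, with a purely hypothetical scalar reduction standing in for the paper's Composition Score (the abstract does not give the actual formula):

```python
import numpy as np

def ffn_memory_coefficients(x, W_K, W_V):
    """Key-value memory view of a transformer FFN block (Geva et al., 2021):
    FFN(x) = f(x @ W_K.T) @ W_V. Each row of W_K acts as a 'key' and the
    matching row of W_V as its 'value'; the activations m = f(x @ W_K.T) are
    the memory coefficients -- how strongly each stored memory fires."""
    m = np.maximum(x @ W_K.T, 0.0)  # ReLU as a stand-in nonlinearity
    out = m @ W_V                   # output = activation-weighted sum of values
    return m, out

def composition_score(m):
    """HYPOTHETICAL aggregation: one plausible per-token scalar (mean
    coefficient magnitude). The paper's actual Composition Score is defined
    from these coefficients but its formula is not stated in the abstract."""
    return float(m.mean())

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal(d_model)            # one token's hidden state
W_K = rng.standard_normal((d_ff, d_model))  # keys: one row per memory slot
W_V = rng.standard_normal((d_ff, d_model))  # values: one row per memory slot
m, out = ffn_memory_coefficients(x, W_K, W_V)
score = composition_score(m)
```

In the paper's setting, coefficients like `m` would be read out per word during sentence processing and reduced to a scalar trajectory that can be regressed against fMRI activity; the reduction used here is only an illustrative placeholder.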