Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention
Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences, and the fate of lingering misinterpretations, in four LLMs: GPT-2, LLaMA-2, Flan-T5, and RoBERTa. The overall goal is to evaluate whether humans and LLMs are aligned in their processing of garden-path sentences and in their lingering misinterpretations past the point of disambiguation, especially when extra-syntactic information (e.g., a comma delimiting a clause boundary) is available to guide processing. We address this goal using 24 garden-path sentences whose optionally transitive and reflexive verbs create temporary ambiguities. Each sentence is paired with two comprehension questions, one corresponding to the misinterpretation and one to the correct interpretation. In three experiments, we (1) measure the dynamic semantic interpretations of the LLMs with a question-answering task; (2) track whether the models shift their implicit parse trees at the point of disambiguation (or by the end of the sentence); and (3) visualize which model components attend to the disambiguating information when processing the question probes. Together, these experiments show promising alignment between humans and LLMs in the processing of garden-path sentences, especially when extra-syntactic information is available to guide processing.
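To make the incremental question-answering probe of Experiment 1 concrete, below is a minimal sketch, not the authors' code, of how such a probe can be run with a Hugging Face causal LM such as GPT-2. The example sentence, the question wording, and the single-token " Yes"/" No" answer candidates are illustrative assumptions; the paper's 24 items and question pairs are not reproduced here.

```python
# A minimal sketch of an incremental question-answering probe, assuming a
# Hugging Face causal LM such as GPT-2. The sentence, question wording, and
# " Yes"/" No" answer tokens are illustrative assumptions, not the paper's
# actual materials.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical optionally-transitive garden-path item and its
# misinterpretation probe ("Did the man hunt the deer?" -> correct answer: No).
sentence = "While the man hunted the deer ran into the woods."
question = "Did the man hunt the deer?"

# Single-token answer candidates (the leading space matters for GPT-2's BPE).
yes_id = tokenizer(" Yes")["input_ids"][0]
no_id = tokenizer(" No")["input_ids"][0]

words = sentence.split()
for i in range(1, len(words) + 1):
    prefix = " ".join(words[:i])
    prompt = f"{prefix}\nQuestion: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    probs = torch.softmax(logits, dim=-1)
    p_yes = probs[yes_id] / (probs[yes_id] + probs[no_id])
    # A normalized P(Yes) that stays high after the disambiguating word
    # ("ran") would indicate a lingering misinterpretation.
    print(f"{prefix!r:60s} P(Yes)={p_yes.item():.3f}")
```

The same forward pass can also return attention weights (pass output_attentions=True to the model call), which gives a rough handle on the question behind Experiment 3, namely which heads attend to the disambiguating region; tools such as BertViz render these weights interactively.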