Language models align with human judgments on key grammatical constructions (2402.01676v2)
Published 19 Jan 2024 in cs.CL and cs.AI
Abstract: Do LLMs make human-like linguistic generalizations? Dentella et al. (2023) ("DGL") prompt several LLMs ("Is the following sentence grammatically correct in English?") to elicit grammaticality judgments of 80 English sentences, concluding that LLMs demonstrate a "yes-response bias" and a "failure to distinguish grammatical from ungrammatical sentences". We re-evaluate LLM performance using well-established practices and find that DGL's data in fact provide evidence for just how well LLMs capture human behaviors. Models not only achieve high accuracy overall, but also capture fine-grained variation in human linguistic judgments.
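The "well-established practices" the abstract refers to (cf. Hu & Levy 2023; the BLiMP benchmark) contrast prompting with direct probability measurement: instead of asking a model whether a sentence is grammatical, one checks whether it assigns higher probability to the grammatical member of a minimal pair. Below is a minimal sketch of that evaluation logic using a toy add-one-smoothed bigram model in place of an LLM; the corpus, sentences, and function names are illustrative assumptions, not material from the paper.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Return a log-probability scorer from a tiny corpus (toy stand-in for an LM)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])          # contexts
        bigrams.update(zip(toks[:-1], toks[1:]))
    V = len(vocab)

    def logprob(sent):
        toks = ["<s>"] + sent.split() + ["</s>"]
        # Add-one (Laplace) smoothing so unseen bigrams get nonzero probability.
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
            for a, b in zip(toks[:-1], toks[1:])
        )
    return logprob

corpus = ["the dogs bark", "the dog barks", "a dog barks", "the dogs run"]
logprob = train_bigram(corpus)

# Minimal-pair evaluation: the model is scored "correct" on a pair if the
# grammatical variant receives higher log-probability than its ungrammatical twin.
grammatical, ungrammatical = "the dogs bark", "the dogs barks"
correct = logprob(grammatical) > logprob(ungrammatical)
```

With a real LLM, `logprob` would be replaced by the model's summed token log-probabilities; the comparison step stays the same, which is what makes this method robust to the yes-response bias that direct prompting can induce.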
- D. Birdsong. Metalinguistic Performance and Interlinguistic Competence. Springer Series in Language and Communication. Springer Berlin Heidelberg, 1989. ISBN 978-3-642-74124-1.
- J. Bresnan, A. Cueni, T. Nikitina, and R. H. Baayen. Predicting the dative alternation. In Cognitive Foundations of Interpretation, pages 69–94. KNAW, 2007.
- N. Chomsky. Knowledge of Language: Its Nature, Origin, and Use. Praeger Scientific, 1986.
- A. Clark and S. Lappin. Linguistic Nativism and the Poverty of the Stimulus. John Wiley & Sons, 2010.
- V. Dentella, F. Günther, and E. Leivada. Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias. Proceedings of the National Academy of Sciences, 120(51):e2309583120, Dec. 2023. doi: 10.1073/pnas.2309583120. URL https://doi.org/10.1073/pnas.2309583120.
- Y. Han. Grammaticality Judgment Tests: How Reliable and Valid Are They? In L. Woytak, editor, Applied Language Learning, volume 11, pages 177–204. 2000.
- J. Hu and R. Levy. Prompting is not a substitute for probability measurements in large language models. In H. Bouamor, J. Pino, and K. Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5040–5060, Singapore, Dec. 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.306. URL https://aclanthology.org/2023.emnlp-main.306.
- J. Hu, J. Gauthier, P. Qian, E. Wilcox, and R. Levy. A Systematic Assessment of Syntactic Generalization in Neural Language Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1725–1744, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.158. URL https://aclanthology.org/2020.acl-main.158.
- R. Marvin and T. Linzen. Targeted Syntactic Evaluation of Language Models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1192–1202, Brussels, Belgium, Oct. 2018. Association for Computational Linguistics. doi: 10.18653/v1/D18-1151. URL https://aclanthology.org/D18-1151.
- A. Warstadt, A. Parrish, H. Liu, A. Mohananey, W. Peng, S.-F. Wang, and S. R. Bowman. BLiMP: The Benchmark of Linguistic Minimal Pairs for English. Transactions of the Association for Computational Linguistics, 8, 2020. URL https://doi.org/10.1162/tacl_a_00321.
- Jennifer Hu
- Kyle Mahowald
- Gary Lupyan
- Anna Ivanova
- Roger Levy