Black Big Boxes: Do Language Models Hide a Theory of Adjective Order? (2407.02136v1)
Abstract: In English and other languages, multiple adjectives in a complex noun phrase show intricate ordering patterns that have been a target of much linguistic theory. These patterns offer an opportunity to assess the ability of LMs to learn subtle rules of language involving factors that cross the traditional divisions of syntax, semantics, and pragmatics. We review existing hypotheses designed to explain Adjective Order Preferences (AOPs) in humans and develop a setup to study AOPs in LMs: we present a reusable corpus of adjective pairs and define AOP measures for LMs. With these tools, we study a series of LMs across intermediate checkpoints during training. We find that all models' predictions are much closer to human AOPs than predictions generated by factors identified in theoretical linguistics. At the same time, we demonstrate that the observed AOPs in LMs are strongly correlated with the frequency of the adjective pairs in the training data and report limited generalization to unseen combinations. This highlights the difficulty of establishing a link between LM performance and linguistic theory. We therefore conclude with a road map for the future studies that our results set the stage for, and a discussion of key questions about the nature of knowledge in LMs and their ability to generalize beyond the training sets.
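As a minimal sketch of what an AOP measure for an LM could look like (this is an illustrative assumption, not the paper's exact metric), one can compare the log-probabilities a causal LM assigns to the two orderings of an adjective pair in an otherwise identical noun phrase; the template "the ADJ1 ADJ2 NOUN" and the choice of GPT-2 below are illustrative:

```python
# Illustrative sketch of an adjective-order-preference (AOP) score for a causal LM.
# Assumptions: GPT-2 as the model and "the ADJ1 ADJ2 NOUN" as the carrier phrase;
# the paper's own corpus, templates, and measures may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sequence_logprob(text: str) -> float:
    """Total log-probability the LM assigns to a string."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Score tokens 2..n given their left context; both orders share the same prefix "the ".
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

def aop_score(adj1: str, adj2: str, noun: str = "box") -> float:
    """Positive values mean the LM prefers 'adj1 adj2 noun' over 'adj2 adj1 noun'."""
    forward = sequence_logprob(f"the {adj1} {adj2} {noun}")
    reverse = sequence_logprob(f"the {adj2} {adj1} {noun}")
    return forward - reverse

print(aop_score("big", "black"))  # expected > 0 if the LM shares the human "big black box" preference
```

Aggregating such pairwise scores over a corpus of adjective pairs, and tracking them across training checkpoints, is the kind of setup the abstract describes.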