Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics (2404.03301v1)
Abstract: Scalar adjectives pertain to various domain scales and vary in intensity within each scale (e.g. certain is more intense than likely on the likelihood scale). Scalar implicatures arise from the consideration of alternative statements that could have been made; they can be triggered by scalar adjectives and require listeners to reason pragmatically about them. Some scalar adjectives are more likely to trigger scalar implicatures than others, a phenomenon referred to as scalar diversity. In this study, we probe different families of large language models (LLMs), such as GPT-4, for their knowledge of the lexical semantics of scalar adjectives and of one specific aspect of their pragmatics, namely scalar diversity. We find that the models encode rich lexical-semantic information about scalar adjectives; however, this rich lexical-semantic knowledge does not entail a good understanding of scalar diversity. We also compare current models of different sizes and complexities and find that larger models are not always better. Finally, we explain our probing results by appealing to linguistic intuitions and to the models' training objectives.
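For readers who want a concrete sense of what "probing for scalar adjective intensity" can look like, below is a minimal sketch, not the paper's exact protocol. It ranks two scalar adjectives with a masked language model via pseudo-log-likelihood scoring (in the spirit of Salazar et al.'s masked language model scoring), using the Hugging Face transformers library. The template, the choice of bert-base-uncased, and the adjective pair are illustrative assumptions.

```python
# A minimal sketch of an intensity probe (assumptions: template, model choice,
# and adjective pair are illustrative; this is not the paper's exact setup).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pll_score(sentence: str) -> float:
    """Pseudo-log-likelihood: sum the log-probability of each token
    when that token alone is replaced by [MASK]."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Intuition: the weaker adjective sounds natural before the stronger one,
# so the consistent ordering should receive a higher score.
weak, strong = "likely", "certain"
consistent = pll_score(f"It is not just {weak}, it is {strong}.")
reversed_ = pll_score(f"It is not just {strong}, it is {weak}.")
print(consistent > reversed_)  # True if the model ranks "certain" above "likely"
```

Under this kind of probe, comparing the scores of the two orderings gives a simple binary judgment of whether the model's preferences align with the human-annotated intensity ranking for a given adjective pair.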