Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST! (2402.12486v2)

Published 19 Feb 2024 in cs.CL

Abstract: In everyday language use, speakers frequently utter and interpret sentences that are semantically underspecified, that is, sentences whose content is insufficient to fully convey the speaker's message or to interpret them univocally. For example, interpreting the underspecified sentence "Don't spend too much", which leaves implicit what (not) to spend, requires additional linguistic context or outside knowledge. In this work, we propose a novel Dataset of semantically Underspecified Sentences grouped by Type (DUST) and use it to study whether pre-trained LMs correctly identify and interpret underspecified sentences. We find that newer LMs are reasonably able to identify underspecified sentences when explicitly prompted. However, correctly interpreting them is much harder for all LMs. Our experiments show that, when interpreting underspecified sentences, LMs exhibit little uncertainty, contrary to what theoretical accounts of underspecification would predict. Overall, our study reveals limitations in how current models process sentence semantics and highlights the importance of using naturalistic data and communicative scenarios when evaluating LMs' language capabilities.
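
To make the probing setup concrete, here is a minimal sketch of one way such an experiment can be run. This is not the authors' code: the model choice (gpt2), the prompt wording, and the use of binary entropy over the Yes/No answer tokens as an uncertainty proxy are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's method): prompt a pre-trained
# LM with a yes/no question about a sentence, read off the probabilities the
# model assigns to " Yes" vs " No" as the next token, and compute binary
# entropy over those two options as a simple proxy for the model's uncertainty.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # model choice is an assumption
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def yes_no_probe(sentence: str) -> dict:
    # Prompt wording is a hypothetical example, not taken from the paper.
    prompt = (
        f'Sentence: "{sentence}"\n'
        "Question: Does this sentence leave part of its meaning unspecified?\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    yes_id = tokenizer.encode(" Yes")[0]
    no_id = tokenizer.encode(" No")[0]
    p_yes, p_no = probs[yes_id].item(), probs[no_id].item()
    # Renormalize over the two answer tokens; entropy near 1 bit = maximal
    # uncertainty, near 0 bits = a confident judgment.
    p = p_yes / (p_yes + p_no)
    entropy = -(p * torch.log2(torch.tensor(p))
                + (1 - p) * torch.log2(torch.tensor(1 - p))).item()
    return {"p_yes": p, "entropy_bits": entropy}

print(yes_no_probe("Don't spend too much"))                       # underspecified
print(yes_no_probe("Don't spend more than ten euros on coffee"))  # fully specified
```

Under the theoretical accounts the paper discusses, one would expect higher entropy on the underspecified sentence than on the fully specified one; the paper's finding is that models tend not to show this pattern.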
