Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment (2402.13956v3)
Abstract: Do LMs infer the semantics of text from co-occurrence patterns in their training data? Merrill et al. (2022) argue that, in theory, sentence co-occurrence probabilities predicted by an optimal LM should reflect the entailment relationship of the constituent sentences, but it is unclear whether probabilities predicted by neural LMs encode entailment in this way because of strong assumptions made by Merrill et al. (namely, that humans always avoid redundancy). In this work, we investigate whether their theory can be used to decode entailment relations from neural LMs. We find that a test similar to theirs can decode entailment relations between natural sentences, well above random chance, though not perfectly, across many datasets and LMs. This suggests LMs implicitly model aspects of semantics to predict semantic effects on sentence co-occurrence patterns. However, we find the test that predicts entailment in practice works in the opposite direction to the theoretical test. We thus revisit the assumptions underlying the original test, finding its derivation did not adequately account for redundancy in human-written text. We argue that better accounting for redundancy related to explanations might derive the observed flipped test and, more generally, improve computational models of speakers in linguistics.
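The abstract describes decoding entailment from sentence co-occurrence probabilities predicted by an LM. The sketch below (Python with Hugging Face transformers and a hypothetical `gpt2` checkpoint) illustrates one way such a co-occurrence test could be scored: it compares the log-probability the LM assigns to the hypothesis as a continuation of the premise against a redundancy baseline and thresholds the difference. The specific statistic, the premise-repetition baseline, the threshold, and the decision direction are illustrative assumptions, not the paper's exact test.

```python
# Hedged sketch of a co-occurrence-based entailment test; the exact statistic
# used in the paper is not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical choice; the paper evaluates a range of LMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def continuation_logprob(premise: str, continuation: str) -> float:
    """Log-probability the LM assigns to `continuation` as a follow-up to `premise`."""
    premise_ids = tokenizer(premise, return_tensors="pt").input_ids
    full_ids = tokenizer(premise + " " + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Per-token log-probabilities of each token given its prefix.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_logprobs = log_probs[torch.arange(targets.size(0)), targets]
    # Keep only the continuation tokens (assumes the premise tokenization is a
    # prefix of the joint tokenization, which typically holds for BPE tokenizers).
    n_premise = premise_ids.size(1)
    return token_logprobs[n_premise - 1:].sum().item()


def predict_entailment(premise: str, hypothesis: str, threshold: float = 0.0) -> bool:
    """Co-occurrence entailment test with an illustrative statistic and direction."""
    # Compare continuing the premise with the hypothesis against continuing it
    # with a literal repetition of the premise (a stand-in for a fully redundant
    # continuation).
    score = continuation_logprob(premise, hypothesis) - continuation_logprob(premise, premise)
    # Sign convention and threshold are assumptions; the paper reports that the
    # empirically useful direction is the opposite of the theoretically derived one.
    return score > threshold


print(predict_entailment("A man is playing a guitar on stage.", "A man is playing an instrument."))
```

As a usage note, a threshold of zero is only a placeholder; in practice one would tune the threshold (and the sign of the comparison) on held-out labeled pairs, which is how a flipped direction such as the one the paper reports would surface.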
- Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online. Association for Computational Linguistics.
- Robert Brandom. 2000. Articulating Reasons: An Introduction to Inferentialism. Harvard University Press, Cambridge, Mass.
- Mikael Brunila and Jack LaViolette. 2022. What company do words keep? Revisiting the distributional semantics of J.R. Firth & Zellig Harris. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4403–4417, Seattle, United States. Association for Computational Linguistics.
- Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality.
- Recognizing textual entailment: Rational, evaluation and approaches – Erratum. Natural Language Engineering, 16(1):105–105.
- When redundancy is useful: A Bayesian approach to 'overinformative' referring expressions.
- Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model.
- The Pile: An 800GB dataset of diverse text for language modeling.
- Noah D. Goodman and Michael C. Frank. 2016. Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences, 20(11):818–829.
- Think before you speak: Training language models with pause tokens.
- Herbert P Grice. 1975. Logic and conversation. In Speech acts, pages 41–58. Brill.
- Zellig S Harris. 1954. Distributional structure. Word, 10(2-3):146–162.
- Philip J. Hayes and Steven P. Weinstein. 1991. CONSTRUE/TIS: A system for content-based indexing of a database of news stories. In Proceedings of the 2nd Conference on Innovative Applications of Artificial Intelligence (IAAI-90), May 1-3, 1990, Washington, DC, USA, pages 49–64. AAAI Press, Chicago, IL, USA.
- TRUE: Re-evaluating factual consistency evaluation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3905–3920, Seattle, United States. Association for Computational Linguistics.
- WANLI: Worker and AI collaboration for natural language inference dataset creation. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 6826–6847, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand? Transactions of the Association for Computational Linguistics, 9:1047–1060.
- William Merrill, Alex Warstadt, and Tal Linzen. 2022. Entailment semantics can be extracted from an ideal language model. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 176–193, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Julian Michael. 2020. To dissect an octopus: Making sense of the form/meaning debate.
- Adversarial NLI: A new benchmark for natural language understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4885–4901, Online. Association for Computational Linguistics.
- In-context learning and induction heads. arXiv preprint arXiv:2209.11895.
- Ellie Pavlick. 2022. Semantic structure in deep learning. Annual Review of Linguistics, 8(1):447–471.
- Christopher Potts. 2020. Is it possible for language models to achieve understanding?
- Language models are unsupervised multitask learners.
- Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(1).
- Claude E. Shannon. 1951. Prediction and entropy of printed English. Bell System Technical Journal, 30(1):50–64.
- LLaMA: Open and efficient foundation language models.
- Llama 2: Open foundation and fine-tuned chat models.
- Johan Van Benthem. 1986. Natural Logic, pages 109–119. Springer Netherlands, Dordrecht.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium. Association for Computational Linguistics.
- A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.
- Transparency helps reveal when language models learn meaning. Transactions of the Association for Computational Linguistics, 11:617–634.
- OPT: Open pre-trained transformer language models.
- Character-level convolutional networks for text classification.
- William Merrill
- Zhaofeng Wu
- Norihito Naka
- Yoon Kim
- Tal Linzen