Probing neural language models for understanding of words of estimative probability
Abstract: Words of estimative probability (WEP) are expressions of a statement's plausibility (probably, maybe, likely, doubt, unlikely, impossible...). Multiple surveys demonstrate that human evaluators agree when assigning numerical probability levels to WEP. For example, highly likely corresponds to a median chance of 0.90 ± 0.08 in the survey of Fagen-Ulmschneider (2015). In this work, we measure the ability of neural language processing models to capture the consensual probability level associated with each WEP. First, we use the UNLI dataset (Chen et al., 2020), which pairs premises and hypotheses with their perceived joint probability p, to construct prompts, e.g. "[PREMISE]. [WEP], [HYPOTHESIS].", and assess whether LLMs can predict whether the WEP's consensual probability level is close to p. Second, we construct a dataset of WEP-based probabilistic reasoning to test whether LLMs can reason with WEP compositions: when prompted with "[EVENTA] is likely. [EVENTB] is impossible.", a causal LLM should not express that [EVENTA&B] is likely. We show that both tasks are unsolved by off-the-shelf English LLMs, but that fine-tuning leads to transferable improvement.
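As a concrete illustration of the two probing tasks described in the abstract, here is a minimal sketch. It is not the paper's implementation: the `WEP_LEVELS` mapping is only an approximation in the spirit of the Fagen-Ulmschneider (2015) survey medians, the 0.10 tolerance is a hypothetical choice, and the example sentences are invented.

```python
# Illustrative sketch of the two WEP probing tasks (assumed values, not the paper's data).

# Approximate consensual probability level per WEP (assumption, loosely
# following survey medians such as highly likely ~ 0.90).
WEP_LEVELS = {
    "almost certain": 0.95,
    "highly likely": 0.90,
    "likely": 0.70,
    "maybe": 0.50,
    "unlikely": 0.25,
    "impossible": 0.00,
}

def build_prompt(premise: str, wep: str, hypothesis: str) -> str:
    """Verbalize a UNLI (premise, hypothesis) pair with a WEP using the
    "[PREMISE]. [WEP], [HYPOTHESIS]." template from the abstract."""
    return f"{premise}. {wep.capitalize()}, {hypothesis[0].lower()}{hypothesis[1:]}"

def wep_matches_p(wep: str, p: float, tol: float = 0.10) -> bool:
    """Task 1 gold label: is the WEP's consensual level within `tol`
    of the human-annotated joint probability p? (tol is an assumption)"""
    return abs(WEP_LEVELS[wep] - p) <= tol

# Task 1: verbalization plus closeness check.
prompt = build_prompt("A man plays guitar on stage", "likely", "He is a musician")
print(prompt)                         # A man plays guitar on stage. Likely, he is a musician
print(wep_matches_p("likely", 0.74))  # True: |0.70 - 0.74| <= 0.10

# Task 2: composition check. If [EVENTA] is likely (0.70) and [EVENTB] is
# impossible (0.00), then P(A and B) = 0 regardless of dependence, so a
# model should never describe [EVENTA&B] as likely.
p_a_and_b = min(WEP_LEVELS["likely"], WEP_LEVELS["impossible"])  # upper bound on P(A and B)
assert not wep_matches_p("likely", p_a_and_b)
```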
References
- Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
- Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901.
- Tongfei Chen, Zhengping Jiang, Adam Poliak, Keisuke Sakaguchi, and Benjamin Van Durme. 2020. Uncertain natural language inference. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8772–8779, Online. Association for Computational Linguistics.
- Eugene Dantsin. 1992. Probabilistic logic programs and their semantics. In Logic Programming, pages 152–164, Berlin, Heidelberg. Springer Berlin Heidelberg.
- Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In IJCAI, volume 7, pages 2462–2467. Hyderabad.
- Mandeep K. Dhami and David R. Mandel. 2021. Words or numbers? Communicating probability in intelligence analysis. American Psychologist, 76(3):549.
- Anton Dries, Angelika Kimmig, Jesse Davis, Vaishak Belle, and Luc De Raedt. 2017. Solving probability problems in natural language. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 3981–3987.
- Wade Fagen-Ulmschneider. 2015. Perception of probability words.
- Simeng Han, Hailey Schoelkopf, Yilun Zhao, et al. 2022. FOLIO: Natural language reasoning with first-order logic. arXiv preprint arXiv:2209.00840.
- Sherman Kent. 1964. Words of estimative probability. Studies in Intelligence, 8(4):49–65.
- How likely is that chance of thunderstorms? A study of how National Weather Service forecast offices use words of estimative probability and what they mean to the public. Journal of Operational Meteorology, 8(5).
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Marius Mosbach, Maksym Andriushchenko, and Dietrich Klakow. 2021. On the stability of fine-tuning BERT: Misconceptions, explanations, and strong baselines. In International Conference on Learning Representations.
- Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress test evaluation for natural language inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- B. J. O'Brien. 1989. Words or numbers? The evaluation of probability expressions in general practice. The Journal of the Royal College of General Practitioners, 39(320):98–100.
- Douglas E. Ott. 2021. Words representing numeric probabilities in medical writing are ambiguous and misinterpreted. JSLS: Journal of the Society of Laparoscopic & Robotic Surgeons, 25(3).
- F. R. Palmer. 1992. Review of Richard Matthews, Words and Worlds: On the Linguistic Analysis of Modality (European University Studies, Series XIV, Vol. 191), Frankfurt am Main/Bern/New York/Paris: Peter Lang, 1991. Lingua, 88(1):87–90.
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Kyle Richardson, Hai Hu, Lawrence S. Moss, and Ashish Sabharwal. 2020. Probing natural language inference models through semantic fragments. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8713–8721.
- Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A primer in BERTology: What we know about how BERT works. Transactions of the Association for Computational Linguistics, 8:842–866.
- Roser Saurí, Marc Verhagen, and James Pustejovsky. 2006. Annotating and recognizing event modality in text. In Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, Melbourne Beach, Florida, USA, May 11-13, 2006, pages 333–339. AAAI Press.
- Timo Schick and Hinrich Schütze. 2021. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 255–269, Online. Association for Computational Linguistics.
- Damien Sileo and Antoine Lernould. 2023. MindGames: Targeting theory of mind in large language models with dynamic epistemic modal logic. arXiv preprint arXiv:2305.03353.
- Aarohi Srivastava et al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
- Qi Su, Chu-Ren Huang, and Helen Kai-yun Chen. 2010. Evidentiality for text trustworthiness detection. In Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pages 10–17, Uppsala, Sweden. Association for Computational Linguistics.
- Simon Šuster, Pieter Fivez, Pietro Totis, Angelika Kimmig, Jesse Davis, Luc De Raedt, and Walter Daelemans. 2021. Mapping probability word problems to executable representations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3627–3640, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698.
- Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.
- Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. Transactions of the Association for Computational Linguistics, 5:379–395.
- Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 12697–12706. PMLR.
- Xiang Zhou, Yixin Nie, and Mohit Bansal. 2022. Distributed NLI: Learning to predict human opinion distributions for language reasoning. In Findings of the Association for Computational Linguistics: ACL 2022. Association for Computational Linguistics.