
Perceptions of Linguistic Uncertainty by Language Models and Humans (2407.15814v2)

Published 22 Jul 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Uncertainty expressions such as "probably" or "highly unlikely" are pervasive in human language. While prior work has established that there is population-level agreement in terms of how humans quantitatively interpret these expressions, there has been little inquiry into the abilities of LLMs in the same context. In this paper, we investigate how LLMs map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether LLMs can employ theory of mind in this setting: understanding the uncertainty of another agent about a particular statement, independently of the model's own certainty about that statement. We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that LLMs are substantially more susceptible to bias based on their prior knowledge (as compared to humans). These findings raise important questions and have broad implications for human-AI and AI-AI communication.
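The evaluation described in the abstract can be sketched as a simple harness: prompt a model with a statement framed as another agent's belief plus a verbal uncertainty expression, elicit a numeric probability, and score the reply against human population-level means. The sketch below is illustrative, not the paper's actual protocol; the phrase-to-probability values, the prompt wording, and the `stub_model` function are all assumptions (a real run would query an LLM API in place of the stub).

```python
# Hypothetical sketch of the evaluation setup: ask a model what probability
# another agent assigns to a statement, given only the agent's verbal
# uncertainty expression, then compare to human population-level means.

# Illustrative phrase-to-probability means in the spirit of prior survey
# work on probability words; NOT the paper's exact numbers.
HUMAN_MEANS = {
    "highly likely": 0.90,
    "probably": 0.70,
    "possibly": 0.40,
    "highly unlikely": 0.05,
}

def build_prompt(statement: str, expression: str) -> str:
    """Frame the task as reading another agent's uncertainty (theory of mind)."""
    return (
        f'Alice says: "It is {expression} that {statement}."\n'
        "On a scale from 0 to 100, what probability do you think Alice "
        "assigns to this statement? Reply with a number only."
    )

def mean_absolute_error(model_reply_fn, statements) -> float:
    """Score a model by its average deviation from the human means."""
    errors = []
    for statement in statements:
        for expression, human_mean in HUMAN_MEANS.items():
            reply = model_reply_fn(build_prompt(statement, expression))
            model_prob = float(reply) / 100.0
            errors.append(abs(model_prob - human_mean))
    return sum(errors) / len(errors)

# Stub "model" that parrots the human mean; a real run would call an LLM.
def stub_model(prompt: str) -> str:
    for expression, p in HUMAN_MEANS.items():
        if expression in prompt:
            return str(int(round(p * 100)))
    return "50"

print(mean_absolute_error(stub_model, ["the cat is on the mat"]))  # prints 0.0
```

The paper's key manipulation, testing statements the model knows to be true versus false while holding the uncertainty expression fixed, would correspond to varying `statements` and comparing the resulting error distributions.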
