
Incoherent Probability Judgments in Large Language Models (2401.16646v2)

Published 30 Jan 2024 in cs.CL and cs.AI

Abstract: Autoregressive LLMs trained for next-word prediction have demonstrated remarkable proficiency at producing coherent text. But are they equally adept at forming coherent probability judgments? We use probabilistic identities and repeated judgments to assess the coherence of probability judgments made by LLMs. Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory. Moreover, when prompted to judge the same event repeatedly, the mean-variance relationship of the probability judgments produced by LLMs shows an inverted-U shape like that seen in humans. We propose that these deviations from rationality can be explained by linking autoregressive LLMs to implicit Bayesian inference and drawing parallels with the Bayesian Sampler model of human probability judgments.
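The abstract leans on two technical ideas: checking coherence against probabilistic identities, and the Bayesian Sampler model's inverted-U mean-variance signature. Below is a minimal illustrative sketch of both, not the authors' code; the function names, elicited probability values, and parameter settings (N, beta) are assumptions chosen for illustration.

```python
import numpy as np

# (1) Coherence check via a probabilistic identity.
# For any events A and B, probability theory requires:
#     P(A or B) + P(A and B) = P(A) + P(B).
# Given four elicited judgments, a nonzero deviation signals incoherence.
def identity_deviation(p_a, p_b, p_and, p_or):
    return (p_or + p_and) - (p_a + p_b)

# Hypothetical judgments elicited from a model by four separate prompts:
print(f"{identity_deviation(p_a=0.60, p_b=0.30, p_and=0.25, p_or=0.75):.2f}")
# -> 0.10, a coherent judge would give 0.00

# (2) Bayesian Sampler model (Zhu, Sanborn & Chater, 2020): a judgment is
# formed by drawing N mental samples from the subjective probability p and
# regularizing the sample proportion with a symmetric Beta(beta, beta) prior:
#     estimate = (successes + beta) / (N + 2 * beta)
# The estimate's variance, N * p * (1 - p) / (N + 2 * beta)**2, is an
# inverted U in p: largest near p = 0.5, shrinking toward 0 and 1. This is
# the mean-variance signature the paper reports for repeated LLM judgments.
def bayesian_sampler(p, N=10, beta=1.0, n_reps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    successes = rng.binomial(N, p, size=n_reps)
    estimates = (successes + beta) / (N + 2 * beta)
    return estimates.mean(), estimates.var()

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    mean, var = bayesian_sampler(p)
    print(f"p={p:.1f}  mean={mean:.3f}  var={var:.4f}")  # var peaks near p=0.5
```

The simulation makes the abstract's claim concrete: plotting the variance of repeated judgments against their mean traces an inverted U, the pattern the paper finds in both humans and LLMs.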
