
Probabilistic Reasoning in Generative Large Language Models (2402.09614v2)

Published 14 Feb 2024 in cs.CL and cs.AI

Abstract: This paper considers the challenges LLMs face when reasoning over text that includes uncertainty explicitly quantified via probability values. This type of reasoning is relevant to contexts ranging from everyday conversation to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties with probabilistic reasoning. To address this problem, we introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We use BLInD to identify the limitations of LLMs on tasks involving probabilistic reasoning. In addition, we present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming. We conclude with an evaluation of our methods on BLInD and on an adaptation of a causal reasoning question-answering dataset. Our empirical results highlight the effectiveness of the proposed strategies across multiple LLMs.
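
As a rough illustration of the Python-code mapping the abstract describes (a minimal sketch, not the authors' actual prompting output; the question, probability values, and variable names below are hypothetical), a BLInD-style query such as "P(A) = 0.3, P(B|A) = 0.6, P(B|not A) = 0.2; what is P(A|B)?" could be translated into executable code:

    # Hypothetical translation of a BLInD-style word problem into Python.
    # Given: P(A) = 0.3, P(B|A) = 0.6, P(B|~A) = 0.2. Query: P(A|B).
    p_a = 0.3              # prior probability of A
    p_b_given_a = 0.6      # probability of B when A holds
    p_b_given_not_a = 0.2  # probability of B when A does not hold

    # Marginalization: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

    # Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
    p_a_given_b = p_b_given_a * p_a / p_b
    print(round(p_a_given_b, 4))  # prints 0.5625

Offloading the multi-step arithmetic to an interpreter in this way sidesteps a known weakness of LLMs at numeric computation, which is consistent with the motivation for the code-based strategies the abstract describes.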

Authors (3)
  1. Aliakbar Nafar (7 papers)
  2. Kristen Brent Venable (11 papers)
  3. Parisa Kordjamshidi (44 papers)
Citations (3)