Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Abstract: The observed similarities in the behavior of humans and LLMs have prompted researchers to consider the potential of using LLMs as models of human cognition. However, several significant challenges must be addressed before LLMs can be legitimately regarded as cognitive models. For instance, LLMs are trained on far more data than humans typically encounter, and may have been directly trained on human data in specific cognitive tasks or aligned with human preferences. Consequently, the origins of these behavioral similarities are not well understood. In this paper, we propose a novel way to enhance the utility of LLMs as cognitive models. This approach involves (i) leveraging computationally equivalent tasks that both an LLM and a rational agent need to master for solving a cognitive problem and (ii) examining the specific task distributions required for an LLM to exhibit human-like behaviors. We apply this approach to decision-making -- specifically risky and intertemporal choice -- where the key computationally equivalent task is the arithmetic of expected value calculations. We show that an LLM pretrained on an ecologically valid arithmetic dataset, which we call Arithmetic-GPT, predicts human behavior better than many traditional cognitive models. Pretraining LLMs on ecologically valid arithmetic datasets is sufficient to produce a strong correspondence between these models and human decision-making. Our results also suggest that LLMs used as cognitive models should be carefully investigated via ablation studies of the pretraining data.
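The "computationally equivalent" arithmetic the abstract refers to can be made concrete with a minimal sketch. The expected-value formula for risky choice and the hyperbolic discounting formula for intertemporal choice are standard in the decision-making literature; the discount rate `k = 0.05` and the specific gambles below are illustrative assumptions, not values from the paper.

```python
def expected_value(outcomes, probabilities):
    """Expected value of a risky gamble: sum of p_i * x_i over outcomes."""
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * x for p, x in zip(probabilities, outcomes))

def hyperbolic_discounted_value(amount, delay, k=0.05):
    """Hyperbolic discounting: V = A / (1 + k * D), with an assumed rate k."""
    return amount / (1 + k * delay)

# Risky choice: a 50/50 gamble for $100 has expected value $50,
# so a risk-neutral agent prefers it to a sure $45.
risky = expected_value([100.0, 0.0], [0.5, 0.5])   # 50.0

# Intertemporal choice: $100 delayed 30 days is worth $40 under k = 0.05,
# so this agent prefers $80 now.
delayed = hyperbolic_discounted_value(100.0, 30)   # 40.0
```

An LLM that has mastered this kind of arithmetic has, in principle, the computations a rational agent needs for both tasks, which is why the paper pretrains on an ecologically valid arithmetic distribution rather than on human choice data.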