Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice (2405.19313v1)

Published 29 May 2024 in cs.AI, cs.CL, econ.GN, and q-fin.EC

Abstract: The observed similarities in the behavior of humans and LLMs have prompted researchers to consider the potential of using LLMs as models of human cognition. However, several significant challenges must be addressed before LLMs can be legitimately regarded as cognitive models. For instance, LLMs are trained on far more data than humans typically encounter, and may have been directly trained on human data in specific cognitive tasks or aligned with human preferences. Consequently, the origins of these behavioral similarities are not well understood. In this paper, we propose a novel way to enhance the utility of LLMs as cognitive models. This approach involves (i) leveraging computationally equivalent tasks that both an LLM and a rational agent need to master for solving a cognitive problem and (ii) examining the specific task distributions required for an LLM to exhibit human-like behaviors. We apply this approach to decision-making -- specifically risky and intertemporal choice -- where the key computationally equivalent task is the arithmetic of expected value calculations. We show that an LLM pretrained on an ecologically valid arithmetic dataset, which we call Arithmetic-GPT, predicts human behavior better than many traditional cognitive models. Pretraining LLMs on ecologically valid arithmetic datasets is sufficient to produce a strong correspondence between these models and human decision-making. Our results also suggest that LLMs used as cognitive models should be carefully investigated via ablation studies of the pretraining data.

Enhancing LLMs as Cognitive Models through Ecologically Valid Arithmetic Pretraining

The research paper under discussion investigates the potential of LLMs to serve as effective cognitive models of human decision-making. The central motivation stems from observed similarities in the behavior of LLMs and humans, particularly in tasks involving decision-making under risk and intertemporal choices. This paper puts forth a novel approach that involves pretraining LLMs on synthetic datasets structured around ecologically valid arithmetic tasks, thereby enabling a stronger alignment between LLMs and human cognitive processes.

Methodology and Model Architecture

The paper introduces a specific LLM variant named Arithmetic-GPT. This model is a small Generative Pretrained Transformer (GPT) with approximately 10 million parameters, tailored for arithmetic operations necessary for calculating expected values (EV) in risky choices and present values (PV) in intertemporal choices. The model architecture employs standard components such as absolute positional embeddings, causal masking, and domain-specific tokenization. A key aspect of the training involved creating synthetic datasets that reflect ecological distributions of probabilities and values observed in real-world scenarios.
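
To make the target computations concrete, the following minimal Python sketch spells out the two quantities the training data exercises. The exponential form and the 5% rate in present_value are illustrative assumptions, not values taken from the paper.

```python
def expected_value(probs, outcomes):
    """Expected value of a gamble: the probability-weighted sum of outcomes."""
    return sum(p * x for p, x in zip(probs, outcomes))

def present_value(amount, delay, rate=0.05):
    """Present value of a delayed reward under exponential discounting.
    The 5% discount rate is an illustrative assumption."""
    return amount / (1.0 + rate) ** delay

# A risky choice: a 60% chance of 50 (else 0) has the same EV as a sure 30.
print(expected_value([0.6, 0.4], [50.0, 0.0]))  # 30.0

# An intertemporal choice: 100 delivered after 12 periods.
print(present_value(100.0, 12))  # ~55.68
```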

Synthetic Data and Pretraining

The synthetic datasets generated for this work are key to its methodological innovation. These datasets include 1 million arithmetic equations with probabilities and values tailored to align with natural frequencies, such as Beta-distributed probabilities and power-law-distributed values. Various versions of the dataset were examined, including ablated versions where the answers were removed and signs randomized. The model was pretrained using these datasets, and embeddings were extracted to assess their predictive power regarding human behavior in decision-making tasks.
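
As a rough illustration of how such a corpus might be generated, the sketch below draws probabilities from a Beta distribution and outcome magnitudes from a power law. The specific distribution parameters and the equation format are assumptions made for illustration, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_equation():
    """Render one synthetic expected-value equation as a training string.
    Distribution parameters here are illustrative guesses."""
    p = rng.beta(0.3, 0.3)                 # probabilities clustered near 0 and 1
    x1, x2 = rng.pareto(2.0, size=2) * 10  # heavy-tailed outcome magnitudes
    ev = p * x1 + (1 - p) * x2
    return f"{p:.2f}*{x1:.2f}+{1 - p:.2f}*{x2:.2f}={ev:.2f}"

corpus = [sample_equation() for _ in range(1_000)]  # the paper scales this to ~1M
print(corpus[0])

# The ablations described above would, for example, strip everything after
# '=' (removing answers) or randomize the signs of x1 and x2.
```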

Evaluation on Human Choice Data

Human choice data from four well-documented experimental datasets involving risky and intertemporal choices were used to evaluate the model's effectiveness. The paper compared the performance of Arithmetic-GPT with several benchmarks, including off-the-shelf LLMs such as LLaMA-3-70B, classical behavioral models like Cumulative Prospect Theory (CPT) and the hyperbolic discounting model, and direct training on human data using Multilayer Perceptrons (MLPs).
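
One plausible shape for this evaluation, sketched below with stand-in data: freeze the pretrained model, read out an embedding for each choice problem, and fit a simple classifier whose predicted probabilities are scored against human choice rates. The array shapes and the logistic readout are assumptions; the paper's exact regression setup may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

# Stand-in data: one frozen-model embedding per choice problem, plus the
# (hypothetical) fraction of participants who chose option A.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))
choice_rate = rng.random(200)

# Fit on binarized choices, then score predicted probabilities against the
# observed choice rates. With random stand-ins the score is meaningless;
# with real embeddings this is the kind of R^2 being compared.
clf = LogisticRegression(max_iter=1000).fit(X, (choice_rate > 0.5).astype(int))
pred = clf.predict_proba(X)[:, 1]
print(f"R^2 = {r2_score(choice_rate, pred):.3f}")
```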

Results and Findings

The experimental results show that LLMs pretrained on ecologically valid arithmetic datasets outperform classical cognitive models in predicting human choice behavior. Arithmetic-GPT, particularly when pretrained on the ecological synthetic data, aligns strongly with human decision-making, with R^2 values of up to 70.8% for risky choices and 67.8% for intertemporal choices. Embeddings from the well-pretrained Arithmetic-GPT also outperform embeddings derived from larger, general-purpose LLMs such as LLaMA-3-70B when given arithmetic input formats.

The paper also shows that classical behavioral models, although interpretable and grounded in experimental psychology, do not explain the observed human data as well as the ecologically pretrained LLM. Notably, while MLPs trained directly on human datasets achieve higher R^2 values, they do so without the cognitive constraints that Arithmetic-GPT respects.

Implicit Cognitive Functions

Analysis of the embeddings revealed that Arithmetic-GPT implicitly learned functions resembling those found in behavioral economic models. For example, the embeddings reproduce signature patterns of human choice, such as nonlinear probability weighting, loss aversion, and hyperbolic discounting, that are central to theories of human decision-making. These results indicate that pretrained LLMs can capture complex human cognitive processes when trained on tasks that align closely with the computations humans perform.
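
For reference, the classic functional forms behind these patterns can be written down directly. The sketch below uses the Tversky-Kahneman probability weighting function with their published estimate of gamma for gains (about 0.61) and a hyperbolic discount curve with an illustrative k; these are the benchmark shapes the embeddings are reported to recover.

```python
import numpy as np

def weight_tk(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def hyperbolic_value(amount, delay, k=0.05):
    """Hyperbolic discounting, V = A / (1 + kD); k is illustrative."""
    return amount / (1 + k * delay)

# Small probabilities are overweighted, large ones underweighted:
print(weight_tk(np.array([0.01, 0.5, 0.99])))  # ~[0.055, 0.421, 0.912]

# A reward 20 periods away is worth half its face value at k = 0.05:
print(hyperbolic_value(100, delay=20))  # 50.0
```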

Implications and Future Directions

This research underscores the potential of synthetically generated, ecologically valid data to improve LLMs as cognitive models. By aligning the training data with the kinds of calculations humans actually perform, LLMs can be made to exhibit more human-like decision patterns. The results also suggest that human deviations from rationality may stem largely from errors in the underlying value computations, errors that the pretrained LLM mirrors.

Practically, this work has implications for developing AI systems that better predict and understand human behavior, which could have broad applications in fields such as behavioral economics, psychology, and human-computer interaction. Theoretically, it bridges gaps between computational neuroscience, cognitive science, and machine learning, offering a pathway for interdisciplinary research.

Future studies might investigate extending this approach to other cognitive domains or explore different types and distributions of synthetic data. Further exploration into the internal representations of pretrained LLMs could yield deeper insights into the specific mechanisms by which these models replicate human-like cognitive processes.

In conclusion, this paper presents a compelling methodology for training LLMs to better model human cognition by focusing on ecologically valid arithmetic computations. Through systematic pretraining and robust evaluation, the paper contributes significantly to the understanding and development of LLMs as cognitive models, opening new avenues for both theoretical research and practical applications in AI and cognitive science.

Authors (3)
  1. Jian-Qiao Zhu
  2. Haijiang Yan
  3. Thomas L. Griffiths