Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice (2405.19313v1)

Published 29 May 2024 in cs.AI, cs.CL, econ.GN, and q-fin.EC

Abstract: The observed similarities in the behavior of humans and LLMs have prompted researchers to consider the potential of using LLMs as models of human cognition. However, several significant challenges must be addressed before LLMs can be legitimately regarded as cognitive models. For instance, LLMs are trained on far more data than humans typically encounter, and may have been directly trained on human data in specific cognitive tasks or aligned with human preferences. Consequently, the origins of these behavioral similarities are not well understood. In this paper, we propose a novel way to enhance the utility of LLMs as cognitive models. This approach involves (i) leveraging computationally equivalent tasks that both an LLM and a rational agent need to master for solving a cognitive problem and (ii) examining the specific task distributions required for an LLM to exhibit human-like behaviors. We apply this approach to decision-making -- specifically risky and intertemporal choice -- where the key computationally equivalent task is the arithmetic of expected value calculations. We show that an LLM pretrained on an ecologically valid arithmetic dataset, which we call Arithmetic-GPT, predicts human behavior better than many traditional cognitive models. Pretraining LLMs on ecologically valid arithmetic datasets is sufficient to produce a strong correspondence between these models and human decision-making. Our results also suggest that LLMs used as cognitive models should be carefully investigated via ablation studies of the pretraining data.

Enhancing LLMs as Cognitive Models through Ecologically Valid Arithmetic Pretraining

The research paper under discussion investigates the potential of LLMs to serve as effective cognitive models of human decision-making. The central motivation stems from observed similarities in the behavior of LLMs and humans, particularly in tasks involving decision-making under risk and intertemporal choices. This paper puts forth a novel approach that involves pretraining LLMs on synthetic datasets structured around ecologically valid arithmetic tasks, thereby enabling a stronger alignment between LLMs and human cognitive processes.

Methodology and Model Architecture

The paper introduces a specific LLM variant named Arithmetic-GPT. This model is a small Generative Pretrained Transformer (GPT) with approximately 10 million parameters, tailored for arithmetic operations necessary for calculating expected values (EV) in risky choices and present values (PV) in intertemporal choices. The model architecture employs standard components such as absolute positional embeddings, causal masking, and domain-specific tokenization. A key aspect of the training involved creating synthetic datasets that reflect ecological distributions of probabilities and values observed in real-world scenarios.
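
To make the target computations concrete, the following minimal Python sketch spells out the two quantities the training data exercises. The exponential form and the 5% rate in present_value are illustrative assumptions, not values taken from the paper.

```python
def expected_value(probs, outcomes):
    """Expected value of a gamble: the probability-weighted sum of outcomes."""
    return sum(p * x for p, x in zip(probs, outcomes))

def present_value(amount, delay, rate=0.05):
    """Present value of a delayed reward under exponential discounting.
    The 5% discount rate is an illustrative assumption."""
    return amount / (1.0 + rate) ** delay

# A risky choice: a 60% chance of 50 (else 0) has the same EV as a sure 30.
print(expected_value([0.6, 0.4], [50.0, 0.0]))  # 30.0

# An intertemporal choice: 100 delivered after 12 periods.
print(present_value(100.0, 12))  # ~55.68
```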

Synthetic Data and Pretraining

The synthetic datasets generated for this work are key to its methodological innovation. These datasets include 1 million arithmetic equations with probabilities and values tailored to align with natural frequencies, such as Beta-distributed probabilities and power-law-distributed values. Various versions of the dataset were examined, including ablated versions where the answers were removed and signs randomized. The model was pretrained using these datasets, and embeddings were extracted to assess their predictive power regarding human behavior in decision-making tasks.
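
As a rough illustration of how such a corpus might be generated, the sketch below draws probabilities from a Beta distribution and outcome magnitudes from a power law. The specific distribution parameters and the equation format are assumptions made for illustration, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_equation():
    """Render one synthetic expected-value equation as a training string.
    Distribution parameters here are illustrative guesses."""
    p = rng.beta(0.3, 0.3)                 # probabilities clustered near 0 and 1
    x1, x2 = rng.pareto(2.0, size=2) * 10  # heavy-tailed outcome magnitudes
    ev = p * x1 + (1 - p) * x2
    return f"{p:.2f}*{x1:.2f}+{1 - p:.2f}*{x2:.2f}={ev:.2f}"

corpus = [sample_equation() for _ in range(1_000)]  # the paper scales this to ~1M
print(corpus[0])

# The ablations described above would, for example, strip everything after
# '=' (removing answers) or randomize the signs of x1 and x2.
```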

Evaluation on Human Choice Data

Human choice data from four well-documented experimental datasets involving risky and intertemporal choices were used to evaluate the model's effectiveness. The paper compared the performance of Arithmetic-GPT with several benchmarks, including off-the-shelf LLMs such as LLaMA-3-70B, classical behavioral models like Cumulative Prospect Theory (CPT) and the hyperbolic discounting model, and direct training on human data using Multilayer Perceptrons (MLPs).
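
One plausible shape for this evaluation, sketched below with stand-in data: freeze the pretrained model, read out an embedding for each choice problem, and fit a simple classifier whose predicted probabilities are scored against human choice rates. The array shapes and the logistic readout are assumptions; the paper's exact regression setup may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

# Stand-in data: one frozen-model embedding per choice problem, plus the
# (hypothetical) fraction of participants who chose option A.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))
choice_rate = rng.random(200)

# Fit on binarized choices, then score predicted probabilities against the
# observed choice rates. With random stand-ins the score is meaningless;
# with real embeddings this is the kind of R^2 being compared.
clf = LogisticRegression(max_iter=1000).fit(X, (choice_rate > 0.5).astype(int))
pred = clf.predict_proba(X)[:, 1]
print(f"R^2 = {r2_score(choice_rate, pred):.3f}")
```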

Results and Findings

The experimental results show that LLMs pretrained on ecologically valid arithmetic datasets outperform classical cognitive models in predicting human choice behavior. Arithmetic-GPT, particularly when pretrained on the ecological synthetic data, aligns strongly with human decision-making, with R^2 values of up to 70.8% for risky choices and 67.8% for intertemporal choices. Embeddings from the well-pretrained Arithmetic-GPT also outperform embeddings derived from larger, general-purpose LLMs such as LLaMA-3-70B when given arithmetic input formats.

The paper also shows that classical behavioral models, although interpretable and grounded in experimental psychology, do not explain the observed human data as well as the ecologically pretrained LLM. Notably, while MLPs trained directly on human datasets achieve higher R^2 values, they do so without the cognitive constraints that Arithmetic-GPT respects.

Implicit Cognitive Functions

Analysis of the embeddings revealed that Arithmetic-GPT implicitly learned functions resembling those found in behavioral economic models. For example, the embeddings reproduce signature patterns of human choice, such as nonlinear probability weighting, loss aversion, and hyperbolic discounting, that are central to theories of human decision-making. These results indicate that pretrained LLMs can capture complex human cognitive processes when trained on tasks that align closely with the computations humans perform.
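
For reference, the classic functional forms behind these patterns can be written down directly. The sketch below uses the Tversky-Kahneman probability weighting function with their published estimate of gamma for gains (about 0.61) and a hyperbolic discount curve with an illustrative k; these are the benchmark shapes the embeddings are reported to recover.

```python
import numpy as np

def weight_tk(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def hyperbolic_value(amount, delay, k=0.05):
    """Hyperbolic discounting, V = A / (1 + kD); k is illustrative."""
    return amount / (1 + k * delay)

# Small probabilities are overweighted, large ones underweighted:
print(weight_tk(np.array([0.01, 0.5, 0.99])))  # ~[0.055, 0.421, 0.912]

# A reward 20 periods away is worth half its face value at k = 0.05:
print(hyperbolic_value(100, delay=20))  # 50.0
```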

Implications and Future Directions

This research underscores the potential of synthetically generated, ecologically valid data to improve LLMs as cognitive models. By aligning the training data with the kinds of calculations humans actually perform, LLMs can be made to exhibit more human-like decision patterns. The results also suggest that human deviations from rationality may stem largely from errors in the underlying value computations, errors that the pretrained LLM mirrors.

Practically, this work has implications for developing AI systems that better predict and understand human behavior, which could have broad applications in fields such as behavioral economics, psychology, and human-computer interaction. Theoretically, it bridges gaps between computational neuroscience, cognitive science, and machine learning, offering a pathway for interdisciplinary research.

Future studies might investigate extending this approach to other cognitive domains or explore different types and distributions of synthetic data. Further exploration into the internal representations of pretrained LLMs could yield deeper insights into the specific mechanisms by which these models replicate human-like cognitive processes.

In conclusion, this paper presents a compelling methodology for training LLMs to better model human cognition by focusing on ecologically valid arithmetic computations. Through systematic pretraining and robust evaluation, the paper contributes significantly to the understanding and development of LLMs as cognitive models, opening new avenues for both theoretical research and practical applications in AI and cognitive science.

Authors (3)
  1. Jian-Qiao Zhu
  2. Haijiang Yan
  3. Thomas L. Griffiths