Enhancing LLMs as Cognitive Models through Ecologically Valid Arithmetic Pretraining
The research paper under discussion investigates the potential of LLMs to serve as effective cognitive models of human decision-making. The central motivation stems from observed similarities in the behavior of LLMs and humans, particularly in tasks involving decision-making under risk and intertemporal choices. This paper puts forth a novel approach that involves pretraining LLMs on synthetic datasets structured around ecologically valid arithmetic tasks, thereby enabling a stronger alignment between LLMs and human cognitive processes.
Methodology and Model Architecture
The paper introduces a specific LLM variant named Arithmetic-GPT. This model is a small Generative Pretrained Transformer (GPT) with approximately 10 million parameters, tailored for arithmetic operations necessary for calculating expected values (EV) in risky choices and present values (PV) in intertemporal choices. The model architecture employs standard components such as absolute positional embeddings, causal masking, and domain-specific tokenization. A key aspect of the training involved creating synthetic datasets that reflect ecological distributions of probabilities and values observed in real-world scenarios.
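The two target quantities, expected value and present value, follow their standard textbook definitions. The sketch below is a plain-Python illustration of those definitions, not the equation format used in the paper's training corpus; the 5% discount rate is an illustrative assumption.

```python
def expected_value(probabilities, payoffs):
    """EV of a gamble: each payoff weighted by its probability, summed."""
    return sum(p * x for p, x in zip(probabilities, payoffs))

def present_value(amount, delay, rate=0.05):
    """PV of a delayed reward under exponential discounting at `rate`.

    The rate of 5% per period is an illustrative choice, not a value
    taken from the paper.
    """
    return amount / (1.0 + rate) ** delay

# A 50/50 gamble over $100 or $0 has EV $50:
ev = expected_value([0.5, 0.5], [100.0, 0.0])

# $100 received two periods from now, discounted at 5% per period:
pv = present_value(100.0, 2)
```

Pretraining on large numbers of such computations is what equips the model with the arithmetic competence that the later evaluations probe.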
Synthetic Data and Pretraining
The synthetic datasets generated for this work are key to its methodological innovation. They comprise 1 million arithmetic equations whose probabilities and values are sampled to reflect naturally occurring frequencies: probabilities follow Beta distributions and values follow power-law distributions. Various versions of the dataset were examined, including ablated versions where the answers were removed and signs randomized. The model was pretrained on these datasets, and embeddings were extracted to assess their predictive power regarding human behavior in decision-making tasks.
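A generator for such a corpus can be sketched in a few lines. The equation format, the Beta parameters, and the power-law exponent below are illustrative assumptions; the paper's exact sampling scheme is not reproduced here.

```python
import random

def sample_equation(rng, alpha=1.0, beta=1.0, value_exponent=2.0, max_value=100.0):
    """Sample one equation of the hypothetical form 'p*x + q*y = z'.

    Probabilities are Beta-distributed and values follow a (capped)
    power law, mirroring the ecological distributions the paper
    describes; all parameter values here are illustrative.
    """
    p = rng.betavariate(alpha, beta)
    q = 1.0 - p
    x = min(max_value, rng.paretovariate(value_exponent))
    y = min(max_value, rng.paretovariate(value_exponent))
    z = p * x + q * y
    return f"{p:.2f}*{x:.2f}+{q:.2f}*{y:.2f}={z:.2f}"

rng = random.Random(0)
corpus = [sample_equation(rng) for _ in range(1000)]  # the paper uses ~1M
```

The ablations described above correspond to simple transformations of such strings, e.g. truncating each equation at the `=` sign to remove the answer.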
Evaluation on Human Choice Data
Human choice data from four well-documented experimental datasets involving risky and intertemporal choices were used to evaluate the model's effectiveness. The paper compared the performance of Arithmetic-GPT with several benchmarks, including off-the-shelf LLMs such as LLaMA-3-70B, classical behavioral models like Cumulative Prospect Theory (CPT) and the hyperbolic discounting model, and direct training on human data using Multilayer Perceptrons (MLPs).
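The evaluation protocol — extract an embedding per choice problem, then fit a simple readout predicting the human choice — can be illustrated with a stand-in sketch. The random vectors below substitute for real Arithmetic-GPT embeddings, and the plain gradient-descent logistic regression is an assumed readout, not the paper's exact fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # 200 choice problems, 16-dim embeddings (stand-ins)
w_true = rng.normal(size=16)
y = (X @ w_true > 0).astype(float)  # stand-in binary human choices

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression, no regularization."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w = fit_logistic(X, y)
accuracy = ((1.0 / (1.0 + np.exp(-X @ w)) > 0.5) == y).mean()
```

The same readout applied to embeddings from different pretraining regimes (ecological vs. ablated data, Arithmetic-GPT vs. LLaMA-3-70B) is what makes the comparisons in the results section possible.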
Results and Findings
The experimental results demonstrate that LLMs pretrained on ecologically valid arithmetic datasets significantly outperform classical cognitive models in predicting human choice behaviors. Specifically, Arithmetic-GPT models, especially those trained on ecological synthetic data, align strongly with human decision-making, with reported predictive performance reaching up to 70.8% for risky choices and 67.8% for intertemporal choices. Embeddings from well-pretrained Arithmetic-GPT also outperform embeddings derived from larger, general-purpose LLMs like LLaMA-3-70B when using arithmetic input formats.
The paper also shows that classical behavioral models, although interpretable and grounded in experimental psychology, fall short of explaining the observed human data as effectively as the ecologically pretrained LLMs. Interestingly, while MLPs trained directly on human datasets achieve higher predictive scores, they do so without respecting the cognitive constraints that Arithmetic-GPT embodies.
Implicit Cognitive Functions
Analysis of the embeddings revealed that Arithmetic-GPT implicitly learned functions resembling those found in behavioral economic models. For example, the embeddings replicate typical cognitive biases such as probability weighting, loss aversion, and hyperbolic discounting, which are central to human decision-making theories. These results indicate that pretrained LLMs can capture complex human cognitive processes when trained on tasks that align closely with the computations humans perform.
Implications and Future Directions
This research underscores the potential of synthetically generated, ecologically valid data for enhancing LLMs as cognitive models. By aligning the training data more closely with the kinds of calculations humans perform, LLMs can be made to exhibit more human-like decision patterns. The approach also suggests that apparent deviations from rationality in human decision-making may arise largely from errors in the underlying computations, a pattern mirrored in the model's behavior.
Practically, this work has implications for developing AI systems that better predict and understand human behavior, which could have broad applications in fields such as behavioral economics, psychology, and human-computer interaction. Theoretically, it bridges gaps between computational neuroscience, cognitive science, and machine learning, offering a pathway for interdisciplinary research.
Future studies might investigate extending this approach to other cognitive domains or explore different types and distributions of synthetic data. Further exploration into the internal representations of pretrained LLMs could yield deeper insights into the specific mechanisms by which these models replicate human-like cognitive processes.
In conclusion, this paper presents a compelling methodology for training LLMs to better model human cognition by focusing on ecologically valid arithmetic computations. Through systematic pretraining and robust evaluation, the paper contributes significantly to the understanding and development of LLMs as cognitive models, opening new avenues for both theoretical research and practical applications in AI and cognitive science.