Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

Published 16 Jun 2026 in cs.AI | (2606.17657v1)

Abstract: People make decisions differently in strategic interactions. Some update beliefs like a Bayesian; others exhibit biases like motivated reasoning. Although creators of LLMs use simulated humans for safety evaluations and training, they often fail to cover this breadth of human behavior. We argue that cognitive science and economics provide a convenient tool for doing so, making use of mathematical models of human decision-making. We propose an approach that we call Equation-to-Behavior Prompting for guiding LLMs to match cognitive models, and evaluate this approach on persuasion games based on legal decision-making. We find that large models can approximate equation-based specifications (Bayesian updating, affine distortion, motivated updating, and Grether's $\alpha$-$\beta$ model) using prompting, but small models fail to do so. However, training small models with reinforcement learning to adhere to mathematical rules, Equation-to-Behavior RL, reduces belief error by 26.5% in out-of-distribution parameterizations. We show that these simulations can help create diverse training environments; training small models to consider different kinds of decision-makers improves average belief change by 2.5% to 12% over Bayesian-only training, even when persuading GPT-5-mini. Our work could improve human simulations for training and evaluation in increasingly realistic settings, and could also enable novel research into more complicated mathematical models of human decision-making.


Authors (4)

Summary

The paper presents the Equation-to-Behavior paradigm that embeds cognitive models into language models for realistic human persuasion simulations.
It employs structured prompt engineering and reinforcement learning to align belief-updating with models ranging from Bayesian to motivated reasoning.
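Empirical results from legal decision simulations show that large models can follow equation-specified biases from prompts alone, that RL brings small models closer to these targets, and that training persuaders against cognitively diverse receivers improves persuasion effectiveness.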

Cognitive Model-Guided Simulation of Human Persuasion Games with LLMs

Introduction

The simulation of human decision-making in strategic contexts is foundational for robust AI agent design and alignment. "Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games" (2606.17657) addresses an underexplored question: how formal cognitive and behavioral-economic models can direct and evaluate LLM behavior in multi-agent persuasion settings. Standard approaches, which rely chiefly on persona-based prompting or preference learning, lack mechanisms to specify or enforce empirically grounded cognitive biases. The Equation-to-Behavior paradigm introduced here offers a methodology for rigorously instantiating a broad spectrum of cognitive models (Bayesian updating, affine distortion, motivated updating, and Grether’s $\alpha$-$\beta$ model) directly in LLM agents via prompt design and reinforcement learning (RL) objectives.

The experimental setting is structured around strategic Sender-Receiver (prosecutor-judge) interactions modeled from legal decision-making, drawing on richly annotated, crowd-validated trial records from the Old Bailey proceedings. This controlled, high-fidelity environment allows for precise behavioral benchmarking under diverse cognitive specifications, setting a precedent for future studies integrating cognitively precise agents into LLM-based simulations and training pipelines.

Figure 1: Overview of the Equation-to-Behavior approach. Cognitive models specify belief updating for LLM agents, facilitating controllable, realistic simulations in strategic interactions.

The Equation-to-Behavior Paradigm

The central innovation is to embed computational cognitive models systematically as behavioral constraints on LLM-simulated agents, rather than relying solely on natural-language persona prompts (“be more skeptical,” “act as a stubborn judge”), which offer no formal guarantee about how the agent updates its beliefs. The Equation-to-Behavior pipeline comprises (1) prompt engineering: translating belief-updating equations into highly structured prompts, and (2) reinforcement learning: directly optimizing for adherence to the belief trajectories that a cognitive model determines.
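To make step (1) concrete, here is a minimal Python sketch of how a belief-updating equation and its parameters might be rendered into a structured Receiver prompt for one update step. The template wording, the function `build_receiver_prompt`, and the choice to show the Receiver a numeric likelihood ratio are all hypothetical; the paper's actual prompts may be organized differently.

```python
from string import Template

# Hypothetical prompt template: wording, fields, and answer format are
# illustrative assumptions, not the prompt used in the paper.
RECEIVER_TEMPLATE = Template(
    "You are a judge deciding whether the defendant is guilty.\n"
    "Update your belief using exactly this rule:\n"
    "  $equation\n"
    "Parameters: $params\n"
    "Current belief that the defendant is guilty: $prior\n"
    "New evidence (likelihood ratio guilty:innocent = $lr): $evidence\n"
    "Report your updated belief as a number in [0, 1] on the last line."
)


def build_receiver_prompt(equation: str, params: dict, prior: float,
                          evidence: str, lr: float) -> str:
    """Render one update step of an Equation-to-Behavior style prompt."""
    # Sort parameters so the same specification always yields the same prompt.
    param_text = ", ".join(f"{k} = {v}" for k, v in sorted(params.items()))
    return RECEIVER_TEMPLATE.substitute(
        equation=equation,
        params=param_text,
        prior=f"{prior:.3f}",
        evidence=evidence,
        lr=f"{lr:.2f}",
    )


if __name__ == "__main__":
    print(build_receiver_prompt(
        equation="posterior = lam * bayes_posterior + (1 - lam) * prior",
        params={"lam": 0.5},
        prior=0.5,
        evidence="A witness saw the defendant leaving the shop.",
        lr=4.0,
    ))
```

Keeping the equation and parameters as separate fields is what makes the specification controllable: the same template can be reused across model families by swapping the equation string and sampling new parameter values.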

Cognitive Model Toolkit

Four primary model families are considered, capturing the diversity of empirically validated patterns in human belief updating:

Bayesian Updating: Normative probabilistic inference, providing baseline rational agent simulations.
Affine Distortion: Linear interpolation between the prior and the Bayesian posterior, with a mixing coefficient controlling conservatism.
Motivated Updating: Nonlinear distortion toward reference beliefs, modeling motivated reasoning and selective updating.
Grether’s $\alpha$-$\beta$ Model: Generalizes Bayes' rule by weighting the prior and the likelihood with separate parameters, explicitly capturing base-rate neglect and over- or underweighting of evidence.

Each model permits explicit parameterization and thus systematic manipulation within agent populations.
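The four families can each be written as a short update rule on a single belief, the probability of guilt. The sketch below is illustrative only: the affine form assumes a mix of the prior and the Bayesian posterior (consistent with the axes of Figure 3), the assignment of $\alpha$ to the prior and $\beta$ to the likelihood in Grether's model is an assumption, and `motivated` is a simple asymmetric-weighting stand-in rather than the paper's equation.

```python
import math


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bayes(prior: float, lr: float) -> float:
    """Bayesian posterior of guilt; lr = P(e | guilty) / P(e | innocent)."""
    return _sigmoid(_logit(prior) + math.log(lr))


def affine(prior: float, lr: float, lam: float) -> float:
    """Assumed affine form: mix the Bayesian posterior with the prior.

    lam = 1 recovers Bayes; lam < 1 makes the receiver conservative.
    """
    return lam * bayes(prior, lr) + (1.0 - lam) * prior


def grether(prior: float, lr: float, alpha: float, beta: float) -> float:
    """Grether-style rule in log-odds; which symbol weights what is assumed here.

    alpha scales the prior log-odds (alpha < 1: base-rate neglect);
    beta scales the log-likelihood ratio (beta < 1: under-inference,
    beta > 1: over-inference). alpha = beta = 1 recovers Bayes.
    """
    return _sigmoid(alpha * _logit(prior) + beta * math.log(lr))


def motivated(prior: float, lr: float, prefers_guilty: bool,
              w_toward: float, w_away: float) -> float:
    """Illustrative stand-in, not the paper's equation: evidence pointing toward
    the preferred verdict gets weight w_toward, evidence against it w_away."""
    toward = (lr > 1.0) == prefers_guilty
    weight = w_toward if toward else w_away
    return _sigmoid(_logit(prior) + weight * math.log(lr))
```

For example, `bayes(0.5, 4.0)` returns about 0.8, while `affine(0.5, 4.0, 0.5)` stops halfway at about 0.65. Writing every rule as a function of the same two inputs (prior, likelihood ratio) is what lets one environment swap receiver types freely.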

Figure 3: Affine distortion surface; the distorted posterior as a function of Bayesian posterior $\nu$ and prior $\mu_0$ (top) and cross-sections for selected priors (bottom).

Figure 5: Motivated updating introduces nonlinearity and asymmetry, amplifying evidence in preferred directions.

Figure 7: Grether’s $\alpha$-$\beta$ model surface; parameter variation induces under- and over-inference effects.

Empirical Evaluation

Dataset and Task Construction

A large-scale trial dataset is constructed from annotated Old Bailey proceedings, with each legal case decomposed into multiple evidential units and the dependencies among them. LLM-generated annotations are validated by crowdworkers, showing strong inter-annotator agreement on both evidence strength and structural attributes.
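One way to hold a decomposed case in code is a small record per evidential unit plus its annotated dependencies. The sketch below also computes a Bayesian benchmark under an independence assumption, the kind of reference Figure 6 compares elicited beliefs against. All field names, and the idea that each unit carries a numeric likelihood ratio, are hypothetical; the paper's annotation schema may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EvidenceUnit:
    """One evidential unit of a case (field names are hypothetical)."""
    uid: str
    text: str
    likelihood_ratio: float          # assumed P(e | guilty) / P(e | innocent)
    depends_on: list[str] = field(default_factory=list)  # uids of parent units


@dataclass
class Case:
    case_id: str
    prior: float                     # prior probability of guilt
    units: list[EvidenceUnit]

    def independent_units(self) -> list[EvidenceUnit]:
        """Units with no annotated dependency on any other unit."""
        return [u for u in self.units if not u.depends_on]

    def bayes_posterior_under_independence(self) -> float:
        """Combine all units as if conditionally independent (sum of log-LRs)."""
        log_odds = math.log(self.prior / (1.0 - self.prior))
        for u in self.units:
            log_odds += math.log(u.likelihood_ratio)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    case = Case("case-1", 0.5, [
        EvidenceUnit("e1", "Stolen goods found at the defendant's lodging.", 3.0),
        EvidenceUnit("e2", "Pawnbroker identifies the defendant.", 2.0,
                     depends_on=["e1"]),
    ])
    print(case.bayes_posterior_under_independence())
```

Applying the independence formula to units that actually depend on each other, such as `e2` here, double-counts shared information; that is one simple way to express correlation neglect.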

Figure 2: Annotation interface for crowdworkers, used for validating LLM-generated evidential labels and causal links.

Zero-Shot Prompting: Model Capacity and Limitations

Experiments reveal that large LLMs (GPT-5, Claude-Sonnet-4, DeepSeek-V3.1) can reliably implement parametric updating procedures under Equation-to-Behavior prompts, accurately reproducing both Bayesian and a range of non-Bayesian belief trajectories as anticipated by the theoretical models.

Figure 4: Conviction rates (y-axis) by round under Bayesian or default updating, in prosecution vs. defense-favoring evidence orders. LLMs display order effects and monotonicity violations indicative of human-like biases.

Figure 6: Elicited posterior beliefs from LLMs (realistic) compared to Bayesian predictions (approximate) under independence; large models better approximate Bayesian structure.

Order-of-evidence manipulations clearly induce primacy and recency biases, violating Bayesian order-invariance and mirroring motivational influences observed in human judge populations.
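A small numeric check shows why order effects count as a departure from Bayes. In odds form, Bayesian updates multiply likelihood ratios, so their order cannot matter; a rule that only partially adjusts toward the Bayesian posterior each round can instead leave the final belief tilted toward whichever evidence came last. The sketch assumes the affine form with mixing coefficient 0.5 and uses hypothetical likelihood ratios; it illustrates a recency effect and is not taken from the paper's experiments.

```python
def bayes(prior: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


def affine(prior: float, lr: float, lam: float = 0.5) -> float:
    """Assumed affine form: move only part of the way to the Bayesian posterior."""
    return lam * bayes(prior, lr) + (1.0 - lam) * prior


def run(prior: float, lrs: list[float], update) -> float:
    """Apply an update rule round by round, as the Receiver sees evidence."""
    belief = prior
    for lr in lrs:
        belief = update(belief, lr)
    return belief


# Hypothetical likelihood ratios: one incriminating item (4.0), one exculpatory (0.25).
prosecution_first = [4.0, 0.25]
defense_first = [0.25, 4.0]

if __name__ == "__main__":
    for name, rule in [("bayes", bayes), ("affine", affine)]:
        a = run(0.5, prosecution_first, rule)
        b = run(0.5, defense_first, rule)
        print(f"{name}: {a:.4f} vs {b:.4f}")
    # bayes: 0.5000 vs 0.5000
    # affine: 0.4835 vs 0.5165
```

With these numbers the affine receiver ends below 0.5 when exculpatory evidence comes last and above 0.5 when incriminating evidence comes last, while the Bayesian receiver returns to its prior in both orders.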

On non-Bayesian models, large LLMs capture convexity, asymmetry, and under/over-inference as parameterized. Smaller models (Llama-3.1-8B, etc.) generally fail to respond reliably to equation-based prompts, defaulting toward baseline rationality.

Reinforcement Learning: Training Small Models to Simulate Human-like Belief Updating

To address the prompt-following deficiencies of small models, the authors introduce Equation-to-Behavior RL. LLMs are trained with GRPO in the veRL framework against diverse cognitive reward targets, using explicit $L_1$-based rewards that measure the distance to ground-truth belief trajectories for sampled cognitive-model parameterizations.
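The two ingredients of this training signal can be sketched in a few lines: an $L_1$-based reward comparing a generated belief trajectory with the cognitive model's trajectory, and GRPO's group-relative advantage, which normalizes each sampled response's reward against the other responses drawn for the same prompt. Scaling the reward as 1 minus the mean absolute error, and giving malformed trajectories zero reward, are assumptions; this is neither veRL's API nor the authors' reward code.

```python
import statistics


def l1_reward(predicted: list[float], target: list[float]) -> float:
    """Assumed reward: 1 minus the mean absolute (L1) error per round.

    A trajectory with the wrong number of rounds (e.g. an unparseable answer)
    gets the minimum reward of 0.
    """
    if len(predicted) != len(target) or not target:
        return 0.0
    mae = sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)
    return 1.0 - mae


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center and scale rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    target = [0.65, 0.48]  # cognitive-model belief trajectory (hypothetical)
    group = [[0.65, 0.48], [0.8, 0.5], [0.5, 0.2], [0.9]]  # sampled responses
    rewards = [l1_reward(p, target) for p in group]
    print(rewards, group_relative_advantages(rewards))
```

Because beliefs lie in [0, 1], this reward also lies in [0, 1]; the group normalization then turns it into a relative signal, so responses closer to the target trajectory than their siblings are reinforced.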

Figure 8: RL reward curves for Receiver training. All three models (Llama-3.1-8B, Qwen-2.5-7B, Mistral-7B) show significant post-training improvement, approaching ground-truth beliefs.

Quantitatively, trained models generalize to both out-of-distribution parameter settings and cognitive models not encountered during training: mean belief error falls by approximately 26.5% on out-of-distribution parameterizations, with consistent improvements across Grether, affine, motivated, divisible (non-Bayesian), and correlation-neglect regimes.

Mechanistic and Behavioral Analysis

Finer-grained behavioral analyses underscore that:

Model scale is critical: only frontier models consistently realize complex cognitive updates from prompt alone.
The logistic mapping from posterior beliefs to actions (e.g., convict/acquit) differs with model size: larger models show steeper, more deterministic mappings, while small models conflate the belief and action stages, adding stochasticity and noise.
Figure 10: Logistic regression fits from elicited beliefs to action: higher slopes and $R^2$ in larger models indicate more coherent action selection policy.
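The belief-to-action analysis fits a logistic curve from elicited belief to action, where a larger slope means a more deterministic policy. A minimal, dependency-free way to estimate that slope is Newton-Raphson (iteratively reweighted least squares) on the logistic log-likelihood. The sketch accepts per-bin conviction rates so it can be checked against a known curve; the function name, the binning, and the use of plain Python instead of a statistics library are assumptions for illustration, not the authors' analysis code.

```python
import math


def fit_logistic(beliefs, conviction_rates, counts=None, iters=50):
    """Fit P(convict | belief) = sigmoid(a + b * belief) by Newton-Raphson (IRLS).

    conviction_rates may be fractions (share of convict actions per belief bin);
    counts are optional per-bin weights. Returns (intercept a, slope b).
    """
    weights = counts or [1.0] * len(beliefs)
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, c in zip(beliefs, conviction_rates, weights):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = c * (y - p)            # weighted residual -> gradient terms
            s = c * p * (1.0 - p)      # weighted variance -> Hessian terms
            g0 += r
            g1 += r * x
            h00 += s
            h01 += s * x
            h11 += s * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] @ d = g.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    sharp = [1 / (1 + math.exp(-(10 * x - 5))) for x in xs]   # steep policy
    noisy = [1 / (1 + math.exp(-(3 * x - 1.5))) for x in xs]  # flatter policy
    print(fit_logistic(xs, sharp), fit_logistic(xs, noisy))
```

Fitting on synthetic rates generated from a known curve recovers its slope, which is a quick sanity check before comparing slopes across models.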

Downstream and Generalization Impacts

Training Senders against Realistically Heterogeneous Receivers

Senders trained exclusively against Bayesian Receivers adopt suboptimal strategies in mixed-parameter environments. RL-based Sender training against diverse, cognitively parameterized Receivers improves average belief change (persuasion effectiveness) by 2.5%–12% over Bayesian-only training, even when persuading GPT-5-mini, showing the direct value of training curricula that include realistic cognitive heterogeneity.
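A heterogeneous training environment can be built by sampling a fresh Receiver type and parameterization for each episode and rewarding the Sender by the resulting belief change. The family list, parameter ranges, the roles of the Grether parameters, and the signed belief-change reward below are hypothetical choices; motivated updating is omitted only to keep the sketch short.

```python
import math
import random


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_receiver(family: str, params: dict):
    """Return an update rule (prior, lr) -> posterior for one receiver type."""
    if family == "bayes":
        return lambda p, lr: _sigmoid(_logit(p) + math.log(lr))
    if family == "affine":
        lam = params["lam"]
        return lambda p, lr: lam * _sigmoid(_logit(p) + math.log(lr)) + (1 - lam) * p
    if family == "grether":
        alpha, beta = params["alpha"], params["beta"]
        return lambda p, lr: _sigmoid(alpha * _logit(p) + beta * math.log(lr))
    raise ValueError(f"unknown family: {family}")


def sample_receiver(rng: random.Random):
    """Sample one receiver per episode; families and ranges are hypothetical."""
    family = rng.choice(["bayes", "affine", "grether"])
    if family == "affine":
        return family, {"lam": rng.uniform(0.2, 1.0)}
    if family == "grether":
        return family, {"alpha": rng.uniform(0.3, 1.2), "beta": rng.uniform(0.3, 1.5)}
    return family, {}


def belief_change(prior: float, lrs: list[float], update) -> float:
    """Sender reward: signed shift of the receiver's belief toward guilt."""
    belief = prior
    for lr in lrs:
        belief = update(belief, lr)
    return belief - prior


if __name__ == "__main__":
    rng = random.Random(0)
    family, params = sample_receiver(rng)
    print(family, params, belief_change(0.5, [4.0, 2.0], make_receiver(family, params)))
```

Sampling the receiver per episode, rather than fixing a Bayesian one, means the Sender cannot overfit to a single updating rule; the same evidence sequence earns different rewards against a conservative affine receiver than against a Bayesian one.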

Realistic Evaluation: Equation-to-Behavior versus Persona Prompting

Equation-to-Behavior Prompting achieves higher verdict match rates to real judges than persona-based prompting in simulated Old Bailey sessions, with gains of up to 9.1% (Claude-Sonnet-4), underscoring the value of explicit parametric control over behavioral simulation.

Figure 12: Distribution of Grether $\beta_0$-$\beta_1$ parameters across 446 real court sessions, justifying the need for empirically diverse cognitive simulations.

Implications and Future Directions

The Equation-to-Behavior framework provides a generalizable paradigm for LLM agent design and evaluation in strategic environments, facilitating both high-fidelity simulation of human cognitive diversity and modular transfer to reinforcement learning pipelines. The findings underscore several key implications:

Practical: AI systems deployed as social agents, policy advisors, or legal simulators should incorporate robust modeling of heterogeneous, empirically observed human inference patterns to prevent systematic failures and misalignment.
Theoretical: The results are consistent with theory showing that no persuasion strategy is universally optimal, because updating rules cannot be globally ranked; robust agent design therefore calls for explicit modeling of receiver cognitive diversity [de_clippel2022non-bayesian].
Methodological: Fine-tuning smaller LLMs in environments constructed from Equation-to-Behavior models enables efficient, scalable training for heterogeneity without large-scale human data collection.

Future research trajectories include extending these frameworks to broader strategic settings (negotiation, discourse, voting), developing more refined or higher-dimensional cognitive model classes, and integrating cognitive control into multi-modal and embodied AI agents.

Conclusion

This work establishes both a theoretical and practical foundation for embedding cognitive science and behavioral economic models within LLM agents, enabling calibrated, controllable, and realistic simulation of human strategic decision-making across persuasive social domains. The results indicate that even small-scale open LLMs, when appropriately trained, can robustly instantiate empirically validated belief updating behaviors, facilitating their use in scalable training, evaluation, and alignment workflows for interactive AI agents.
