Teaching Language Models to Think Like Real Humans in Persuasion Games
This presentation explores a breakthrough in AI simulation: embedding formal cognitive models directly into language model agents to recreate human decision-making in strategic persuasion scenarios. Using legal trial data from Old Bailey proceedings, researchers developed the Equation-to-Behavior paradigm that allows AI to simulate diverse human reasoning patterns—from rational Bayesian updating to motivated reasoning and evidence distortion—with unprecedented control and fidelity. The work demonstrates how both prompting and reinforcement learning can transform language models into cognitively realistic agents, opening new pathways for alignment, training, and evaluation of AI systems that interact with humans.Script
When we ask language models to simulate human judges or negotiators, we usually rely on vague instructions like be skeptical or act conservative. But humans don't just follow personality descriptions. They update beliefs using precise, often irrational, cognitive mechanisms that behavioral scientists have spent decades mapping. This research asks: what if we could program those exact human biases directly into AI agents?
The Equation-to-Behavior paradigm translates computational cognitive models into structured prompts and reinforcement learning objectives. Instead of asking a model to be more rational, researchers specify exact belief-updating equations from behavioral economics: Bayesian inference, affine distortion that blends prior beliefs with evidence, motivated updating that bends toward preferred conclusions, or Grether's alpha-beta model that captures base-rate neglect.
The researchers tested these models in simulated legal trials using 18th-century Old Bailey court records. Large language models like GPT-5 and Claude-Sonnet-4 faithfully reproduce the cognitive biases when prompted with equations. They show primacy and recency effects, where evidence order changes verdicts even though Bayesian logic says it shouldn't. Smaller models fail entirely, reverting to approximately rational behavior regardless of the prompt.
To fix the small model problem, the authors applied reinforcement learning with a cognitive fidelity reward. Models like Llama-3.1-8B and Qwen-2.5-7B were trained to match ground-truth belief trajectories from sampled cognitive models. After training, mean belief error dropped by 26.5 percent. The models generalized to new parameter settings and even to cognitive models they had never seen during training, learning the underlying structure of human-like updating.
Training persuaders against only rational Bayesian receivers produces brittle strategies. When the same senders face a realistic mix of cognitive types, their performance drops. Senders trained against cognitively diverse receivers using Equation-to-Behavior models achieved 2.5 to 12 percent higher persuasion effectiveness, proving that realistic heterogeneity matters for robust agent design.
This work establishes a foundation for embedding cognitive science into AI agents that interact strategically with humans. Because no universally optimal persuasion strategy exists across diverse human reasoning styles, robust AI alignment requires explicit modeling of cognitive heterogeneity. If you're building agents for negotiation, policy advice, or any domain where humans bring their biases, visit EmergentMind.com to explore how Equation-to-Behavior methods can ground your models in the messy reality of human thought.