Analyzing Rationality Bias in LLMs: A Study on Human Decision-Making
The paper "LLMs Assume People are More Rational than We Really are" explores a critical analysis of how LLMs perceive human decision-making processes, particularly through the lens of rationality. The research interrogates the implicit assumptions encoded in state-of-the-art LLMs such as GPT-4, Llama, and Claude, challenging the notion that these models can accurately simulate or predict human behavior. This investigation engages with foundational theories in psychology and cognitive science to draw conclusions about LLMs' alignment with human rationality expectations versus actual behavior.
The research outlines two primary types of modeling tasks to evaluate LLMs: forward modeling and inverse modeling. In forward modeling, the objective is to predict which of two gambles a person will choose. The paper uses a well-documented dataset, choices13k, which consists of over 13,000 decision-making problems in which human choices between probabilistic gambles were recorded. The LLMs were prompted to predict these choices under two conditions, zero-shot and chain-of-thought prompting, with the latter eliciting a structured reasoning process from the models.
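To make the two prompting conditions concrete, the sketch below builds a zero-shot and a chain-of-thought query for a choices13k-style problem. The prompt wording and the `query_llm` helper are illustrative assumptions, not the paper's exact materials.

```python
# Illustrative sketch (not the paper's exact prompts): constructing zero-shot and
# chain-of-thought queries for a choices13k-style gamble problem. `query_llm` is a
# hypothetical helper standing in for whichever API client is used.

def format_gamble(name, outcomes):
    """Render a gamble as text, e.g. 'Gamble A: 90% chance of $10, 10% chance of $0'."""
    parts = ", ".join(f"{p:.0%} chance of ${x}" for x, p in outcomes)
    return f"Gamble {name}: {parts}"

def build_prompt(gamble_a, gamble_b, chain_of_thought=False):
    """Ask the model to predict which gamble most people would choose."""
    problem = (
        format_gamble("A", gamble_a) + "\n" +
        format_gamble("B", gamble_b) + "\n" +
        "Which gamble would most people choose?"
    )
    if chain_of_thought:
        # Chain-of-thought condition: ask the model to reason step by step first.
        return problem + "\nThink step by step, then answer with 'A' or 'B'."
    # Zero-shot condition: ask for the answer directly.
    return problem + "\nAnswer with 'A' or 'B' only."

# Example problem: a risky gamble versus a sure thing.
risky = [(10, 0.9), (0, 0.1)]   # $10 with probability 0.9, else $0
safe = [(8, 1.0)]               # $8 for sure

zero_shot_prompt = build_prompt(risky, safe, chain_of_thought=False)
cot_prompt = build_prompt(risky, safe, chain_of_thought=True)
# prediction = query_llm(cot_prompt)   # hypothetical API call
```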
The results indicate a discrepancy between human decision-making and LLM predictions. Notably, when LLMs are given chain-of-thought prompts, their predictions align more closely with rational choice theory (specifically, expected value theory) than with human behavior. For example, GPT-4's predictions correlate at over 0.93 with the rational model but only 0.64 with actual human decisions. This finding suggests a bias within LLMs, likely shaped by training data that leans toward rational exposition and often omits human errors and fallacies.
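For reference, expected value theory says a rational agent simply picks the gamble with the higher probability-weighted payoff. The sketch below computes that benchmark for a toy problem and shows the kind of correlation comparison behind the reported figures; the numbers are made-up placeholders, and Pearson correlation is shown only as an illustrative choice of measure.

```python
import numpy as np
from scipy.stats import pearsonr

def expected_value(outcomes):
    """Probability-weighted payoff of a gamble given as (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

# Toy problem: expected value favors the risky gamble (9.0 > 8.0),
# even though many people prefer the sure thing.
risky = [(10, 0.9), (0, 0.1)]
safe = [(8, 1.0)]
print(expected_value(risky), expected_value(safe))  # 9.0 8.0

# Comparing predicted choice rates against observed human rates across problems
# is the kind of analysis behind the 0.93 (rational model) vs. 0.64 (humans)
# correlations reported in the paper. These arrays are placeholders, not real data.
llm_pred = np.array([0.95, 0.10, 0.80, 0.30])    # P(choose gamble A) predicted by the LLM
human_rate = np.array([0.60, 0.35, 0.55, 0.45])  # observed proportion choosing gamble A
r, _ = pearsonr(llm_pred, human_rate)
print(f"correlation with human choice rates: {r:.2f}")
```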
The second experimental paradigm, inverse modeling, explores how LLMs infer human preferences from observed decisions. This task aligns with cognitive models in which observers infer the intentions behind others' choices. Using a task set from Jern et al.'s work, in which observers judge how strongly a choice reveals the chooser's preferences, the paper assessed the correlation between LLM inferences and human inferences. Here the findings are more concordant: LLMs tend to make inferences that echo human expectations of rationality. Intriguingly, stronger models like GPT-4 correlate highly with both human expectations and rational theories (a Spearman correlation of up to 0.97 with human inference patterns).
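A minimal sketch of the inverse-modeling comparison: given strength-of-preference ratings from the LLM and from human observers over the same scenarios, compute their Spearman rank correlation. The ratings below are hypothetical placeholders, not data from Jern et al.

```python
from scipy.stats import spearmanr

# Hypothetical strength-of-preference ratings for the same set of observed choices,
# one rating per scenario (higher = stronger inferred preference).
llm_ratings = [7, 3, 9, 5, 2, 8]
human_ratings = [6, 4, 9, 5, 1, 7]

rho, pvalue = spearmanr(llm_ratings, human_ratings)
print(f"Spearman rho = {rho:.2f}")  # rank agreement, analogous to the reported 0.97
```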
These experiments carry important implications for AI alignment research and for using LLMs to simulate human behavior. The gap between LLM predictions in the forward task and actual human decisions highlights the need for models trained to align with human behavior, not just idealized rationality. The paper suggests that alignment strategies may need to bifurcate: one strand aimed at meeting human expectations (which assume rationality in others), and another at capturing actual human behavior, including its irrational nuances.
The insights from this research prompt a reconsideration of how well LLMs can serve in human-interaction tasks beyond language comprehension. The implicit rationality bias embedded in these models can lead to fundamental misalignments in real-world applications, from simulating human participants in social experiments to designing human-AI interaction protocols. This calls for a more nuanced approach to training AI systems, incorporating methodologies from cognitive science that account for the inherently imperfect and sometimes irrational nature of human decision-making.
The paper's discussion of broader impacts urges a reflective discourse on AI deployment in socially and ethically sensitive contexts, promoting a vision of AI systems that model not just a rational archetype but the full spectrum of human cognition and behavior. As AI continues to evolve, integrating insights from psychology and cognitive science will be crucial to shaping technology that truly understands and complements human capacities.