Six Fallacies in Substituting Large Language Models for Human Participants (2402.04470v5)
Abstract: Can AI systems like LLMs replace human participants in behavioral and psychological research? Here I critically evaluate the "replacement" perspective and identify six interpretive fallacies that undermine its validity. These fallacies are: (1) equating token prediction with human intelligence, (2) treating LLMs as the average human, (3) interpreting alignment as explanation, (4) anthropomorphizing AI systems, (5) essentializing identities, and (6) substituting model data for human evidence. Each fallacy represents a potential misunderstanding about what LLMs are and what they can tell us about human cognition. The analysis distinguishes levels of similarity between LLMs and humans, particularly functional equivalence (outputs) versus mechanistic equivalence (processes), while highlighting both technical limitations (addressable through engineering) and conceptual limitations (arising from fundamental differences between statistical and biological intelligence). For each fallacy, specific safeguards are provided to guide responsible research practices. Ultimately, the analysis supports conceptualizing LLMs as pragmatic simulation tools--useful for role-play, rapid hypothesis testing, and computational modeling (provided their outputs are validated against human data)--rather than as replacements for human participants. This framework enables researchers to leverage LLMs productively while respecting the fundamental differences between machine intelligence and human thought.
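The abstract's closing recommendation, that LLM outputs be validated against human data before the models are used as simulation tools, can be illustrated with a minimal sketch. The snippet below is a hypothetical example, not code or data from the paper: the item ratings are placeholders, and the check captures only output-level (functional) agreement, which by design says nothing about shared underlying processes (mechanistic equivalence).

```python
# Minimal sketch of output-level validation: compare LLM-generated responses
# with human responses on the same items. All values below are hypothetical.
from scipy.stats import spearmanr

# Hypothetical mean Likert ratings (1-7) for six survey items.
human_means = [5.8, 3.1, 4.4, 6.2, 2.5, 4.9]   # from human participants
llm_means   = [5.5, 3.4, 4.0, 6.0, 2.9, 5.1]   # from an LLM role-playing respondents

# Rank correlation asks only whether the model's outputs covary with human
# judgments; even a high rho licenses no claim about shared cognitive processes.
rho, p_value = spearmanr(human_means, llm_means)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```

A check of this kind supports the paper's framing of LLMs as pragmatic simulation tools: alignment at the level of outputs is evidence of usefulness for rapid hypothesis testing, not evidence of equivalence to human participants.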