Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings (2506.14997v1)
Abstract: As LLMs increasingly appear in social science research (e.g., economics and marketing), it becomes crucial to assess how well these models replicate human behavior. In this work, using hypothesis testing, we present a quantitative framework to assess the misalignment between LLM-simulated and actual human behaviors in multiple-choice survey settings. This framework allows us to determine in a principled way whether a specific LLM can effectively simulate human opinions, decision-making, and general behaviors represented through multiple-choice options. We applied this framework to a popular LLM for simulating people's opinions in various public surveys and found that this model is ill-suited for simulating the tested sub-populations (e.g., across different races, ages, and incomes) for contentious questions. This raises questions about the alignment of this LLM with the tested populations, highlighting the need for new practices in using LLMs for social science studies beyond naive simulations of human subjects.
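The abstract's core idea, testing whether an LLM's multiple-choice answer distribution matches a human population's, can be illustrated with a standard chi-squared test of homogeneity. This is a minimal sketch, not the paper's actual framework; the counts below are hypothetical.

```python
import math

# Hypothetical counts (not from the paper): how often surveyed humans
# vs. an LLM-simulated population pick each of three answer options.
human = [50, 30, 20]
llm = [80, 10, 10]

def chi2_homogeneity(row1, row2):
    """Pearson chi-squared statistic for a 2-by-K contingency table."""
    n1, n2 = sum(row1), sum(row2)
    total = n1 + n2
    stat = 0.0
    for o1, o2 in zip(row1, row2):
        col = o1 + o2
        e1 = n1 * col / total  # expected count under the null hypothesis
        e2 = n2 * col / total
        stat += (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
    return stat

chi2 = chi2_homogeneity(human, llm)
dof = len(human) - 1  # (2 - 1) * (K - 1) for a 2-by-K table
# For dof = 2 the chi-squared survival function has a closed form:
p_value = math.exp(-chi2 / 2)

# A small p-value is evidence that the LLM's answer distribution
# differs from the human one, i.e., misalignment on this question.
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

In this illustrative example the test rejects the null of identical distributions, which is the kind of per-question verdict the paper's framework aggregates across surveys and sub-populations.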