
Testing the limits of natural language models for predicting human language judgments (2204.03592v3)

Published 7 Apr 2022 in cs.CL, cs.AI, and q-bio.NC

Abstract: Neural network language models can serve as computational hypotheses about how humans process language. We compared the model-human consistency of diverse language models using a novel experimental approach: controversial sentence pairs. For each controversial sentence pair, two language models disagree about which sentence is more likely to occur in natural text. Considering nine language models (including n-gram, recurrent neural network, and transformer models), we created hundreds of such controversial sentence pairs by either selecting sentences from a corpus or synthetically optimizing sentence pairs to be highly controversial. Human subjects then provided judgments indicating for each pair which of the two sentences is more likely. Controversial sentence pairs proved highly effective at revealing model failures and identifying the models that aligned most closely with human judgments. The most human-consistent model tested was GPT-2, although experiments also revealed significant shortcomings in its alignment with human perception.
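The core selection criterion described in the abstract can be sketched in a few lines: a sentence pair is "controversial" when two models disagree about which member of the pair is more probable. The sketch below uses toy unigram scorers as stand-ins for real language models (the word log-probabilities and helper names are illustrative assumptions, not from the paper); in practice the scores would come from models such as GPT-2 or an n-gram model.

```python
def is_controversial(score_a, score_b, s1, s2):
    # A pair is "controversial" when the two models disagree
    # about which sentence is more likely.
    prefers_s1_a = score_a(s1) > score_a(s2)
    prefers_s1_b = score_b(s1) > score_b(s2)
    return prefers_s1_a != prefers_s1_b

def make_unigram_scorer(logprobs, oov=-10.0):
    # Toy stand-in for a language model: sums per-word
    # log-probabilities, with a penalty for unknown words.
    def score(sentence):
        return sum(logprobs.get(w, oov) for w in sentence.lower().split())
    return score

# Two hypothetical "models" with different word statistics.
model_a = make_unigram_scorer({"the": -1.0, "cat": -2.0, "sat": -3.0, "flew": -8.0})
model_b = make_unigram_scorer({"the": -1.0, "cat": -2.0, "sat": -8.0, "flew": -3.0})

pairs = [("the cat sat", "the cat flew"), ("the cat", "the cat")]
controversial = [p for p in pairs
                 if is_controversial(model_a, model_b, *p)]
```

Here `model_a` prefers "the cat sat" while `model_b` prefers "the cat flew", so that pair is flagged as controversial; the paper's synthetic-optimization procedure goes further and searches sentence space to maximize such disagreement.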

Citations (12)
