Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments (2311.09718v2)

Published 16 Nov 2023 in cs.CL and cs.AI

Abstract: The versatility of LLMs on natural language understanding tasks has made them popular for research in social sciences. To properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs about particular opinions. In this study, we take a cautionary step back and examine whether the current format of prompting LLMs elicits responses in a consistent and robust manner. We first construct a dataset that contains 693 questions encompassing 39 different instruments of persona measurement on 115 persona axes. Additionally, we design a set of prompts containing minor variations and examine LLMs' capabilities to generate answers, as well as prompt variations to examine their consistency with respect to content-level variations such as switching the order of response options or negating the statement. Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability, and that most LLMs have low negation consistency. Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions, and we therefore discuss potential alternatives to improve these issues.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (8)
  1. Bangzhao Shu (1 paper)
  2. Lechen Zhang (9 papers)
  3. Minje Choi (13 papers)
  4. Lavinia Dunagan (5 papers)
  5. Dallas Card (20 papers)
  6. David Jurgens (69 papers)
  7. Lajanugen Logeswaran (30 papers)
  8. Moontae Lee (54 papers)
Citations (20)