Sensitivity of standardized LLM benchmarks to personality priming
Determine whether standardized benchmarks oriented toward factual recall or static reasoning (such as BIG-Bench) are inherently less sensitive to psychological modulation via MBTI-based personality priming because they lack behavioral ambiguity and subjectivity. Then quantify that sensitivity relative to the sensitivity of affective or cognitively ambiguous tasks.
References
We conjecture that such tasks are inherently less sensitive to psychological modulation, as they lack the behavioral ambiguity and subjectivity that personality tends to influence.
Source: Psychologically Enhanced AI Agents (2509.04343, Besta et al., 4 Sep 2025), Section 4 (Evaluation and Use Cases), Task Selection