- The paper introduces psychological assessment frameworks for LLMs by adapting traditional tools and developing tailored datasets.
- It presents empirical evaluations highlighting LLMs' variability in personality expression, response consistency, and task performance.
- The study demonstrates LLMs' potential in simulating human roles and enhancing human-AI interaction in complex scenarios.
Introduction
The increasing deployment of LLMs in human-centered applications necessitates a comprehensive understanding of their psychological traits. This paper systematically reviews the integration of psychological theories into LLM evaluation, aiming to close a gap in the existing literature concerning which psychological assessment tools are suitable for LLMs. The review covers six key dimensions: assessment tools, LLM-specific datasets, evaluation metrics, empirical findings, personality simulation methods, and behavior simulation. The analysis highlights the variability of LLMs' psychological profiles across tasks and settings.
Figure 1: Overview of Psychological Traits and Human Simulations in LLMs.
Traditional Assessment Tools
Traditional psychological assessment tools, such as the Myers-Briggs Type Indicator (MBTI), the Big Five Inventory (BFI), and the Short Dark Triad (SD-3), have been adapted for evaluating LLMs. These instruments assess personality dimensions such as extraversion and openness using forced-choice item formats and Likert scales. However, challenges arise because instruments designed for human respondents may not align with how LLMs interpret and answer items.
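To make the adaptation concrete, below is a minimal sketch of administering one Likert-scale item to an LLM. The `ask` callable stands in for whatever completion API is used, and the prompt wording and example item are illustrative assumptions, not the protocol of any specific paper.

```python
# Minimal sketch of posing one Likert-scale item to an LLM and parsing
# the rating. `ask` stands in for any LLM completion call; the prompt
# wording and the example item are illustrative, not taken from the BFI.
import re

LIKERT_PROMPT = (
    "Rate how well the following statement describes you on a scale "
    "from 1 (strongly disagree) to 5 (strongly agree). "
    "Answer with a single digit.\n\nStatement: {item}"
)

def administer_item(ask, item: str) -> int:
    """Pose one item and extract the 1-5 rating from the model's reply."""
    reply = ask(LIKERT_PROMPT.format(item=item))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"unparseable response: {reply!r}")
    return int(match.group())

# Stubbed model that always answers "4", for demonstration only.
print(administer_item(lambda prompt: "4", "I am talkative."))  # -> 4
```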
Figure 2: An illustration showing the relationship between Personality, Emotion, Mental Health, and Theory of Mind.
LLM-Specific Datasets
Recent advancements have led to specialized datasets tailored for LLMs, enabling more nuanced evaluations of their psychological attributes. For instance, the Machine Personality Inventory (MPI) adapts standardized personality items to quantitatively evaluate LLMs' traits, while the TRAIT dataset extends conventional tests by incorporating diverse real-world scenarios.
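As a rough illustration of how inventory responses become quantitative trait scores, the sketch below averages 1-5 ratings per trait and flips reverse-keyed items. The item metadata and trait mapping are hypothetical, not the actual MPI specification.

```python
# Sketch of aggregating Likert responses into trait scores, flipping
# reverse-keyed items on a 1-5 scale (6 - x). The inventory metadata
# below is hypothetical, chosen only to show the mechanics.
from collections import defaultdict
from statistics import mean

# (item_id, trait, reverse_keyed) -- hypothetical inventory metadata.
ITEMS = [
    ("e1", "extraversion", False),
    ("e2", "extraversion", True),   # e.g., "I tend to be quiet."
    ("o1", "openness", False),
]

def trait_scores(responses: dict[str, int]) -> dict[str, float]:
    """Average 1-5 ratings per trait, flipping reverse-keyed items."""
    buckets = defaultdict(list)
    for item_id, trait, reverse in ITEMS:
        rating = responses[item_id]
        buckets[trait].append(6 - rating if reverse else rating)
    return {trait: mean(vals) for trait, vals in buckets.items()}

print(trait_scores({"e1": 4, "e2": 2, "o1": 5}))
# -> {'extraversion': 4.0, 'openness': 5.0}
```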
Figure 3: An example of the Imposing Memory Test.
Consistency and Stability
The consistency and stability of LLMs in psychological assessments are crucial for reliable evaluation. Consistency refers to whether a model reproduces similar outputs under similar conditions, while stability concerns whether its psychological characteristics persist across different model states, such as before and after fine-tuning.
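The distinction can be made operational with simple statistics, as in the sketch below: consistency as the spread of scores across repeated runs of one model, stability as the per-trait drift between two model states. The choice of standard deviation and mean absolute difference is illustrative, not a metric mandated by the surveyed work.

```python
# Sketch distinguishing the two notions: consistency = score spread
# across repeated administrations of one model; stability = score
# drift between two model states (e.g., before vs. after fine-tuning).
from statistics import mean, stdev

def consistency(repeated_scores: list[float]) -> float:
    """Std. dev. over repeated runs; lower = more consistent."""
    return stdev(repeated_scores)

def stability(before: dict[str, float], after: dict[str, float]) -> float:
    """Mean absolute per-trait drift between two model states."""
    return mean(abs(after[t] - before[t]) for t in before)

print(consistency([3.8, 4.0, 3.9, 4.1]))   # small spread -> consistent
print(stability({"extraversion": 4.0},
                {"extraversion": 3.2}))     # drift of 0.8 after tuning
```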
Figure 4: Key Dimensions in LLMs' Psychological Assessment: Consistency vs. Stability.
Psychological Analysis of LLMs
The evaluation of LLMs such as GPT-4, LLaMA, and Mistral across psychological dimensions reveals strengths in areas such as Theory of Mind (ToM) and emotional intelligence while underscoring limitations in complex reasoning tasks. This disparity points to room for improvement through enhanced training and dataset refinement.
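For intuition, a classic false-belief probe of the kind used in ToM evaluations can be sketched as follows. The scenario text and pass criterion are illustrative; real benchmarks use larger, carefully controlled item sets.

```python
# Sketch of a Sally-Anne-style false-belief probe for Theory of Mind.
# The scenario and the one-word answer check are illustrative only.
SCENARIO = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box. "
    "When Sally returns, where will she look for the ball? "
    "Answer with one word."
)

def tom_probe(ask) -> bool:
    """Pass iff the model tracks Sally's (false) belief, not reality."""
    answer = ask(SCENARIO).strip().lower()
    return "basket" in answer

print(tom_probe(lambda prompt: "The basket."))  # -> True
```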
Personality Simulation and Human Role Simulation
Research into personality simulation techniques, such as model editing and prompting, shows promise in steering LLMs toward specific personality traits; a minimal prompting sketch follows below. Moreover, LLMs demonstrate capabilities in simulating human roles through social experiment simulations, game-based interactions, and negotiation tasks, offering insights into human-like decision-making processes.
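As a minimal sketch of the prompting approach, a persona preamble is prepended to whatever question is posed. The persona texts and the `steered_ask` helper are hypothetical, not prompts from the surveyed papers.

```python
# Sketch of prompt-based personality steering: a persona preamble is
# prepended to the question. Personas and helper are hypothetical.
PERSONAS = {
    "high_extraversion": "You are outgoing, talkative, and energized by company.",
    "low_extraversion": "You are reserved, quiet, and drained by crowds.",
}

def steered_ask(ask, persona: str, question: str) -> str:
    """Wrap any ask() callable with a persona instruction."""
    return ask(f"{PERSONAS[persona]}\n\n{question}")
```

Administering the same inventory under each persona and comparing the resulting trait scores gives a rough measure of how steerable a model's apparent personality is.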
Figure 5: Three Types of Human Role Simulation by LLMs.
Conclusion
The integration of psychological assessment frameworks in LLMs holds significant potential for advancing AI-human interaction. Future research should focus on refining assessment tools to handle complex social reasoning and emotional intelligence tasks more effectively, ensuring consistent and stable personality representations. By bridging the gap between human psychological constructs and artificial intelligence, LLMs can play a pivotal role in socially sensitive applications.