Do LLMs have Consistent Values? (2407.12878v3)
Abstract: Large Language Model (LLM) technology is constantly improving towards human-like dialogue. Values are a basic driving force underlying human behavior, but little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values. We show that the results of this analysis depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as "Value Anchoring") the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs and to introduce novel methods for assessing consistency in LLM responses.
Summary
- The paper reveals that basic prompting causes LLMs to produce low internal consistency and weak alignment with established human value hierarchies.
- The use of persona-based prompts, especially the Value Anchor method, markedly improves coherence, yielding high Spearman correlations and MDS results that mirror human value structures.
- The study also shows a trade-off between variability and consistency, as higher temperature settings lead to reduced Cronbach’s Alpha in LLM responses.
This paper, "Do LLMs have Consistent Values?" (2407.12878), investigates whether LLMs exhibit a consistent system of values similar to those observed in humans. The authors leverage the well-established Theory of Basic Human Values by Shalom Schwartz, which defines 19 core values arranged in a circular structure reflecting motivational compatibilities and conflicts.
The central question is whether an LLM, within a single conversation session, can generate responses consistent with a coherent psychological profile, specifically concerning value priorities and interrelations, and whether different sessions can simulate a population of diverse human value profiles.
To explore this, the researchers used the 57-item Portrait Values Questionnaire-Revised (PVQ-RR), a standard psychological tool, to probe the values of two prominent LLMs: GPT-4 and Gemini Pro. They tested five different prompting strategies, each designed to elicit a distinct response style or "persona":
- Basic prompt: Standard questionnaire instructions.
- Value Anchor prompt: Instructs the LLM to respond as a person emphasizing a specific value (using descriptions from another value scale).
- Demographic prompt: Assigns the LLM a random age, gender, occupation, and hobby.
- Generated Persona prompt: Instructs the LLM to first create a 2-3 sentence persona description and then answer as that persona.
- Names prompt: Assigns the LLM a title (Mr., Ms., Mx.) and a surname from a list representing different ethnic groups.
For each model and prompt type, 300 response sets (simulating 300 individuals) were generated, half using the male version and half the female version of the questionnaire. This was done at two temperature settings: 0.0 (minimal variability) and 0.7 (increased variability), resulting in 20 datasets (2 models x 5 prompt types x 2 temperatures). These LLM-generated datasets were then compared against aggregated human data from a large cross-cultural study.
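To make the data-generation procedure concrete, the following is a minimal Python sketch of how such a simulated-respondent dataset could be produced. The prompt templates, the `query_llm` stub, and the fixed persona fields are illustrative assumptions made for this summary, not the paper's actual prompt wordings or API code.

```python
import random

# Hypothetical prompt templates loosely matching the five strategies above;
# the paper's exact wordings are not reproduced here.
PROMPT_TEMPLATES = {
    "basic": "{item}",
    "value_anchor": "Answer as a person for whom {anchor} is the most important value.\n{item}",
    "demographic": "Answer as a {age}-year-old {gender} {occupation} whose hobby is {hobby}.\n{item}",
    "generated_persona": "Invent a 2-3 sentence persona, then answer as that persona.\n{item}",
    "names": "Answer as {title} {surname}.\n{item}",
}

# Placeholder for the 57 PVQ-RR items (male and female versions).
PVQ_RR_ITEMS = {"male": [f"item_{i}" for i in range(57)],
                "female": [f"item_{i}" for i in range(57)]}

def query_llm(model: str, prompt: str, temperature: float) -> int:
    """Stand-in for a GPT-4 / Gemini Pro API call; here it just returns a random rating."""
    return random.randint(1, 6)  # PVQ-RR answers fall on a 6-point scale

def simulate_dataset(model: str, prompt_type: str, temperature: float,
                     n_respondents: int = 300) -> list[list[int]]:
    """Generate one dataset: one row of 57 item ratings per simulated 'person'."""
    dataset = []
    for i in range(n_respondents):
        version = "male" if i < n_respondents // 2 else "female"  # half each, as in the paper
        fields = {"anchor": "benevolence", "age": 34, "gender": "woman",
                  "occupation": "teacher", "hobby": "hiking",
                  "title": "Ms.", "surname": "Tanaka"}  # would be sampled per respondent
        answers = [query_llm(model,
                             PROMPT_TEMPLATES[prompt_type].format(item=item, **fields),
                             temperature)
                   for item in PVQ_RR_ITEMS[version]]
        dataset.append(answers)
    return dataset

# For one model; repeating for the second model gives the paper's 2 x 5 x 2 = 20 datasets.
datasets = {(p, t): simulate_dataset("gpt-4", p, t)
            for p in PROMPT_TEMPLATES for t in (0.0, 0.7)}
```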
The analysis focused on three key aspects of value structure (a code sketch of these measures follows the list):
- Value Rankings: Do LLMs rank values in a hierarchy similar to the near-universal human one (e.g., benevolence and universalism typically ranked high, power and tradition low)? Measured using Spearman's Rank Correlation (ρ).
- Consistency Within Values: Do LLMs respond consistently to different questions designed to measure the same value within a single session? Measured using Cronbach's Alpha (α).
- Correlations Between Values: Do the correlations between different values in LLM responses match the known circular structure of value relationships in humans (compatible values are adjacent, conflicting values are opposite)? Assessed using Multidimensional Scaling (MDS) to embed values in 2D based on their correlations and Procrustes analysis to compare the resulting shape to the human structure.
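As a rough illustration of these three measures (not the paper's own analysis code), the sketch below computes Spearman's ρ between an LLM-derived and a human value ranking, Cronbach's α for the items of one value, and an MDS embedding of the between-value correlation matrix compared to a human reference configuration via Procrustes analysis. All input arrays are random placeholders standing in for real data.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial import procrustes
from sklearn.manifold import MDS

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) matrix of one value's items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# --- placeholder inputs (random, not data from the paper) ---
rng = np.random.default_rng(0)
llm_scores = rng.integers(1, 7, size=(300, 19)).astype(float)  # mean score per value per "person"
human_ranking = rng.permutation(19)                            # reference hierarchy of the 19 values
human_mds_config = rng.normal(size=(19, 2))                    # reference 2D circular configuration

# 1) Value rankings: Spearman correlation between LLM and human value hierarchies.
llm_ranking = (-llm_scores.mean(axis=0)).argsort().argsort()
rho, _ = spearmanr(llm_ranking, human_ranking)

# 2) Consistency within a value: Cronbach's alpha over the items measuring that value.
value_items = rng.integers(1, 7, size=(300, 3)).astype(float)  # e.g. the 3 PVQ-RR items of one value
alpha = cronbach_alpha(value_items)

# 3) Correlations between values: embed (1 - correlation) as distances with MDS,
#    then compare the resulting shape to the human configuration via Procrustes.
corr = np.corrcoef(llm_scores, rowvar=False)                   # 19 x 19 correlation matrix
embedding = MDS(n_components=2, dissimilarity="precomputed",
                random_state=0).fit_transform(1 - corr)
_, _, disparity = procrustes(human_mds_config, embedding)      # standardized sum of squared differences
print(f"rho={rho:.2f}, alpha={alpha:.2f}, Procrustes disparity={disparity:.3f}")
```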
Key Findings:
- Basic Prompt Inadequacy: The Basic prompt, especially at temperature 0.0, often resulted in negligible variability in responses across questionnaire items or produced outputs with very low internal consistency (low Cronbach's Alpha). This suggests that LLMs prompted naively do not behave like consistent "individuals" with a coherent value system.
- Prompting Improves Human-likeness: Using prompts that endow the LLM with a persona significantly improved the coherence and human-likeness of the generated value profiles.
- Value Ranking Alignment: Most persona-based prompts (Value Anchor, Demographic, Generated Persona, Names) resulted in value rankings that correlated highly with the universal human value hierarchy (ρ > 0.77 for both GPT-4 and Gemini Pro), indicating that at the population average level, LLMs can reproduce human value priorities. The Basic prompt yielded much lower correlations.
- Value Anchor Excels in Consistency: The Value Anchor prompt consistently produced the highest internal consistency (Cronbach's Alpha) across models and temperatures, often reaching levels comparable to human data (above the acceptable threshold of 0.60). Other prompts, particularly the Names prompt with GPT-4, showed lower consistency.
- Value Anchor Best Matches Correlation Structure: The Value Anchor prompt yielded MDS configurations of value correlations that most closely matched the human circular structure, as indicated by the lowest sum of squared differences in the Procrustes analysis compared to human data. This suggests that this prompting method is most effective at capturing the complex motivational relationships between values observed in humans.
- Temperature Impact: Increasing the temperature from 0.0 to 0.7 generally decreased internal consistency (Cronbach's Alpha) in LLM responses, highlighting a trade-off between response variability and internal coherence.
- Model Comparison: GPT-4 and Gemini Pro showed qualitatively similar patterns, with both models benefiting significantly from persona prompting, especially the Value Anchor method.
Practical Implications:
- LLMs, in their default state or with minimal prompting, should not be treated as having inherent, consistent human-like value systems.
- To simulate diverse human-like value profiles, appropriate prompting strategies are essential. The Value Anchor prompt is identified as particularly effective for eliciting coherent and human-consistent value structures.
- Psychological theories and established measurement tools (like the PVQ-RR and methods like Cronbach's Alpha and MDS) provide valuable quantitative methods for evaluating the quality and consistency of LLM-generated personas.
- The datasets generated in this paper could serve as valuable resources for psychological research, allowing for simulations and pretesting of hypotheses about human values and behavior.
- The ability of LLMs to produce varied, yet coherent, value profiles raises important questions about how these systems are influenced by their training data and fine-tuning, and the potential impact of these embedded values on user interactions and societal applications.
In conclusion, the paper demonstrates that while LLMs do not possess intrinsic, consistent values, they can be effectively prompted to generate diverse populations of "personas" exhibiting value structures that closely mirror human psychological principles, with the "Value Anchor" method proving particularly successful in achieving this alignment.
Related Papers
- Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values (2023)
- ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models (2023)
- Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches (2024)
- How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions (2024)
- CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses (2024)