
Stick to your Role! Stability of Personal Values Expressed in Large Language Models (2402.14846v4)

Published 19 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: The standard way to study LLMs with benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple choice questions). However, due to LLMs' highly context-dependent nature, conclusions from such minimal-context evaluations may provide little information about the model's behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence (specifically, value stability) should be studied as a specific property of LLMs and used as another dimension of LLM comparison (alongside others such as cognitive abilities, knowledge, or model size). We present a case study on the stability of value expression over different contexts (simulated conversations on different topics), as measured using a standard psychology questionnaire (PVQ) and on behavioral downstream tasks. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level, and Ipsative stability on the individual (intrapersonal) level. We consider two settings (with and without instructing LLMs to simulate particular personas), two simulated populations, and three downstream tasks. We observe consistent trends in the stability of models and model families: Mixtral, Mistral, GPT-3.5, and Qwen families are more stable than LLaMa-2 and Phi. The consistency of these trends implies that some models exhibit higher value stability than others, and that stability can be estimated with the set of introduced methodological tools. When instructed to simulate particular personas, LLMs exhibit low Rank-order stability, which further diminishes with conversation length. This highlights the need for future research on LLMs that coherently simulate different personas. This paper provides a foundational step in that direction, and, to our knowledge, it is the first study of value stability in LLMs.
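The two stability measures named in the abstract come from the psychometrics literature: Rank-order stability asks whether the population's ordering on a given value survives a change of context, while Ipsative stability asks whether each individual's own profile of values stays internally consistent. A minimal sketch of both, assuming toy score matrices (rows = simulated personas, columns = PVQ value scores, one matrix per conversation context); the function names and data layout here are illustrative, not taken from the authors' code:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(x):
    """Rank values (1 = smallest); ties handled naively for brevity."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def rank_order_stability(ctx_a, ctx_b, value_idx):
    """Interpersonal: Spearman correlation of one value's scores
    across the population between two contexts."""
    a = [row[value_idx] for row in ctx_a]
    b = [row[value_idx] for row in ctx_b]
    return pearson(ranks(a), ranks(b))

def ipsative_stability(ctx_a, ctx_b):
    """Intrapersonal: correlate each persona's whole value profile
    across contexts, then average over the population."""
    per_persona = [pearson(pa, pb) for pa, pb in zip(ctx_a, ctx_b)]
    return sum(per_persona) / len(per_persona)
```

A high `rank_order_stability` means personas keep their relative standing on a value when the conversation topic changes; a high `ipsative_stability` means each persona's internal hierarchy of values is preserved. The paper's finding is that both degrade for some model families, and Rank-order stability in particular drops as simulated conversations grow longer.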

Authors (5)
  1. Grgur Kovač (8 papers)
  2. Rémy Portelas (19 papers)
  3. Masataka Sawayama (6 papers)
  4. Peter Ford Dominey (8 papers)
  5. Pierre-Yves Oudeyer (95 papers)
Citations (3)
