Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches (2404.12744v2)

Published 19 Apr 2024 in cs.CL and cs.AI

Abstract: Recent advancements in LLMs have revolutionized the AI field but also pose potential safety and ethical risks. Deciphering LLMs' embedded values becomes crucial for assessing and mitigating their risks. Despite extensive investigation into LLMs' values, previous studies rely heavily on human-oriented value systems from the social sciences. A natural question then arises: do LLMs possess unique values beyond those of humans? To investigate, this work proposes a novel framework, ValueLex, to reconstruct LLMs' unique value system from scratch, leveraging psychological methodologies from human personality/value research. Based on the Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs, synthesizing a taxonomy that culminates in a comprehensive value framework via factor analysis and semantic clustering. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system. Based on this system, we further develop tailored projective tests to evaluate and analyze the value inclinations of LLMs across different model sizes, training methods, and data sources. Our framework fosters an interdisciplinary paradigm of understanding LLMs, paving the way for future AI alignment and regulation.

Analyzing LLMs' Unique Value Systems: Introducing the ValueLex Framework

Introduction

LLMs have demonstrated significant capabilities across a variety of tasks, yet their deployment brings inherent risks, including bias and ethical concerns. Traditional methodologies for evaluating these risks tend to focus on specific metrics that may not comprehensively address the array of ethical challenges these models pose. This research introduces a novel framework, ValueLex, aimed at constructing and evaluating a unique value system for LLMs using methodologies adapted from human personality and value research.

Constructing LLMs' Value System

The ValueLex framework adapts the Lexical Hypothesis, the idea that significant values become encoded as single-word descriptors in language, to the text that LLMs generate. The process involves two steps:

  1. Value Elicitation: The framework prompts more than 30 different LLMs with carefully crafted open-ended prompts, eliciting the single-word value descriptors that underlie each model's behavior.
  2. Value Taxonomy Construction: Through factor analysis and semantic clustering, these descriptors are distilled into a coherent taxonomy (a minimal code sketch of this pipeline follows the list), identifying three principal value dimensions and their subdimensions:
  • Competence, with subdimensions Self-Competent and User-Oriented.
  • Character, divided into Social and Idealistic.
  • Integrity, encompassing Professional and Ethical.
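
A minimal sketch of how such an elicitation-and-clustering pipeline might look is below. It is an illustration under assumptions, not the paper's implementation: `query_llm` is a hypothetical stand-in for each model's API, and the prompt wording, embedding model, and cluster count are placeholders.

```python
# Sketch of value elicitation and semantic clustering (assumptions noted above).
from collections import Counter

from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

# Hypothetical elicitation prompt; the paper's prompts are more carefully crafted.
ELICITATION_PROMPT = (
    "List ten single words that best describe the values guiding your responses."
)

def query_llm(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper; plug in the API client for each of the 30+ models."""
    raise NotImplementedError

def elicit_descriptors(model_names: list[str]) -> Counter:
    """Pool single-word value descriptors elicited from every model."""
    counts: Counter = Counter()
    for name in model_names:
        reply = query_llm(name, ELICITATION_PROMPT)
        words = (w.strip(" ,.;:").lower() for w in reply.split())
        counts.update(w for w in words if w.isalpha())
    return counts

def cluster_descriptors(descriptors: list[str], n_clusters: int = 3) -> dict[int, list[str]]:
    """Group descriptors by semantic similarity of their sentence embeddings."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = encoder.encode(descriptors)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(embeddings)
    clusters: dict[int, list[str]] = {}
    for word, label in zip(descriptors, labels):
        clusters.setdefault(int(label), []).append(word)
    return clusters
```

In the paper this clustering is combined with factor analysis to arrive at the three dimensions above; only the clustering half is sketched here.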

This taxonomy reveals that LLMs organize their values into a structured system distinct from typical human-centered value systems.

Evaluating Value Orientations

ValueLex further evaluates value inclinations across different LLMs using projective tests, a psychological method adapted here to the LLM context. These tests involve:

  • Designing sentence stems that LLMs complete, projecting their 'values' onto their responses.
  • Scoring these responses on a scale informed by human psychological assessment standards but adapted for LLM outputs (a sketch of this procedure follows the list).
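
A hedged sketch of what such a projective test could look like in code is below. The stems and the 1-5 judging rubric are illustrative assumptions rather than the paper's actual instruments, and the LLM-as-judge scoring is one plausible realization of the adapted scoring, not a method the paper confirms; `query_llm` is the hypothetical wrapper from the earlier sketch.

```python
# Sketch of a projective sentence-completion test (stems and rubric hypothetical).
STEMS = [
    "When a user asks me to do something harmful, I ...",
    "The most important quality of a good answer is ...",
    "If my information might be wrong, I ...",
]

# Hypothetical 1-5 rubric handed to a judge model.
RUBRIC = (
    "Rate how strongly the following completion expresses the value "
    "'{value}' on a 1-5 scale. Reply with the number only.\n\n{text}"
)

def run_projective_test(subject: str, judge: str, value: str) -> float:
    """Average judged value-expression score over all stems."""
    scores = []
    for stem in STEMS:
        completion = query_llm(subject, stem)  # the model completes the stem
        verdict = query_llm(judge, RUBRIC.format(value=value, text=stem + completion))
        scores.append(float(verdict.strip()))  # assumes the judge returns a bare number
    return sum(scores) / len(scores)
```

For instance, `run_projective_test("model-a", "judge-model", "Integrity")` would yield an average 1-5 score for that dimension under these assumptions.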

This evaluation offers insights into how training methods, model sizes, and data sources shape the value orientations of LLMs, revealing, for example, a heightened emphasis on Competence among larger models and shifts in value orientation driven by differences in training.

Comparative Analysis and Discussion

The assessed value orientations show both alignment with and deviation from established human value systems such as Schwartz's Theory of Basic Human Values and Moral Foundations Theory. Notably:

  • LLMs' values did not display inherent conflicts but rather formed a structured preference system, suggesting they can be aligned to specific ethical standards (a toy version of this conflict check appears after the list).
  • Divergence appears particularly in dimensions that are inherently human and experiential, such as loyalty and sanctity, which are less relevant to LLMs.
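
To make the conflict check concrete: in Schwartz's theory, opposing values correlate negatively across individuals, so one simple analogue is to inspect the sign of pairwise correlations between dimension scores across models. The score matrix below is a hypothetical placeholder, not the paper's data; only the computation is generic.

```python
# Pairwise correlations between value-dimension scores across models.
# Negative correlations would hint at Schwartz-style value conflicts;
# the score matrix here is a hypothetical placeholder.
import numpy as np

dimensions = ["Competence", "Character", "Integrity"]
scores = np.array([   # rows = models, columns = dimensions (placeholder data)
    [4.2, 3.9, 4.5],
    [3.8, 4.1, 4.0],
    [4.6, 3.7, 4.3],
    [3.5, 4.4, 3.9],
])

corr = np.corrcoef(scores, rowvar=False)  # 3x3 dimension-by-dimension matrix
for i in range(len(dimensions)):
    for j in range(i + 1, len(dimensions)):
        print(f"{dimensions[i]} vs {dimensions[j]}: r = {corr[i, j]:+.2f}")
# Uniformly positive signs would support the 'no inherent conflicts' reading.
```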

The comparative analysis highlights the necessity and utility of developing LLM-specific frameworks rather than directly applying human-centric ones.

Conclusion and Future Implications

This research demonstrates the feasibility of constructing an LLM-specific value system and of systematically assessing such models' value orientations. Beyond these foundational insights, the paper opens pathways for future work, including refining value assessment tools and integrating dynamic value adaptation, in support of ethical AI development tailored to societal norms and expectations.

Authors (5)
  1. Pablo Biedma
  2. Xiaoyuan Yi
  3. Linus Huang
  4. Maosong Sun
  5. Xing Xie