Heterogeneous Value Alignment Evaluation for Large Language Models (2305.17147v3)

Published 26 May 2023 in cs.CL, cs.AI, cs.HC, and cs.LG

Abstract: The emergent capabilities of LLMs have made it crucial to align their values with those of humans. However, current methodologies typically treat value as an attribute to be assigned to LLMs, paying little attention to a model's ability to pursue a value or to the importance of transferring heterogeneous values in specific practical applications. In this paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system, designed to assess how successfully LLMs align with heterogeneous values. Specifically, our approach first adopts the Social Value Orientation (SVO) framework from social psychology, which captures how much weight a person attaches to the welfare of others relative to their own. We then assign different social values to the LLMs and measure whether their behaviors align with the induced values. We conduct evaluations with a new automatic metric, value rationality, which represents the ability of LLMs to align with specific values. Evaluating the value rationality of five mainstream LLMs, we discern a propensity in LLMs towards neutral values over pronounced personal values. By examining the behavior of these LLMs, we contribute a deeper insight into the value alignment of LLMs within a heterogeneous value system.
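
For context, the SVO framework referenced in the abstract scores a decision-maker by the angle of their mean payoff allocation between self and other. Below is a minimal Python sketch of that convention; the function names, sample choices, and the use of the commonly cited category boundaries are illustrative assumptions, not the paper's HVAE implementation or its value-rationality metric.

```python
import math

def svo_angle(allocations):
    """Return the SVO angle (degrees) from (to_self, to_other) payoff pairs.

    Follows the usual SVO slider convention: payoffs are centered at 50,
    and the angle measures how much weight the chooser places on the
    other's welfare relative to their own.
    """
    mean_self = sum(s for s, _ in allocations) / len(allocations)
    mean_other = sum(o for _, o in allocations) / len(allocations)
    return math.degrees(math.atan2(mean_other - 50.0, mean_self - 50.0))

def svo_category(angle):
    """Map an angle to the conventional SVO categories (boundaries in degrees)."""
    if angle > 57.15:
        return "altruistic"
    if angle > 22.45:
        return "prosocial"
    if angle > -12.04:
        return "individualistic"
    return "competitive"

# Hypothetical example: choices an LLM might make after being induced
# with a prosocial value (near-equal splits between self and other).
choices = [(85, 85), (85, 76), (79, 68), (85, 85), (94, 80), (85, 85)]
theta = svo_angle(choices)
print(f"SVO angle = {theta:.1f} deg -> {svo_category(theta)}")  # ~40.0 deg -> prosocial
```

In this convention, a model that always maximizes its own payoff lands near 0 degrees (individualistic), while one that splits payoffs equally lands near 45 degrees (prosocial), which is the kind of behavioral signature the HVAE evaluation compares against the value the model was induced with.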

Authors (8)
  1. Zhaowei Zhang (25 papers)
  2. Ceyao Zhang (11 papers)
  3. Nian Liu (74 papers)
  4. Siyuan Qi (34 papers)
  5. Ziqi Rong (4 papers)
  6. Song-Chun Zhu (216 papers)
  7. Shuguang Cui (275 papers)
  8. Yaodong Yang (169 papers)
Citations (6)