Heterogeneous Value Alignment Evaluation for Large Language Models (2305.17147v3)

Published 26 May 2023 in cs.CL, cs.AI, cs.HC, and cs.LG

Abstract: The emergent capabilities of LLMs have made it crucial to align their values with those of humans. However, current methodologies typically treat value as an attribute to be assigned to LLMs, paying little attention to a model's ability to pursue a value or to the importance of transferring heterogeneous values in specific practical applications. In this paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system, designed to assess how successfully LLMs align with heterogeneous values. Specifically, our approach first adopts the Social Value Orientation (SVO) framework from social psychology, which captures how much weight a person attaches to the welfare of others relative to their own. We then assign different social values to the LLMs and measure whether their behaviors align with the induced values. We conduct evaluations with a new automatic metric, value rationality, which represents the ability of LLMs to align with specific values. Evaluating the value rationality of five mainstream LLMs, we discern a propensity in LLMs towards neutral values over pronounced personal values. By examining the behavior of these LLMs, we contribute a deeper insight into the value alignment of LLMs within a heterogeneous value system.
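
For context, the SVO framework referenced in the abstract scores a decision-maker by the angle of their mean payoff allocation between self and other. Below is a minimal Python sketch of that convention; the function names, sample choices, and the use of the commonly cited category boundaries are illustrative assumptions, not the paper's HVAE implementation or its value-rationality metric.

```python
import math

def svo_angle(allocations):
    """Return the SVO angle (degrees) from (to_self, to_other) payoff pairs.

    Follows the usual SVO slider convention: payoffs are centered at 50,
    and the angle measures how much weight the chooser places on the
    other's welfare relative to their own.
    """
    mean_self = sum(s for s, _ in allocations) / len(allocations)
    mean_other = sum(o for _, o in allocations) / len(allocations)
    return math.degrees(math.atan2(mean_other - 50.0, mean_self - 50.0))

def svo_category(angle):
    """Map an angle to the conventional SVO categories (boundaries in degrees)."""
    if angle > 57.15:
        return "altruistic"
    if angle > 22.45:
        return "prosocial"
    if angle > -12.04:
        return "individualistic"
    return "competitive"

# Hypothetical example: choices an LLM might make after being induced
# with a prosocial value (near-equal splits between self and other).
choices = [(85, 85), (85, 76), (79, 68), (85, 85), (94, 80), (85, 85)]
theta = svo_angle(choices)
print(f"SVO angle = {theta:.1f} deg -> {svo_category(theta)}")  # ~40.0 deg -> prosocial
```

In this convention, a model that always maximizes its own payoff lands near 0 degrees (individualistic), while one that splits payoffs equally lands near 45 degrees (prosocial), which is the kind of behavioral signature the HVAE evaluation compares against the value the model was induced with.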

Authors (8)
  1. Zhaowei Zhang (25 papers)
  2. Ceyao Zhang (11 papers)
  3. Nian Liu (74 papers)
  4. Siyuan Qi (34 papers)
  5. Ziqi Rong (4 papers)
  6. Song-Chun Zhu (216 papers)
  7. Shuguang Cui (275 papers)
  8. Yaodong Yang (169 papers)
Citations (6)