Do current AIs have meaningful values?
Determine whether current large language models possess meaningful internal values, rather than merely exhibiting superficial behaviors or biases learned from training data.
References
"Tracking the emergence of goals and values has proven a longstanding problem, and despite much interest over the years it remains unclear whether current AIs have meaningful values."
Source: Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs (Mazeika et al., arXiv:2502.08640, 12 Feb 2025), Abstract