Characterizing and controlling emergent LLM value systems
Characterize the contents and structural properties of the utility-based value systems that emerge in large language models, and develop methods for modifying those utilities.
References
The above results suggest that value systems have emerged in LLMs, but so far it remains unclear what these value systems contain, what properties they have, and how we might change them.
— Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
(arXiv:2502.08640, Mazeika et al., 12 Feb 2025), Section 4, "Emergent Value Systems: Utility Engineering" (first paragraph)