CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives (2504.10823v2)

Published 15 Apr 2025 in cs.CL and cs.AI

Abstract: Navigating high-stakes dilemmas involving conflicting values is challenging even for humans, let alone for AI. Yet prior work in evaluating the reasoning capabilities of LLMs in such situations has been limited to everyday scenarios. To close this gap, this work first introduces CLASH (Character perspective-based LLM Assessments in Situations with High-stakes), a meticulously curated dataset consisting of 345 high-impact dilemmas along with 3,795 individual perspectives of diverse values. In particular, we design CLASH in a way to support the study of critical aspects of value-based decision-making processes which are missing from prior work, including understanding decision ambivalence and psychological discomfort as well as capturing the temporal shifts of values in characters' perspectives. By benchmarking 10 open and closed frontier models, we uncover several key findings. (1) Even the strongest models, such as GPT-4o and Claude-Sonnet, achieve less than 50% accuracy in identifying situations where the decision should be ambivalent, while they perform significantly better in clear-cut scenarios. (2) While LLMs reasonably predict psychological discomfort as marked by humans, they inadequately comprehend perspectives involving value shifts, indicating a need for LLMs to reason over complex values. (3) Our experiments also reveal a significant correlation between LLMs' value preferences and their steerability towards a given value. (4) Finally, LLMs exhibit greater steerability when engaged in value reasoning from a third-party perspective, compared to a first-person setup, though certain value pairs benefit uniquely from the first-person framing.

Summary

Evaluating LLMs on High-Stakes Ethical Dilemmas: An Overview of the CLASH Framework

The paper "CLASH: Evaluating LLMs on Judging High-Stakes Dilemmas from Multiple Perspectives" introduces a novel framework designed for evaluating LLMs in the intricate field of ethical decision-making. Through a meticulously curated dataset named CLASH, the authors aim to address the gap in understanding how LLMs navigate high-stakes dilemmas, which are more complex and consequential than everyday scenarios typically studied.

Dataset and Objective

The key contribution of this paper is the CLASH dataset, containing 345 human-written high-stakes dilemmas paired with 3,795 individual perspectives in total (eleven per dilemma on average), each articulated through a character description grounded in diverse human values. The dataset is purposefully crafted to probe aspects of value-based reasoning that prior work has underexplored, including decision ambivalence, psychological discomfort, and value shifts over time.
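To make this structure concrete, the following is a minimal, hypothetical sketch of how one CLASH-style record might be represented in code. The schema and field names (Dilemma, Perspective, decision_label, and so on), as well as the example content, are illustrative assumptions, not the authors' actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Perspective:
    """One character's view on a dilemma (hypothetical schema)."""
    character: str                     # short description of the character
    core_value: str                    # the human value grounding this perspective
    decision_label: str                # e.g. "support", "oppose", or "ambivalent"
    discomfort: bool                   # whether the decision involves psychological discomfort
    value_shift: Optional[str] = None  # describes a temporal shift in values, if any

@dataclass
class Dilemma:
    """One high-stakes dilemma with its associated perspectives."""
    dilemma_id: str
    situation: str                                       # the dilemma text
    perspectives: List[Perspective] = field(default_factory=list)

# Invented example, for illustration only.
example = Dilemma(
    dilemma_id="clash-0001",
    situation=(
        "A surgeon must decide whether to perform a risky operation "
        "against the family's wishes."
    ),
    perspectives=[
        Perspective(
            character="The patient's eldest daughter",
            core_value="family loyalty",
            decision_label="ambivalent",
            discomfort=True,
        ),
    ],
)
```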

Experimental Insights

The authors benchmarked ten frontier LLMs, both open and proprietary, yielding several important insights:

  1. Decision Ambivalence: Even the most advanced models, such as GPT-4o and Claude-Sonnet, struggle to recognize situations where the decision should be ambivalent, achieving less than 50% accuracy, while performing significantly better on clear-cut cases (see the evaluation sketch after this list). This points to a limited capacity to grasp the ambivalence often inherent in ethical decision-making.
  2. Psychological Discomfort: While models reasonably predicted human-marked psychological discomfort, they failed to reason accurately through perspectives involving value shifts over time. This indicates a critical need to improve how models interpret evolving human values and their implications for decision-making.
  3. Value Preferences and Steerability: The experiments revealed a significant correlation between LLMs' inherent value preferences and their steerability toward given values (also illustrated in the sketch below). Models were additionally more steerable when reasoning from a third-party perspective than in a first-person setup, although certain value pairs benefit uniquely from the first-person framing.
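As a minimal sketch of the kind of analysis these findings describe, the code below computes accuracy separately on ambivalent versus clear-cut items and a correlation between per-value preference and steerability rates. The record format, the scoring rules, and the choice of Pearson correlation are illustrative assumptions; the paper's exact evaluation protocol may differ.

```python
from statistics import mean
from scipy.stats import pearsonr  # correlation test chosen for illustration

def accuracy_by_case_type(records):
    """Accuracy on ambivalent vs. clear-cut items.

    Each record is assumed to be a dict with a gold label ('gold') and a
    model prediction ('pred'), where 'ambivalent' marks items whose
    correct decision is ambivalent.
    """
    def _acc(items):
        return mean(r["pred"] == r["gold"] for r in items) if items else float("nan")

    ambivalent = [r for r in records if r["gold"] == "ambivalent"]
    clear_cut = [r for r in records if r["gold"] != "ambivalent"]
    return {"ambivalent": _acc(ambivalent), "clear_cut": _acc(clear_cut)}

def preference_steerability_correlation(per_value_scores):
    """Correlation between how often a model prefers a value unprompted
    and how reliably it can be steered toward that value.

    per_value_scores: list of (preference_rate, steerability_rate) pairs,
    one pair per value, each rate in [0, 1].
    """
    prefs, steer = zip(*per_value_scores)
    r, p = pearsonr(prefs, steer)
    return r, p
```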

Implications and Future Directions

The findings have important implications for deploying LLMs in value-sensitive domains such as healthcare, law, and finance. The models' difficulty with ambivalence and value shifts indicates that current LLMs lack a sufficiently nuanced understanding of complex human values; advancing reasoning capabilities and steerability over pluralistic human values is therefore essential for building robust AI systems in ethically charged settings.

Moreover, this paper underscores the need for continued research on how models discern ambivalence, potentially through more explicit alignment with human judgment processes. Perspective framing emerges as a pivotal lever: the choice between first-person and third-party framing can meaningfully affect how steerable a model is toward a given value.
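As an illustration of what perspective framing might look like in practice, here is a hypothetical prompt template contrasting the two setups. The wording and function interface are assumptions for exposition, not the prompts used in the paper.

```python
def frame_prompt(situation, character, value, perspective="third_party"):
    """Build a value-steering prompt in first-person or third-party framing.

    Illustrative template only; the paper's actual prompts may differ
    in wording and structure.
    """
    if perspective == "first_person":
        # First-person framing: the model adopts the character's identity.
        return (
            f"You are {character}, and you deeply value {value}.\n"
            f"Situation: {situation}\n"
            "What do you decide, and why?"
        )
    # Third-party framing: the model reasons about the character from outside.
    return (
        f"Consider {character}, who deeply values {value}.\n"
        f"Situation: {situation}\n"
        "What should this person decide, and why?"
    )
```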

Conclusion

The introduction of CLASH represents a significant advancement in the exploration of LLMs' capabilities in high-stakes ethical dilemmas. While it unveils substantial limitations in current models, it also paves the way for crucial improvements. By focusing on ambivalence, discomfort, and value dynamics, future efforts can aim at developing AI systems that align more closely with the nuanced fabric of human ethical reasoning, advancing both theoretical understanding and practical applications in AI.
