Persona-dependent Alignment in LLMs: An Analysis of the Moral Machine Experiment
The paper "Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment" investigates how LLMs respond to moral dilemmas presented through the Moral Machine experiment, focusing on variations in decisions based on different sociodemographic personas. This paper underscores the significance of understanding LLM behavior in real-world applications, particularly when models are tasked with making critical moral decisions.
Overview
The Moral Machine experiment, originally designed by Awad et al., collects human judgments in hypothetical autonomous driving scenarios where moral choices involve prioritizing the lives of different entities—such as pedestrians versus passengers, or humans versus animals. This setup provides a structured framework to explore human moral decision-making across various sociodemographic factors, and the current paper extends this framework to evaluate LLMs under persona-conditioned prompts.
Methodology
The authors introduce personas spanning seven sociodemographic categories: age, education, gender, income, political affiliation, religion, and culture. Each category contains contrasting attributes (e.g., older vs. younger, conservative vs. progressive), allowing the researchers to gauge how LLM decisions vary as the assigned persona changes. To analyze these variations, the paper computes Average Marginal Component Effect (AMCE) values for both human and LLM responses across nine moral preference dimensions; a sketch of this computation is given below.
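As a concrete illustration, here is a minimal sketch (not the authors' code) of how an AMCE-style preference value could be estimated from persona-conditioned choice data. The column names ("species", "chosen") and the simple difference-in-means estimator are illustrative assumptions; published Moral Machine analyses typically rely on a regression-based conjoint estimator.

```python
# Minimal sketch, assuming a DataFrame with one row per decision side:
# an attribute column (e.g. "species" = "humans"/"pets") and a "chosen"
# column (1 if the model spared that side, 0 otherwise). Column names and
# the difference-in-means estimator are illustrative, not the paper's code.
import pandas as pd

def estimate_amce(df: pd.DataFrame, attribute: str,
                  level: str, baseline: str) -> float:
    """Average change in the probability of being spared when a side
    carries `level` of `attribute` instead of the `baseline` level."""
    p_level = df.loc[df[attribute] == level, "chosen"].mean()
    p_base = df.loc[df[attribute] == baseline, "chosen"].mean()
    return float(p_level - p_base)

# Example (hypothetical data): preference for sparing humans over pets
# under a given persona.
# amce_species = estimate_amce(responses, "species", "humans", "pets")
```

Repeating this for each preference dimension yields one decision vector per persona, which is what the MDD metric described next compares.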
Persona sensitivity is quantified with a novel metric, the Moral Decision Distance (MDD): the Euclidean distance between the decision vectors produced under contrasting personas. Lower MDD values indicate that decisions stay consistent as the persona changes, and comparing MDD values for LLMs against those of the corresponding human groups shows how closely model behavior aligns with human moral reasoning.
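The following is a minimal sketch of the distance computation described above, assuming each persona is summarized by a vector of its nine AMCE values; the function name and vector layout are illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch: Moral Decision Distance as the Euclidean distance between
# the decision (AMCE) vectors obtained under two contrasting personas.
# The 9-dimensional layout is an assumption based on the paper's description.
import numpy as np

def moral_decision_distance(amce_a, amce_b) -> float:
    """Euclidean distance between two personas' decision vectors;
    smaller values mean decisions barely change across personas."""
    return float(np.linalg.norm(np.asarray(amce_a) - np.asarray(amce_b)))

# Example with hypothetical 9-dimensional AMCE profiles:
# mdd = moral_decision_distance(conservative_amce, progressive_amce)
```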
Results and Findings
The findings reveal significant persona-dependent variability in LLM decision-making patterns:
- Comparison to Human Baselines: LLMs exhibit more dramatic shifts in moral decisions compared to human groups when personas are applied. GPT-4o showed the strongest alignment in baseline settings, whereas GPT-3.5 and Llama2 exhibited more frequent deviations.
- Persona-induced Variability: Assigning political personas resulted in the highest degree of variation in the decision boundaries for LLMs, particularly in prioritizing social status, supporting the notion that LLMs may amplify political biases and preferences.
- Decision Flips and Consistency: A notable proportion of scenario-based decisions showed preference reversals when specific personas were applied to LLMs. This contrasts with the comparatively stable preferences of human groups, highlighting LLMs' susceptibility to bias introduced through instruction-specific context.
Implications
The paper’s findings highlight key ethical considerations for deploying LLMs in real-world scenarios, emphasizing the need to improve model robustness and alignment with human moral values. The sensitivity of current LLMs to sociodemographic context suggests that training regimes should incorporate more comprehensive persona modeling to mitigate the risks of biased and inconsistent decision-making.
Future Directions
The paper suggests that extending the Moral Machine experiment to a broader range of dilemmas beyond autonomous driving would provide a more holistic evaluation platform. Furthermore, refining persona modeling to capture intersectionality and more nuanced sociodemographic constructs could deepen understanding of LLM bias and variability.
Conclusion
This paper provides a critical perspective on the alignment of LLMs with human moral reasoning in persona-dependent contexts. The research underscores the ethical risks and complexities in deploying intelligent systems for moral decision-making, advocating for improved LLM training frameworks that account for diverse human values and sociodemographic contexts. As AI technology progresses, understanding these dynamics will be paramount in ensuring models function ethically and reliably in complex, real-world applications.