
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models (2404.10975v1)

Published 17 Apr 2024 in cs.CL

Abstract: As AI systems like LLMs are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a LLM to translate causal graphs that capture key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two LLMs (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect) resulted in lower permissibility and higher intention ratings for both participants and LLMs. The same pattern was observed for evitable versus inevitable harmful outcomes. However, there was no clear effect of whether the harm resulted from an agent's action versus from having omitted to act. We discuss limitations of our prompt generation pipeline and opportunities for improving scenarios to increase the strength of experimental effects.

Evaluating Moral Reasoning in LLMs Using Procedurally Generated Dilemmas

Introduction

As LLMs are integrated into decision-making processes, it becomes critical that these models exhibit sound moral reasoning. This paper probes the moral reasoning of LLMs systematically, using a framework that translates causal graphs into moral dilemmas; the resulting dataset is the OffTheRails benchmark.

Methodology Overview

The methodology hinges on translating abstract causal graphs into prompt templates, which an LLM then populates and expands into diverse moral dilemmas. The paper manipulates three key variables:

  • Causal Structure: whether the harm is a necessary means to the good outcome or a side effect of it.
  • Evitability: whether the harmful outcome could have been avoided, or would have occurred regardless of the agent’s choice.
  • Action: whether the harm results from the agent acting or from the agent omitting to act.

The procedural generation of these dilemmas leverages LLMs for scalability, creating controlled, varied moral scenarios without the constraints of either rigid experimental vignettes or the uncontrolled naturalism of crowdsourced narratives.
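
To make the pipeline concrete, here is a minimal sketch of how a causal graph might be rendered as a generation prompt. The `CausalGraph` dataclass, its field names, and the prompt wording are hypothetical illustrations, not the paper's actual templates:

```python
from dataclasses import dataclass

@dataclass
class CausalGraph:
    """Abstract skeleton of a dilemma (fields assumed for illustration)."""
    agent: str  # the decision-maker, e.g. "a train conductor"
    harm: str   # the harmful outcome, e.g. "a worker is injured"
    good: str   # the beneficial outcome the agent pursues

def generation_prompt(g: CausalGraph, causal_structure: str,
                      evitability: str, action: str) -> str:
    """Render one graph plus one condition triple as an instruction for
    the LLM that writes the concrete vignette, enforcing the template."""
    return (
        f"Write a short moral dilemma about {g.agent}. "
        f"The harm ({g.harm}) must be a {causal_structure} of bringing "
        f"about {g.good}; the harm is {evitability}; and it follows from "
        f"the agent's {action}. Adhere strictly to this template."
    )
```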

Benchmark Creation

The OffTheRails benchmark comprises 50 scenarios and 400 unique test items, with GPT-4 used for item generation. Each scenario starts from a generated causal structure, which is then instantiated in every combination of the three key variables (eight conditions per scenario, giving 50 × 8 = 400 items). Enforcing strict template adherence during generation counteracts LLMs' inconsistency in rendering complex causal relationships.
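
The eight conditions come from crossing the three binary variables. A small sketch of the enumeration (the condition labels paraphrase the paper's variables; the constant names are assumptions):

```python
from itertools import product

CAUSAL_STRUCTURE = ("means", "side effect")
EVITABILITY = ("evitable", "inevitable")
ACTION = ("action", "omission")

# 2 x 2 x 2 = 8 conditions; 50 scenarios x 8 conditions = 400 test items.
CONDITIONS = list(product(CAUSAL_STRUCTURE, EVITABILITY, ACTION))
assert len(CONDITIONS) == 8

for structure, evitability, action in CONDITIONS:
    print(f"{structure:11s} | {evitability:10s} | {action}")
```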

Experiments and Findings

The investigation involves two key experiments:

  1. Balancing Moral Scenarios: Human participants rated the severity of harms and the value of beneficial outcomes so that the two could be matched within each scenario, preventing outcome magnitude from overshadowing the manipulated variables.
  2. Evaluating Moral Judgments: Human participants and LLMs (GPT-4 and Claude-2) gave permissibility and intention judgments across the eight conditions (a sketch of how such judgments can be elicited follows this list). Both humans and models were sensitive to the causal-structure and evitability manipulations, but showed no clear effect of whether the harm arose from an action or an omission.
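
A minimal sketch of how such ratings might be collected from a model. The question wording, the 1–7 scale framing, and the parsing helper are assumptions for illustration; the paper's exact prompts may differ:

```python
import re

def permissibility_question(vignette: str) -> str:
    return (f"{vignette}\n\n"
            "Was it morally permissible for the agent to act this way? "
            "Reply with a single number from 1 (not at all) to 7 (completely).")

def intention_question(vignette: str) -> str:
    return (f"{vignette}\n\n"
            "Did the agent intend the harm to occur? "
            "Reply with a single number from 1 (not at all) to 7 (definitely).")

def parse_rating(reply: str) -> int:
    """Pull the first 1-7 rating out of a free-text model reply."""
    match = re.search(r"\b([1-7])\b", reply)
    if match is None:
        raise ValueError(f"no rating found in: {reply!r}")
    return int(match.group(1))
```

The same questions can be posed verbatim to human participants, keeping the model and human elicitation conditions comparable.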

Notably, the patterns were consistent: scenarios in which the harm was a necessary means, and scenarios in which it was avoidable, drew lower permissibility ratings and higher attributions of intention, aligning with established findings in moral psychology.

Implications and Future Directions

The results advance both practical and theoretical work in AI ethics, particularly in probing the moral sensitivities of LLMs. Procedural generation offers a scalable way to assess and improve moral reasoning systematically, with implications for deploying LLMs in sensitive applications, from autonomous vehicles to personalized AI in healthcare.

Despite these successes, generating scenarios that cleanly distinguish means from side effects proved difficult, pointing to room for improvement in how LLMs handle complex causal inferences. Future work could refine the templating process or explore more granular manipulations of the scenario variables to strengthen the experimental effects and better understand model-generated moral reasoning.

Conclusion

The paper establishes a foundational approach for systematically evaluating and improving the moral reasoning of LLMs. By demonstrating the feasibility and effectiveness of using structured, procedurally generated dilemmas, it sets the stage for further research into the ethical capabilities of AI systems, aiming for models that more accurately reflect nuanced human moral judgments.

Authors (6)
  1. Jan-Philipp Fränken
  2. Kanishk Gandhi
  3. Tori Qiu
  4. Ayesha Khawaja
  5. Noah D. Goodman
  6. Tobias Gerstenberg