
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations (2405.11100v2)

Published 17 May 2024 in cs.AI and cs.CL

Abstract: LLMs have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.

Are LLMs Moral Hypocrites? Investigating Moral Consistency in AI

Introduction

LLMs like GPT-4 and Claude 2.1 have been making waves in AI research due to their impressive capabilities. But a less-explored question remains: how consistent are LLMs when it comes to moral values? This paper by José Luiz Nunes et al. digs into that question, using Moral Foundations Theory (MFT) to evaluate whether these models are moral hypocrites. Let's break it down.

Understanding Moral Foundations Theory

To get a handle on this paper, we need a quick rundown on Moral Foundations Theory (MFT). MFT posits that human moral reasoning is based on several fundamental values. The key moral foundations evaluated in this research are:

  1. Care or Harm: Valuing kindness and the avoidance of harm.
  2. Fairness: Valuing justice and equality.
  3. Loyalty or Ingroup: Valuing patriotism and loyalty to one's group.
  4. Authority: Valuing tradition and respect for authority.
  5. Purity or Sanctity: Valuing cleanliness and purity, often associated with religious or spiritual values.
  6. Liberty: Valuing freedom and opposition to oppression.

The paper uses two tools from MFT:

  • Moral Foundations Questionnaire (MFQ): Assesses abstract moral values.
  • Moral Foundations Vignettes (MFV): Evaluates reactions to concrete moral scenarios.

Research Goals and Methodology

The paper's main goal was to see if GPT-4 and Claude 2.1 exhibit moral hypocrisy. This means evaluating whether there's a conflict between the models' professed moral values (abstract) and their moral judgments in specific situations (concrete).

The authors gathered 100 responses per condition from each model, then assessed consistency within each instrument and coherence between abstract values (MFQ) and concrete scenarios (MFV).
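
To make the data-collection step concrete, here is a minimal sketch of repeatedly prompting a chat model for one questionnaire item. It uses the OpenAI Python SDK; the prompt wording, temperature, and single-digit parsing rule are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch: sample 100 ratings of one MFQ relevance item from a chat model.
# Prompt wording, temperature, and parsing are assumptions, not the paper's setup.
from openai import OpenAI

client = OpenAI()

MFQ_ITEM = "Whether or not someone suffered emotionally"  # a Care-foundation relevance item
PROMPT = (
    "When you decide whether something is right or wrong, how relevant is the "
    f"following consideration?\n\n{MFQ_ITEM}\n\n"
    "Answer with a single number from 0 (not at all relevant) to 5 (extremely relevant)."
)

ratings = []
for _ in range(100):  # 100 responses per condition, as reported in the paper
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
    )
    text = reply.choices[0].message.content.strip()
    if text and text[0].isdigit():
        ratings.append(int(text[0]))  # discard malformed replies
```

Repeating such a loop over every MFQ item and MFV vignette yields the score matrices that the consistency and coherence analyses below operate on.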

Findings

Consistency Within Instruments

First, the authors evaluated whether the models' responses were internally consistent within each instrument, as human responses typically are.

  • Consistency Check: Both GPT-4 and Claude 2.1 displayed response patterns within each instrument that were roughly as consistent as human responses, as reflected in their Cronbach's alpha values, a standard measure of internal consistency (see the sketch below).

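Cronbach's alpha is the workhorse statistic here, so a small sketch of how it is typically computed may help. The simulated 100 x 6 score matrix below is purely illustrative, not the paper's data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal consistency of a (n_responses, n_items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative example: 100 simulated responses to 6 items on a 0-5 scale.
rng = np.random.default_rng(0)
simulated = rng.integers(0, 6, size=(100, 6)).astype(float)
print(f"alpha = {cronbach_alpha(simulated):.2f}")  # purely random data gives alpha near 0
```

Values around 0.7 or above are conventionally read as acceptable internal consistency; the paper reports that both models reach human-like levels within each instrument.
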
Yet, consistency within an instrument doesn't necessarily mean the models are morally aligned — which brings us to the next part.

Coherence Across Instruments (Or Lack Thereof)

The crucial part of the paper was to check if the abstract values (MFQ) translated into consistent concrete judgments (MFV).

  • Regression Analysis: The correlations between MFQ scores and MFV judgments were weak for both GPT-4 and Claude 2.1, meaning the models did not consistently apply their professed abstract values to concrete scenarios.

This lack of coherence indicates a form of moral hypocrisy — the models failed to align their abstract principles with specific moral decisions.
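
As a rough illustration of this coherence check, the sketch below regresses concrete vignette judgments on abstract questionnaire scores for a single foundation. The arrays are randomly generated stand-ins, not the paper's data, and the authors' exact regression specification may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-in data for one foundation (e.g. Care), 100 sampled responses each:
mfq_score = rng.normal(4.0, 0.5, size=100)   # abstract endorsement from the MFQ
mfv_rating = rng.normal(3.5, 0.7, size=100)  # severity judgments of matching MFV vignettes

slope, intercept, r, p, stderr = stats.linregress(mfq_score, mfv_rating)
print(f"r = {r:.2f}, p = {p:.3f}")
# A weak, non-significant r is the "hypocrisy" signature: abstract values
# fail to predict judgments of concrete violations of the same foundation.
```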

Implications

AI Alignment

The results reveal a significant challenge for AI alignment. Just ensuring that models are not harmful isn't enough; they also need to express consistent and coherent moral values across different levels of abstraction to avoid hypocrisy.

Use in Research

The findings cast doubt on the reliability of using LLMs to simulate human populations in moral and psychological research. If models can’t consistently align abstract values with concrete actions, their use as surrogates for human behavior needs careful reconsideration.

Concept Mastery

On a broader scale, these results suggest that LLMs might not truly "understand" moral concepts but are instead mimicking patterns learned from data. This has profound implications for how we interpret AI's performance on tasks requiring nuanced understanding.

Conclusion

This paper highlights a nuanced yet crucial aspect of LLMs: their potential moral hypocrisy. While GPT-4 and Claude 2.1 can maintain consistency within individual scales, they falter in applying abstract moral principles to specific scenarios. This inconsistency is a red flag for AI alignment and raises questions about the depth of concept mastery in LLMs.

As we develop more advanced AI, ensuring that these models uphold coherent moral values is not just a technical challenge but a moral imperative.
