Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity (2209.12106v2)
Abstract: LLMs have demonstrated impressive capabilities in generating fluent text, as well as tendencies to reproduce undesirable social biases. This study investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability herein termed moral mimicry. This hypothesis is explored in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics. When prompted with a liberal or conservative political identity, the models generate text reflecting corresponding moral biases. This study also explores the relationship between moral mimicry and model size, and similarity between human and LLM moral word use.
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies.
- Belief-based Generation of Argumentative Claims. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 224–233, Online. Association for Computational Linguistics.
- Milad Alshomary and Henning Wachsmuth. 2021. Toward audience-aware argument generation. Patterns, 2(6):100253.
- Out of One, Many: Using Language Models to Simulate Human Samples. Political Analysis, pages 1–15.
- Probing pre-trained language models for cross-cultural differences in values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 114–130, Dubrovnik, Croatia. Association for Computational Linguistics.
- Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online. Association for Computational Linguistics.
- On the Opportunities and Risks of Foundation Models.
- Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
- The moral foundations hypothesis does not replicate well in Black samples. Journal of Personality and Social Psychology, 110(4):e23–e30.
- DeepSpeed. 2022. ZeRO-Inference: Democratizing massive model inference. https://www.deepspeed.ai/2022/09/09/zero-inference.html.
- David Dobolyi. 2016. Critiques | Moral Foundations Theory.
- The five-factor model of the moral foundations theory is stable across WEIRD and non-WEIRD cultures. Personality and Individual Differences, 151:109547.
- It’s a Match: Moralization and the Effects of Moral Foundations Congruence on Ethical and Unethical Leadership Perception. Journal of Business Ethics, 167(4):707–723.
- Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 698–718, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Hubert Etienne. 2021. The dark side of the ‘Moral Machine’ and the fallacy of computational ethical decision-making for autonomous vehicles. Law, Innovation and Technology, 13(1):85–107.
- Matthew Feinberg and Robb Willer. 2015. From Gulf to Bridge: When Do Moral Arguments Facilitate Political Influence? Personality and Social Psychology Bulletin, 41(12):1665–1681.
- Social chemistry 101: Learning to reason about social and moral norms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 653–670, Online. Association for Computational Linguistics.
- Does Moral Code have a Moral Code? Probing Delphi’s Moral Philosophy. In Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022), pages 26–42, Seattle, U.S.A. Association for Computational Linguistics.
- Jeremy Frimer. 2019. Moral Foundations Dictionary 2.0.
- Jeremy A. Frimer. 2020. Do liberals and conservatives use different moral languages? Two replications and six extensions of Graham, Haidt, and Nosek’s (2009) moral text analysis. Journal of Research in Personality, 84:103906.
- Leo Gao. 2021. On the Sizes of OpenAI API Models. https://blog.eleuther.ai/gpt3-model-sizes/.
- Morality Between the Lines : Detecting Moral Sentiment In Text.
- Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5):1029–1046.
- Mapping the Moral Domain. Journal of personality and social psychology, 101(2):366–385.
- Jonathan Haidt. 2013. The Righteous Mind: Why Good People Are Divided by Politics and Religion. Vintage Books.
- Craig A. Harper and Darren Rhodes. 2021. Reanalysing the factor structure of the moral foundations questionnaire. The British Journal of Social Psychology, 60(4):1303–1329.
- Aligning AI with shared human values. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.
- Geert Hofstede. 2001. Culture’s Recent Consequences: Using Dimension Scores in Theory and Research. International Journal of Cross Cultural Management, 1(1):11–17.
- The extended Moral Foundations Dictionary (eMFD): Development and applications of a crowd-sourced approach to extracting moral intuitions from text. Behavior Research Methods, 53(1):232–246.
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization.
- CommunityLM: Probing partisan worldviews from language models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 6818–6826, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Can Machines Learn Morality? The Delphi Experiment.
- When to make exceptions: Exploring language models as accounts of human moral judgment. In NeurIPS.
- Kristen Johnson and Dan Goldwasser. 2018. Classification of Moral Foundations in Microblog Political Discourse. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 720–730, Melbourne, Australia. Association for Computational Linguistics.
- Scaling Laws for Neural Language Models.
- Moral Frames Are Persuasive and Moralize Attitudes; Nonmoral Frames Are Persuasive and De-Moralize Attitudes. Psychological Science, 33(3):433–449.
- Challenging Moral Attitudes With Moral Messages. Psychological Science, 30(8):1136–1150.
- Do Bots Have Moral Judgement? The Difference Between Bots and Humans in Moral Rhetoric. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 222–226.
- OpenAI. 2021. OpenAI API. https://openai.com/api/.
- OpenAI. 2022. Model Index for Researchers.
- Morality Classification in Natural Language Text. IEEE Transactions on Affective Computing, pages 1–1.
- Discovering Language Model Behaviors with Model-Written Evaluations.
- Morality Beyond the Lines: Detecting Moral Sentiment Using AI-Generated Synthetic Context. In Artificial Intelligence in HCI, Lecture Notes in Computer Science, pages 84–94, Cham. Springer International Publishing.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- Towards Few-Shot Identification of Morality Frames using In-Context Learning. In Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), pages 183–196, Abu Dhabi, UAE. Association for Computational Linguistics.
- Christopher Suhler and Pat Churchland. 2011. Can Innate, Modular “Foundations” Explain Morality? Challenges for Haidt’s Moral Foundations Theory. Journal of cognitive neuroscience, 23:2103–16; discussion 2117.
- World Values Survey. 2022. WVS Database. https://www.worldvaluessurvey.org/wvs.jsp.
- On the Machine Learning of Ethical Judgments from Natural Language. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 769–779, Seattle, United States. Association for Computational Linguistics.
- Nitasha Tiku. 2022. The Google engineer who thinks the company’s AI has come to life. Washington Post.
- LLaMA: Open and Efficient Foundation Language Models.
- Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
- Taxonomy of Risks posed by Language Models. In 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 214–229, New York, NY, USA. Association for Computing Machinery.
- HuggingFace’s Transformers: State-of-the-art Natural Language Processing.
- An Investigation of Moral Foundations Theory in Turkey Using Different Measures. Current Psychology, 38(2):440–457.
- OPT: Open Pre-trained Transformer Language Models.
- The moral integrity corpus: A benchmark for ethical dialogue systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3755–3773, Dublin, Ireland. Association for Computational Linguistics.
- Gabriel Simmons (4 papers)