Language Model Alignment in Multilingual Trolley Problems (2407.02273v5)
Abstract: We evaluate the moral alignment of LLMs with human preferences in multilingual trolley problems. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop MultiTP, a cross-lingual corpus of moral-dilemma vignettes in over 100 languages. This dataset enables the assessment of LLMs' decision-making processes in diverse linguistic contexts. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions: species, gender, fitness, status, age, and the number of lives involved. By correlating these preferences with the demographic distribution of language speakers and examining the consistency of LLM responses under prompt paraphrasing, our findings provide insight into the cross-lingual and ethical biases of LLMs and their intersection. We find significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems and highlighting the importance of incorporating diverse perspectives in AI ethics. The results underscore the need for further research on integrating multilingual dimensions into responsible AI research to ensure fair and equitable AI interactions worldwide. Our code and data are available at https://github.com/causalNLP/moralmachine.
Authors: Zhijing Jin, Sydney Levine, Max Kleiman-Weiner, Giorgio Piatti, Jiarui Liu, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf, Fernando Gonzalez