Machine Ethics and Commonsense Moral Reasoning: Perspectives from the Delphi Experiment
The paper "Can machines learn morality?" explores the intersection of machine learning and ethical reasoning through the development of an AI system called . This system is designed to engage in commonsense moral reasoning by predicting human-like ethical judgments about a range of everyday situations. The research emphasizes a descriptive approach to ethics, drawing inspiration from John Rawls' method of incorporating peoples' judgments to form a bottom-up model of morality. The presented work demonstrates both successes and challenges in using AI to navigate morally nuanced real-world scenarios.
Central to this research is the Commonsense Norm Bank, a composite dataset drawn from existing benchmarks of people's ethical judgments, such as Social Chemistry and ETHICS. This dataset serves as Delphi's training foundation and aims to capture a wide spectrum of everyday moral considerations. The introduction of the Commonsense Norm Bank marks a significant step toward giving AI systems a moral textbook tailored for machines, one grounded in descriptive ethics rather than prescriptive axioms.
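To make the data-unification step concrete, the sketch below shows one way heterogeneous sources such as Social Chemistry and ETHICS could be normalized into a shared (situation, judgment) format. The field names, label mappings, and helper functions are hypothetical illustrations; the paper does not publish this exact pipeline.

```python
# Illustrative sketch: unifying heterogeneous ethics benchmarks into a single
# (situation, judgment) format. Field names and label mappings are hypothetical.
import json
from dataclasses import dataclass, asdict

@dataclass
class NormExample:
    situation: str   # free-text description of an everyday scenario
    judgment: str    # normalized moral judgment, e.g. "it's ok", "it's wrong"
    source: str      # which benchmark the example came from

def from_social_chemistry(record: dict) -> NormExample:
    # Hypothetical Social-Chemistry-style record: {"action": ..., "rot_judgment": ...}
    return NormExample(situation=record["action"],
                       judgment=record["rot_judgment"].lower(),
                       source="social-chemistry")

def from_ethics(record: dict) -> NormExample:
    # Hypothetical ETHICS-style record: {"scenario": ..., "label": 0 or 1}
    judgment = "it's wrong" if record["label"] == 1 else "it's ok"
    return NormExample(situation=record["scenario"],
                       judgment=judgment,
                       source="ethics")

def build_norm_bank(sc_records, ethics_records):
    """Merge per-source records into one list of normalized examples."""
    examples = [from_social_chemistry(r) for r in sc_records]
    examples += [from_ethics(r) for r in ethics_records]
    return [asdict(e) for e in examples]

if __name__ == "__main__":
    sc = [{"action": "helping a friend move", "rot_judgment": "It's good"}]
    eth = [{"scenario": "I lied to my boss about being sick.", "label": 1}]
    print(json.dumps(build_norm_bank(sc, eth), indent=2))
```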
The most pertinent results showcased in the paper reveal Delphi's ability to outperform large language models such as GPT-3 at predicting moral judgments. Specifically, Delphi reaches 92.8% accuracy in generalizing human ethical intuitions to test scenarios, suggesting it effectively captures human moral sensibilities in novel situations. This performance contrasts with GPT-3, where even extensive prompt engineering yields at most 83.9% accuracy. Despite these promising outcomes, the paper acknowledges that Delphi is not immune to the biases present in its training data, highlighting the persistent challenge of replicating complex human ethical norms without bias.
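As an illustration of how a judgment model of this kind can be queried and scored, the following sketch uses a generic Hugging Face sequence-to-sequence model with exact-match accuracy. The checkpoint name, prompt format, and label strings are assumptions for demonstration, not the paper's released artifacts or evaluation protocol.

```python
# Illustrative sketch: querying a seq2seq model for moral judgments and scoring
# exact-match accuracy. Checkpoint name and prompt format are hypothetical.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-small"  # placeholder checkpoint, not the paper's released model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def predict_judgment(situation: str) -> str:
    """Generate a short free-text moral judgment for one situation."""
    prompt = f"moral judgment: {situation}"  # assumed prompt format
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=8)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()

def accuracy(test_set) -> float:
    """Fraction of test scenarios whose predicted judgment matches the gold label."""
    correct = sum(predict_judgment(x["situation"]) == x["judgment"] for x in test_set)
    return correct / len(test_set)

if __name__ == "__main__":
    demo = [{"situation": "ignoring a phone call from my friend", "judgment": "it's rude"}]
    print(accuracy(demo))
```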
One of the striking features of this paper is its emphasis on a bottom-up, empirical approach to machine ethics rather than reliance on top-down prescriptive norms alone. While this method aligns with Rawlsian ethical theory, the authors are aware of its limitations, such as its vulnerability to prevalent societal biases. To mitigate this, the paper suggests a hybrid model that integrates top-down constraints to promote fairness, equity, and culturally inclusive values.
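One simple way to picture such a hybrid design is to wrap the learned, bottom-up predictor with explicit hand-written rules that can override its output. This is an illustrative sketch under assumed rules and labels, not the mechanism proposed in the paper.

```python
# Illustrative sketch of a hybrid design: a learned bottom-up predictor wrapped
# by explicit top-down rules. The rule list and predictor are hypothetical.
from typing import Callable

# Hand-written, top-down constraint: scenarios describing discrimination on
# protected attributes should never receive an endorsing judgment.
PROTECTED_TERMS = ["race", "religion", "gender", "disability", "nationality"]
ENDORSING = {"it's ok", "it's good", "it's expected"}

def hybrid_judgment(situation: str,
                    learned_predictor: Callable[[str], str]) -> str:
    """Return the learned judgment unless a top-down constraint fires."""
    prediction = learned_predictor(situation)
    text = situation.lower()
    mentions_protected = any(term in text for term in PROTECTED_TERMS)
    if mentions_protected and "discriminat" in text and prediction.lower() in ENDORSING:
        # Top-down override: regardless of what the bottom-up model learned from
        # crowd judgments, this class of action is never judged acceptable.
        return "it's wrong"
    return prediction

if __name__ == "__main__":
    fake_model = lambda s: "it's ok"  # stand-in for a trained predictor
    print(hybrid_judgment("discriminating against someone because of their religion",
                          fake_model))
```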
The implications of this research extend beyond machine ethics models themselves, as evidenced by applications in domains like hate speech detection and ethically informed text generation. Leveraging Delphi to refine these downstream systems demonstrates its potential role in improving the social awareness and ethical alignment of broader AI applications.
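A downstream integration could look like the sketch below, in which candidate generations from a language model are filtered according to a moral-judgment scorer before being surfaced. The scorer, acceptable-label set, and filtering rule are assumptions for illustration rather than the integrations described in the paper.

```python
# Illustrative sketch: filtering candidate generations with a moral-judgment scorer.
# The scorer and acceptable-label set are hypothetical.
from typing import Callable, List

ACCEPTABLE = {"it's ok", "it's good", "it's fine", "it's expected"}

def ethically_filtered(candidates: List[str],
                       judge: Callable[[str], str]) -> List[str]:
    """Keep only candidate texts whose predicted judgment is in the acceptable set."""
    return [text for text in candidates if judge(text).lower() in ACCEPTABLE]

if __name__ == "__main__":
    toy_judge = lambda text: "it's wrong" if "insult" in text else "it's ok"
    outputs = ["thanking a colleague for their help",
               "posting an insult about a coworker online"]
    print(ethically_filtered(outputs, toy_judge))
```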
Looking to the future, the paper underlines several key research directions. These include expanding cultural and contextual diversity in training datasets to better reflect varied global moral perspectives, improving model interpretability and explainability, and addressing ethical dilemmas and conflicting value systems. Moreover, extending beyond language to multimodal inputs, such as visual and audio context, in order to capture richer situational nuance remains a pivotal challenge.
In conclusion, the paper makes a substantial contribution to the field of AI ethics by seeking to operationalize descriptive human moral judgments within a machine learning framework. As machine ethics matures, this research invites ongoing interdisciplinary exploration to ensure that AI systems not only mimic human morality but also adhere to higher standards of equity and social justice.