Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory (2306.11432v1)
Abstract: Moral AI has been studied in philosophy and artificial intelligence. While most existing studies are purely theoretical, recent advances in AI have made it increasingly necessary to implement AI with morality. Moreover, humans face moral uncertainty: we do not know which moral view is correct. In this paper, we implement the Maximizing Expected Choiceworthiness (MEC) algorithm, which aggregates the outputs of models based on three normative ethical theories to generate the most appropriate output. MEC is a method for making appropriate moral judgments under moral uncertainty. Our experimental results suggest that the output of MEC correlates to some extent with commonsense morality, and that MEC can produce output as appropriate as, or more appropriate than, that of existing methods.
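The aggregation step behind MEC can be sketched in a few lines: given a credence in each normative theory and each theory's choiceworthiness score for each candidate action, the expected choiceworthiness of an action is the credence-weighted sum of its scores, and MEC selects the action maximizing that sum. The sketch below is a minimal illustration, assuming scores are already on a common scale; the function name, theory labels, and numbers are hypothetical and not taken from the paper.

```python
def maximize_expected_choiceworthiness(credences, choiceworthiness):
    """Return (best_action, expected_choiceworthiness_per_action).

    credences: dict mapping theory -> credence (non-negative, summing to 1)
    choiceworthiness: dict mapping theory -> {action: score on a common scale}
    """
    actions = next(iter(choiceworthiness.values()))
    ec = {
        a: sum(credences[t] * choiceworthiness[t][a] for t in credences)
        for a in actions
    }
    return max(ec, key=ec.get), ec

# Illustrative inputs (hypothetical scores, not from the paper):
credences = {"consequentialism": 0.4, "deontology": 0.4, "virtue_ethics": 0.2}
choiceworthiness = {
    "consequentialism": {"tell_truth": 0.6, "lie": 0.7},
    "deontology":       {"tell_truth": 0.9, "lie": 0.1},
    "virtue_ethics":    {"tell_truth": 0.8, "lie": 0.3},
}
best, ec = maximize_expected_choiceworthiness(credences, choiceworthiness)
# EC(tell_truth) = 0.4*0.6 + 0.4*0.9 + 0.2*0.8 = 0.76
# EC(lie)        = 0.4*0.7 + 0.4*0.1 + 0.2*0.3 = 0.38
```

In practice, the choiceworthiness scores would come from models fine-tuned for each theory, and the main difficulty MEC inherits from the moral-uncertainty literature is making those scores intertheoretically comparable.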
Authors: Masashi Takeshita, Rzepka Rafal, Kenji Araki