Foundational Moral Values for AI Alignment (2311.17017v1)
Published 28 Nov 2023 in cs.CY and cs.AI
Abstract: Solving the AI alignment problem requires clear, defensible values towards which AI systems can align. Currently, targets for alignment remain underspecified and do not seem to be built from a philosophically robust structure. We begin the discussion of this problem by presenting five core, foundational values, drawn from moral philosophy and built on the requisites for human existence: survival, sustainable intergenerational existence, society, education, and truth. We show that these values not only provide a clearer direction for technical alignment work, but also serve as a framework for highlighting the threats and opportunities that AI systems present for obtaining and sustaining these values.
Authors: Betty Li Hou, Brian Patrick Green