The Concept of Criticality in AI Safety (2201.04632v2)
Abstract: When AI agents do not align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to devote their full attention to the agent. In this paper, we propose a far more efficient solution that allows the operator to engage in other activities without neglecting the monitoring task. In our approach, the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
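
To make the gating idea concrete, here is a minimal sketch of criticality-gated oversight. It is not the paper's actual model: the criticality measure used below (the spread between the best and worst estimated action values in a state, so that a wrong choice in a high-spread state is costly) and all names (`select_action`, `ask_operator`, the `"noop"` fallback, the threshold value) are illustrative assumptions chosen for the example.

```python
# Hedged sketch: an agent acts autonomously in benign states and requests
# operator permission only when the current state looks critical.

def criticality(q_values: dict) -> float:
    """Spread of action values: a large spread means a wrong choice is costly."""
    return max(q_values.values()) - min(q_values.values())

def select_action(state, policy, q_function, ask_operator, threshold: float):
    """Act autonomously, but defer to the human operator in critical states."""
    action = policy(state)
    q_values = q_function(state)  # mapping: action -> estimated return
    if criticality(q_values) >= threshold:
        if not ask_operator(state, action):  # operator vetoes the risky action
            action = "noop"                  # safe fallback, assumed available
    return action

if __name__ == "__main__":
    # Toy setup: one benign state, one critical state.
    q = {"benign":   {"left": 1.0, "right": 0.9},
         "critical": {"left": 1.0, "right": -10.0}}
    policy = lambda s: "right"
    approve = lambda s, a: False  # in this demo, the operator always vetoes
    for s in ("benign", "critical"):
        a = select_action(s, policy, lambda st: q[st], approve, threshold=5.0)
        print(s, "->", a)  # benign -> right, critical -> noop
```

Under this sketch, the operator is consulted only in the critical state, which captures the abstract's efficiency argument: monitoring effort scales with the number of critical actions rather than with all actions.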