
The Concept of Criticality in AI Safety (2201.04632v2)

Published 12 Jan 2022 in cs.HC and cs.AI

Abstract: When AI agents don't align their actions with human values, they may cause serious harm. One way to solve the value alignment problem is to include a human operator who monitors all of the agent's actions. Although this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to dedicate all of his attention to the agent. In this paper, we propose a much more efficient solution that allows an operator to be engaged in other activities without neglecting his monitoring task. In our approach, the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.
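
The permission-request idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `GatedAgent`, `criticality`, `ask_operator`, and the `threshold` value of 0.5 are all hypothetical, and the criticality model is assumed to return a score between 0 and 1. The sketch only shows the control flow: non-critical actions proceed without interrupting the operator, critical ones wait for approval, and each operator decision is recorded so it could later serve as feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GatedAgent:
    # Hypothetical criticality model: maps an action name to a score in [0, 1].
    criticality: Callable[[str], float]
    # Hypothetical operator channel: returns True if the operator approves.
    ask_operator: Callable[[str], bool]
    # Assumed cutoff above which an action counts as critical.
    threshold: float = 0.5
    # Operator decisions, kept as (action, approved) pairs for later learning.
    feedback: List[Tuple[str, bool]] = field(default_factory=list)

    def act(self, action: str) -> bool:
        """Return True if the action is executed, False if it is blocked."""
        if self.criticality(action) < self.threshold:
            # Non-critical: proceed without interrupting the operator.
            return True
        # Critical: request permission and record the operator's decision.
        approved = self.ask_operator(action)
        self.feedback.append((action, approved))
        return approved


if __name__ == "__main__":
    # Toy scores and a toy operator that denies every request (illustrative only).
    scores = {"move_forward": 0.1, "delete_files": 0.9}
    agent = GatedAgent(criticality=scores.__getitem__,
                       ask_operator=lambda action: False)
    for name in scores:
        print(name, "executed" if agent.act(name) else "blocked")
```

In this sketch the operator is consulted only when the score reaches the threshold, so the operator's attention is spent in proportion to how many actions the model rates as critical.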
