The Concept of Criticality in AI Safety

Published 12 Jan 2022 in cs.HC and cs.AI | (2201.04632v2)

Abstract: When AI agents don't align their actions with human values they may cause serious harm. One way to solve the value alignment problem is by including a human operator who monitors all of the agent's actions. Despite the fact, that this solution guarantees maximal safety, it is very inefficient, since it requires the human operator to dedicate all of his attention to the agent. In this paper, we propose a much more efficient solution that allows an operator to be engaged in other activities without neglecting his monitoring task. In our approach the AI agent requests permission from the operator only for critical actions, that is, potentially harmful actions. We introduce the concept of critical actions with respect to AI safety and discuss how to build a model that measures action criticality. We also discuss how the operator's feedback could be used to make the agent smarter.

Abstract PDF HTML Upgrade to Chat

References (14)

Citations (1)

View on Semantic Scholar

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

The Concept of Criticality in AI Safety

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (2)

Collections

The Concept of Criticality in AI Safety

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (2)

Collections