Accounting for defensive uses of dangerous capabilities in capability thresholds

Develop methods to account for defensive uses of dangerous capabilities when specifying and justifying capability thresholds in frontier AI safety cases, so that thresholds do not mischaracterize risk by overlooking beneficial or protective applications of the same capabilities.

Background

The paper explains that inability arguments depend on identifying and justifying dangerous capability thresholds. The authors list, as an open question, how to incorporate defensive uses of capabilities into these thresholds: failing to consider beneficial applications could distort risk characterization and undermine the validity of safety case arguments.

References

There are still open questions, such as how to incorporate post-deployment enhancements \citep{davidson2023}, account for defensive uses of capabilities \citep{mirsky2023}, or address risks that are less closely tied to dangerous capabilities such as systemic risks \citep{zwetsloot2019} or risks from AI malfunction \citep{raji2022}.

— Safety cases for frontier AI (2410.21572 - Buhl et al., 28 Oct 2024) in Section 4.3 "Arguments"
