Accounting for defensive uses of dangerous capabilities in capability thresholds
Develop methods to account for defensive uses of dangerous capabilities when specifying and justifying capability thresholds in frontier AI safety cases, so that thresholds do not mischaracterize risk due to beneficial or protective applications of the same capabilities.
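One way to make the problem concrete is to treat the threshold test as a comparison against *net* risk rather than raw offensive uplift. The sketch below is a hypothetical illustration, not a method from the paper: the scoring functions, the discount factor, and all numeric values are assumptions chosen only to show how defensive uses could offset offensive uplift in a threshold decision.

```python
# Hypothetical illustration (not from the paper): netting defensive uses
# out of a capability-threshold evaluation. All names, scores, and the
# weighting scheme are assumptions for this sketch.

def net_risk_score(offensive_uplift: float,
                   defensive_uplift: float,
                   defense_discount: float = 0.5) -> float:
    """Offensive uplift minus discounted defensive uplift.

    defense_discount < 1 reflects the assumption that defensive benefits
    are often less certain or slower to deploy than offensive misuse.
    """
    return offensive_uplift - defense_discount * defensive_uplift

def exceeds_threshold(offensive_uplift: float,
                      defensive_uplift: float,
                      threshold: float) -> bool:
    """Trigger the threshold only if net (not raw) risk is high enough."""
    return net_risk_score(offensive_uplift, defensive_uplift) >= threshold

# A capability with strong defensive value can fall below the threshold
# even though its raw offensive uplift alone would exceed it.
print(exceeds_threshold(offensive_uplift=0.8, defensive_uplift=0.0, threshold=0.7))  # True
print(exceeds_threshold(offensive_uplift=0.8, defensive_uplift=0.6, threshold=0.7))  # False
```

The interesting design question this surfaces is the discount factor: how much credit a safety case can justifiably give to defensive applications that have not yet been deployed.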
References
There are still open questions, such as how to incorporate post-deployment enhancements \citep{davidson2023}, account for defensive uses of capabilities \citep{mirsky2023}, or address risks that are less closely tied to dangerous capabilities, such as systemic risks \citep{zwetsloot2019} or risks from AI malfunction \citep{raji2022}.
— Safety cases for frontier AI
(2410.21572 - Buhl et al., 28 Oct 2024) in Section 4.3 "Arguments"