Incorporating post-deployment enhancements into inability-based safety case arguments

Determine a methodology for incorporating post-deployment enhancements into the assessment of frontier AI systems within inability-based safety case arguments, so that capability thresholds and risk evaluations remain valid after tools, scaffolding, prompting, or fine-tuning modify a system's capabilities following deployment.

Background

Early frontier AI safety cases will likely rely on inability arguments, which claim that a system is not capable enough to cause serious harm. Buhl et al. (2024) note that such arguments face open questions, including how to account for capability increases from post-deployment enhancements, which can significantly affect risk assessments and the validity of capability thresholds.

References

There are still open questions, such as how to incorporate post-deployment enhancements \citep{davidson2023}, account for defensive uses of capabilities \citep{mirsky2023}, or address risks that are less closely tied to dangerous capabilities such as systemic risks \citep{zwetsloot2019} or risks from AI malfunction \citep{raji2022}.

— Safety cases for frontier AI (2410.21572 - Buhl et al., 2024) in Section 4.3 "Arguments"
