Number of successful attacks required for catastrophic harm in deployment
Determine how many successful sabotage side tasks are required to cause catastrophic harm when language-model-based software engineering agents are deployed under asynchronous monitoring. The answer depends on the specific system architecture and threat model, and it is needed to make safety estimates for real deployments accurate and defensible.
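To illustrate why this number matters, here is a minimal sketch of how a harm estimate can shift with the assumed attack threshold. It is a toy model and not the method of the cited paper. It assumes that attack attempts are independent and that each one succeeds with the same probability, which is exactly the kind of assumption the problem statement flags as uncertain. The function name and the parameters `n_attempts`, `p_success`, and `k_required` are hypothetical, and the values in the usage loop are placeholders rather than measured results.

```python
from math import comb


def prob_at_least_k_successes(n_attempts: int, p_success: float, k_required: int) -> float:
    """Return P(X >= k_required) for X ~ Binomial(n_attempts, p_success).

    Toy assumption: every attack attempt is independent and succeeds with
    the same probability p_success. Real attacks may be correlated.
    """
    if k_required <= 0:
        # No successful attack is needed, so the harm condition is always met.
        return 1.0
    if k_required > n_attempts:
        # More successes are required than attempts exist.
        return 0.0
    # Sum the binomial probability mass from k_required up to n_attempts.
    return sum(
        comb(n_attempts, i) * p_success**i * (1 - p_success) ** (n_attempts - i)
        for i in range(k_required, n_attempts + 1)
    )


if __name__ == "__main__":
    # Placeholder inputs: 20 attempts, each succeeding with probability 0.1.
    for k in (1, 2, 5):
        print(k, prob_at_least_k_successes(20, 0.1, k))
```

Under these assumptions the estimated probability of harm falls as `k_required` grows, so a safety estimate can be quite sensitive to an undetermined threshold. If attacks are correlated, the binomial model no longer applies and the sensitivity could look different.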
References
But some uncertainties will remain, such as how many attacks are needed to cause harm, or whether attacks are independent (although we may be able to get some idea by careful threat modelling and additional measurements).
— Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
(2512.13526 - Stickland et al., 15 Dec 2025) in Appendix C, Deployment Simulation Details (Table 4 context)