A Framework for Evaluating Emerging Cyberattack Capabilities of AI (2503.11917v3)

Published 14 Mar 2025 in cs.CR and cs.AI

Abstract: As frontier AI models become more capable, evaluating their potential to enable cyberattacks is crucial for ensuring the safe development of AGI. Current cyber evaluation efforts are often ad-hoc, lacking systematic analysis of attack phases and guidance on targeted defenses. This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming. Our approach adapts existing cyberattack chain frameworks for AI systems. We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes. Through a bottleneck analysis on these archetypes, we pinpointed phases most susceptible to AI-driven disruption. We then identified and utilized externally developed cybersecurity model evaluations focused on these critical phases. We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses. We believe this represents the most comprehensive AI cyber risk evaluation framework published to date.

Summary

  • The paper introduces a structured framework, built on established cybersecurity models such as the cyberattack chain and MITRE ATT&CK, that evaluates AI's ability to lower the costs of different cyberattack stages.
  • The methodology employs bottleneck analysis and tailored evaluations, spanning vulnerability detection and exploitation (V&E), evasion, and network attack simulation challenges, conducted in environments that simulate real-world conditions.
  • Experimental results on Gemini 2.0 Flash reveal limited end-to-end attack capabilities, underscoring the need for more focused defensive strategies.

This paper (2503.11917) proposes a novel framework for evaluating the emerging cyberattack capabilities of AI, specifically focusing on LLMs, with the practical goal of informing cybersecurity defenders on where to prioritize mitigation efforts. The authors argue that existing AI safety evaluations in cybersecurity are often ad-hoc, lack a systematic approach across the entire attack lifecycle, and fail to translate effectively into actionable defense strategies.

The core of the framework is adapting established cybersecurity models, such as the cyberattack chain and MITRE ATT&CK, to evaluate AI's potential to reduce the traditional "costs" associated with the different stages of a cyberattack. This economic perspective on attack cost (measured in time, effort, knowledge, and scalability) is central to identifying where AI could provide the most significant advantage to attackers.
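
To make this economic framing concrete, the following is a minimal sketch (an illustration, not code or data from the paper) of how per-stage attacker costs could be recorded along the dimensions listed above; the StageCost fields, stage names, and numbers are assumptions.

    # Hypothetical representation of per-stage attacker cost (not the paper's code).
    from dataclasses import dataclass

    @dataclass
    class StageCost:
        stage: str            # e.g. "reconnaissance", "weaponization", "evasion"
        time_hours: float     # expected attacker time to complete the stage
        effort: float         # relative labor required, normalized to [0, 1]
        expertise: float      # required skill level, normalized to [0, 1]
        scalability: float    # ease of repeating the stage across targets, [0, 1]

    # Illustrative comparison: human-only baseline vs. AI-assisted estimate for one stage.
    baseline = StageCost("evasion", time_hours=40, effort=0.8, expertise=0.9, scalability=0.2)
    assisted = StageCost("evasion", time_hours=10, effort=0.4, expertise=0.5, scalability=0.6)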

The framework outlines a four-stage process:

  1. Curating a Basket of Representative Attack Chains: Based on an analysis of over 12,000 real-world instances of attempted AI use in cyberattacks cataloged by Google's Threat Intelligence Group and other open-source intelligence (like CSIS data, Mandiant reports, security company write-ups), the authors identify a collection of prevalent and high-impact attack chain archetypes. These include Phishing, Malware, Denial-of-Service (DoS), Man-in-the-Middle (MitM), SQL Injection, and Zero-Day attacks. This basket grounds the evaluation in real-world threat patterns.
  2. Bottleneck Analysis Across Representative Attack Chains: For each attack chain archetype, the framework analyzes the traditional bottlenecks: stages that are typically expensive, time-consuming, or expertise-intensive for human attackers. The goal is to pinpoint phases where AI could significantly reduce these costs. Examples include reconnaissance, exploit development, evasion, and maintaining persistence.
  3. Devising Targeted Cybersecurity Model Evaluations: Based on the identified bottlenecks, the authors design specific evaluations tailored to measure an AI model's ability to overcome these hurdles. These evaluations are new, not based on existing public benchmarks, to avoid contamination. They are conducted in simulated environments that mimic real-world conditions and are designed to generate metrics that quantify cost reduction (e.g., time to completion, success rate, scalability). The evaluations are categorized into:
    • Vulnerability Detection and Exploitation (V&E) Challenges: Focused on identifying and exploiting single vulnerabilities.
    • Evasion Challenges: Assessing the ability to perform actions while avoiding detection.
    • Network Attack Simulation Challenges: Requiring broader situational awareness, planning, and multi-step execution within a simulated network.
  4. Evaluation Execution and Aggregated Cost Differential Scores: AI models are evaluated using the designed challenges. The collected metrics are aggregated to produce a "cost differential score," which quantifies the model's potential to lower attack costs across the representative landscape; a higher score indicates a greater risk potential from the AI model (a sketch of one possible aggregation follows this list).
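
The paper does not publish its exact aggregation formula, so the sketch below is only one plausible way per-stage cost reductions might be combined into a single cost differential score: each stage's relative cost reduction is weighted by how often that stage appears as a bottleneck across the archetype basket. The function names, weights, and numbers are hypothetical.

    # Hedged sketch of aggregating a "cost differential score"; the weighting and
    # normalization are illustrative assumptions, not the paper's formula.
    from typing import Dict

    def stage_cost_reduction(baseline_hours: float, ai_assisted_hours: float) -> float:
        """Relative reduction in attacker cost for one attack-chain stage, clipped to [0, 1]."""
        if baseline_hours <= 0:
            return 0.0
        return max(0.0, (baseline_hours - ai_assisted_hours) / baseline_hours)

    def cost_differential_score(reductions: Dict[str, float], weights: Dict[str, float]) -> float:
        """Weighted average of per-stage cost reductions; higher means more attacker uplift."""
        total = sum(weights.values()) or 1.0
        return sum(reductions.get(stage, 0.0) * w for stage, w in weights.items()) / total

    # Illustrative inputs: per-stage reductions observed in evaluations, weighted by how
    # often each stage is a bottleneck across the representative attack chains.
    reductions = {"reconnaissance": 0.10, "evasion": 0.40, "vulnerability_exploitation": 0.05}
    weights = {"reconnaissance": 3, "evasion": 5, "vulnerability_exploitation": 7}
    print(f"cost differential score: {cost_differential_score(reductions, weights):.2f}")  # ~0.18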

The paper presents evaluation results using an experimental version of Gemini 2.0 Flash on a benchmark of 50 unique challenges (plus variants, totaling 1270 evaluations). The model achieved an overall success rate of 16% (209/1270 evaluations), solving 12 of the 50 unique challenges. Success rates varied by difficulty (50% on Easy challenges, limited success on Medium, and 0% on Hard) and by bottleneck skill (highest on operational security/evasion tasks at 40%, followed by malware development at 30%, reconnaissance at 11.11%, and vulnerability exploitation at 6.25%).

Observed failure modes included long-range syntactic errors when executing sequences of commands, as well as a tendency to default to generic strategies or get stuck in loops, reflecting a lack of the creative strategic reasoning required for complex tasks.

The authors conclude that the evaluated model currently lacks the capabilities for real-world end-to-end attacks or significant assistance in high-impact scenarios, suggesting that current frontier AI primarily offers speed, scale, and throughput uplift rather than fundamentally disruptive new capabilities for attackers.

A key takeaway for defenders is that the framework provides a structured method to translate AI capability evaluations into insights for prioritizing defenses. By mapping AI potential onto the attack chain and identifying bottlenecks where AI shows promise (even if limited today, like evasion), organizations can perform threat coverage gap assessments. The framework also aids in developing targeted mitigations and informs AI-enabled adversary emulation for red teaming exercises, helping defenders to test their security posture against more realistic AI-augmented threats. Furthermore, it can serve as a benchmark for evaluating the effectiveness of defensive measures by assessing the cost they impose on AI-enabled attacks.
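
As an illustration of that last point, one could re-run the same evaluations with a mitigation in place and compare the resulting scores. The sketch below is an assumption about how such a comparison might look, not a procedure defined in the paper; the function name and numbers are hypothetical.

    # Hypothetical use of the scoring idea to benchmark a defensive measure:
    # compare the cost differential score with and without the mitigation deployed.
    def defensive_impact(score_baseline: float, score_with_mitigation: float) -> float:
        """Fraction of AI-enabled attacker uplift removed by the mitigation."""
        if score_baseline <= 0:
            return 0.0
        return (score_baseline - score_with_mitigation) / score_baseline

    # Illustrative numbers: an evasion-focused detection control lowers the score from 0.18 to 0.11.
    print(f"uplift removed: {defensive_impact(0.18, 0.11):.0%}")  # ~39%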

The paper highlights that while much evaluation effort focuses on vulnerability exploitation, AI's potential impact on less-studied areas like evasion, detection avoidance, obfuscation, and persistence is significant and warrants increased attention.