Attack Success Rate (ASR)
- Attack Success Rate (ASR) is a metric that quantifies the proportion of adversarial instances that induce a specific, attacker-desired outcome, highlighting model vulnerabilities.
- The article reviews diverse methodologies, including iterative gradient-based, black-box, and psychoacoustic attacks, across domains such as speech recognition, vision, and NLP.
- High ASR values reveal persistent weaknesses in AI systems, underscoring the need for robust, context-aware defense strategies in adversarial settings.
Attack Success Rate (ASR) is a central evaluation metric in adversarial machine learning that quantifies the proportion of attack instances which successfully induce a specific, attacker-desired outcome in a target model. While the fundamental definition is consistent (the fraction or percentage of successful adversarial manipulations over the total tested), its precise operationalization, calculation, and significance vary with domain, task, and attack objective. This article reviews how ASR is formalized and operationalized in current research on adversarial attacks and defenses, drawing on results and methodologies across automatic speech recognition systems (also abbreviated ASR; the intended sense is clear from context), vision models and LLMs, and backdoor/NLP settings.
1. Formal Definitions and Contextualization of ASR
The precise definition of Attack Success Rate is highly context-dependent and is often tailored to the adversarial goals for the system under examination:
- Targeted Attack on Speech Recognition: ASR is computed as the percentage of adversarial audio samples for which the output transcription of the ASR system exactly matches an attacker-chosen phrase. That is, a targeted attack is only counted as successful if the model transcribes the precise target, not just any change in output (Das et al., 2018).
- Untargeted and Black-box Audio Attacks: Here, ASR is measured as the percentage of adversarial examples for which the model's output differs in any way from its output on the original (non-attacked) sample. Typically, this is operationalized as the ratio of successful attacks (those producing a non-zero Word Error Rate, WER, relative to the clean transcription) to the number of adversarial attempts: ASR = (number of attacks with non-zero WER) / (total number of adversarial attempts) × 100%.
- Backdoor Attacks in NLP: For backdoored NLP models, ASR is the proportion of triggered test samples (containing a backdoor trigger) that result in the attacker’s designated target label (Shen et al., 2022). In advanced settings, the metric may be refined to “attack success rate difference” (ASRD), measuring the differential between poisoned and clean models to isolate true backdoor effectiveness.
- Adversarial Attacks on Vision or Multimodal Models: ASR captures the percentage of test cases for which an adversarial perturbation causes the model to produce a specific, attacker-defined action, prediction, or output (e.g., generating, omitting, or altering an object label) (Jun et al., 2023, Li et al., 13 Mar 2025, Miao et al., 3 Jul 2025).
- Formal Expression in Evaluation: In all scenarios, ASR remains a normalized metric, presenting success as a percentage across trials: ASR = (number of successful adversarial instances) / (total number of adversarial instances) × 100%.
ASR serves both as a proxy for the vulnerability of an ML system and as a primary benchmark for the effectiveness of adversarial and defense methodologies.
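As a minimal illustration of these definitions, the sketch below computes targeted and untargeted ASR from hypothetical transcription lists; the function names and toy data are assumptions for illustration and are not taken from the cited papers.

```python
from typing import List

def targeted_asr(adv_outputs: List[str], target_phrases: List[str]) -> float:
    """Targeted ASR: fraction of adversarial outputs that exactly match the attacker's target."""
    hits = sum(out.strip() == tgt.strip() for out, tgt in zip(adv_outputs, target_phrases))
    return 100.0 * hits / len(adv_outputs)

def untargeted_asr(adv_outputs: List[str], clean_outputs: List[str]) -> float:
    """Untargeted ASR: fraction of adversarial outputs that differ at all from the clean outputs
    (equivalently, a non-zero WER against the clean transcription)."""
    hits = sum(adv != clean for adv, clean in zip(adv_outputs, clean_outputs))
    return 100.0 * hits / len(adv_outputs)

# Toy example: two adversarial samples, one of which hits the attacker's target phrase.
clean = ["turn on the lights", "play some music"]
adv = ["open the front door", "play some music"]
targets = ["open the front door", "open the front door"]
print(targeted_asr(adv, targets))    # 50.0
print(untargeted_asr(adv, clean))    # 50.0
```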
2. Methodological Frameworks for Measuring and Maximizing ASR
Different adversarial methodologies operationalize and seek to maximize ASR via various strategies:
- Iterative Gradient-based Targeted Attacks: As seen in ADAGIO, an iterative adversarial generator manipulates deep speech recognition models (e.g., DeepSpeech) to produce a desired output, measuring ASR by the strict-match criterion for targeted transcriptions (Das et al., 2018); a generic sketch of this attack pattern appears after this list.
- Black-box, Transfer-based, and Psychoacoustic Modeling: Attacks that modify audio using psychoacoustic frequency masking generate imperceptible perturbations. Success is scored via the induced WER or non-identical output transcriptions across multiple model architectures to reflect model-agnostic attack power (Wu et al., 2021).
- Feature Disentanglement: In DifAttack, adversarial and visual features are disentangled; only the adversarial component is optimized to alter model predictions without perceptual changes, with ASR reflecting query efficiency and transferability in black-box classification (Jun et al., 2023).
- Prompt Engineering in Language and Multimodal Models: Techniques such as PAL use surrogate guidance and loss optimization (e.g., cross-entropy over token sequences) to maximize the likelihood of a harmful target sequence, with ASR operationalized as the probability that the model emits the target output after adversarial suffix injection (Sitawarin et al., 15 Feb 2024, Miao et al., 3 Jul 2025).
- Adaptive Multi-Task and Structure Transformation Attacks: Sibling-Attack and StructTransform leverage multi-task learning or syntactic domain conversion (e.g., SQL, JSON) to boost ASR, particularly in transfer and defense-evasion scenarios (Li et al., 2023, Yoosuf et al., 17 Feb 2025).
- Concealed and Delayed Backdoor Activation: In ReVeil, the attack maintains low pre-deployment ASR (hidden backdoor) until camouflage samples are unlearned, at which point the ASR returns to high post-deployment values—enabling a temporal control mechanism on attack effectiveness (Alam et al., 17 Feb 2025).
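As a concrete illustration of the iterative gradient-based strategy above, the following is a minimal PGD-style sketch against a generic differentiable classifier, scored with the strict targeted-match criterion. It assumes PyTorch, a `model` returning logits, inputs normalized to [0, 1], and illustrative hyperparameters (`eps`, `alpha`, `steps`); it is a generic sketch of the pattern, not a reproduction of ADAGIO or any other cited method.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=0.03, alpha=0.005, steps=40):
    """Iteratively perturb input x within an L-infinity ball of radius eps
    so that the model predicts the attacker-chosen target_class."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target_class)   # loss toward the target label
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient *descent* on the targeted loss pushes predictions toward the target
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.clamp(x_adv, min=x - eps, max=x + eps).clamp(0, 1)
    return x_adv.detach()

def strict_match_asr(model, xs, targets, **attack_kwargs):
    """ASR under the strict targeted criterion: the prediction must equal the target."""
    hits = 0
    for x, t in zip(xs, targets):
        x_adv = targeted_pgd(model, x.unsqueeze(0), t.unsqueeze(0), **attack_kwargs)
        hits += int(model(x_adv).argmax(dim=1).item() == t.item())
    return 100.0 * hits / len(xs)
```

The same loop structure carries over to audio models by swapping the cross-entropy loss for a sequence loss (e.g., CTC) toward the target transcription and scoring success by exact transcription match.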
3. Empirical Results, ASR Benchmarks, and Defense Efficacy
Experimental results present a wide spread of ASR values under various conditions, targeting state-of-the-art models. Table 1 provides representative results from different application domains and attack methods.
| Paper/Method (arXiv id) | Target Task/Model | ASR (%) | Defense(s) Evaluated |
|---|---|---|---|
| (Das et al., 2018) (ADAGIO) | Mozilla DeepSpeech (speech recognition) | 92.5 → 0 (AMR/MP3) | Psychoacoustic compression |
| (Wu et al., 2021) | LibriSpeech/DeepSpeech/Sphinx | >90 | Waveguard (weak) |
| (Li et al., 2023) (Sibling-Attack) | Face++/Microsoft FR | 86.5+ | Attribute Recog. Joint |
| (Jun et al., 2023) (DifAttack) | ImageNet/CIFAR (targeted) | 100 | - |
| (Sitawarin et al., 15 Feb 2024) (PAL) | GPT-3.5-turbo/GPT-4o | 84, 48 | - |
| (Yoosuf et al., 17 Feb 2025) (StructTransform) | Claude 3.5 Sonnet, GPT-4o | >96 | SOTA safety alignment |
| (Schoepf et al., 8 Mar 2025) (MAD-MAX) | GPT-4o, Gemini-Pro | 97, 98 | TAP, SOTA defenses |
| (Miao et al., 3 Jul 2025) (VisCo) | GPT-4o (MM-SafetyBench) | 85 | Multimodal filtering |
| (Lin et al., 17 Jul 2025) (PSA) | Claude3.5-Sonnet, Deepseek-R1 | 97, 98 | - |
| (Chen et al., 18 Jul 2025) (TopicAttack) | Llama3-8B-Instruct, GPT-4o | >90 | Sandwich, Spotlight |
These references profile the ASR landscape, highlighting that many adversarial methods—including those targeting latest-generation LLMs and multimodal models—consistently achieve success rates in the 85–98% range, even when state-of-the-art safeguarding mechanisms are deployed.
4. Domain-Specific Metrics and Advanced Variants
ASR has been specialized into several variant metrics across research contexts; a combined computation sketch follows this list:
- Word Error Rate (WER) and Character Error Rate (CER): In attacks on speech recognition systems, adversaries indirectly maximize ASR by increasing the WER or CER of the target model. For spectrum-reduction attacks, the effect on ASR is quantified by increases in WER/CER (Wang et al., 2023).
- Attack Success Rate Difference (ASRD): In NLP backdoor attacks, raw ASR can be inflated by confounding factors. ASRD, defined as ASRD = ASR(poisoned model) − ASR(clean model), isolates the true effect of the injected trigger by subtracting the rate observed on a clean model (Shen et al., 2022).
- Non-detectable Attack Success Rate (NASR): For adversarial attacks that must evade out-of-distribution (OOD) detection measures (e.g., Mahalanobis distance, prediction confidence), NASR discounts attacks that are detectable, counting only successes that are both effective and undetectable (Wang et al., 2023).
- First Attack Success Rate (FASR): In jailbreaking LLMs (e.g., implicit reference/CAIR), FASR quantifies the “first-try” attack efficacy—a measure of real-world practicality (Wu et al., 4 Oct 2024).
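A minimal sketch of how these variants can be computed from per-attempt evaluation records is given below; the record fields and helper names are illustrative assumptions (and WER/CER are presumed to be computed separately), not a format defined in the cited papers.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AttemptRecord:
    """One adversarial attempt; all fields are hypothetical bookkeeping, not a published schema."""
    success_on_attacked: bool   # attacker's target label/output obtained from the attacked model
    success_on_clean: bool      # same criterion evaluated on a clean (non-backdoored) model
    detected: bool              # flagged by an OOD/anomaly detector (e.g., Mahalanobis, confidence)
    success_on_first_try: bool  # attacker's goal reached on the very first query

def asr(records: List[AttemptRecord]) -> float:
    return 100.0 * sum(r.success_on_attacked for r in records) / len(records)

def asrd(records: List[AttemptRecord]) -> float:
    """ASRD: ASR on the attacked/poisoned model minus the rate a clean model already yields."""
    clean_rate = 100.0 * sum(r.success_on_clean for r in records) / len(records)
    return asr(records) - clean_rate

def nasr(records: List[AttemptRecord]) -> float:
    """NASR: only count successes that also evade detection."""
    hits = sum(r.success_on_attacked and not r.detected for r in records)
    return 100.0 * hits / len(records)

def fasr(records: List[AttemptRecord]) -> float:
    """FASR: fraction of attempts that succeed on the first try."""
    return 100.0 * sum(r.success_on_first_try for r in records) / len(records)
```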
5. Implications for Security, Robustness, and Future Defense Research
The consistently high ASR observed across domains has several implications:
- Efficacy of Adversarial and Backdoor Attacks: High ASR, especially when it persists after advanced defensive training, underscores the enduring vulnerability of speech recognition, LLM, and vision systems. Successful attacks often exploit weak model inductive biases, transferability across model classes, or the inability of alignment mechanisms to generalize to novel syntactic or contextually structured adversarial prompts (Yoosuf et al., 17 Feb 2025, Lin et al., 17 Jul 2025).
- Stealth and Detection Avoidance: As attacks incorporate adaptation to psychoacoustic constraints, distribution preservation (DA³), or camouflage/unlearning strategies, high ASR is obtainable without raising standard anomaly scores, making detection significantly more challenging (Wang et al., 2023, Alam et al., 17 Feb 2025).
- Benchmarking and Red Team Automation: Tools like MAD-MAX automate the discovery of attack variants, maximizing ASR economically and extensibly. The modular approach and diversity injection lead to near-perfect ASR (97–98%) even for the latest multimodal LLMs, far exceeding traditional red-team baselines (Schoepf et al., 8 Mar 2025).
- Defense Limitations: Existing defenses—audio preprocessing, perplexity filtering, structure pattern matching—have demonstrated only partial effectiveness, rarely reducing ASR more than marginally (Das et al., 2018, Wu et al., 4 Oct 2024).
- Model Alignment and Vulnerability Bias: The PSA paper observes that LLMs show alignment/vulnerability bias: models respond differently if exploitative prompts are presented in the context of defense- vs. attack-focused academic papers, indicating that ASR may be model- and context-sensitive beyond the prompt’s immediate semantics (Lin et al., 17 Jul 2025).
6. Directions for Evaluation and Methodological Transparency
ASR measurement is now recognized as necessary but insufficient for responsible adversarial evaluation:
- Fine-grained ASR Accounting: Distinguishing raw ASR from refined variants such as ASRD and NASR is essential to avoid overestimating true backdoor or adversarial power.
- Explicit Evaluation Protocols: Standardized benchmarks, multi-metric reporting (WER, ASR, NASR, FASR), and cross-model assessments (transferability, context robustness) are required for rigorous defense validation; a minimal reporting schema is sketched after this list.
- Context-sensitive Analysis: ASR should be interpreted in conjunction with underlying model biases, prompt sources, and defense configuration.
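As a purely illustrative example of such multi-metric reporting, the sketch below defines a hypothetical per-model, per-attack report; the schema and the placeholder values are assumptions, not a standardized benchmark format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AdversarialEvalReport:
    """Hypothetical multi-metric report for one target model and one attack method."""
    target_model: str
    attack_method: str
    defense: str   # defense configuration under which the attack was evaluated
    asr: float     # raw attack success rate (%)
    asrd: float    # attack success rate difference vs. a clean model (%)
    nasr: float    # non-detectable attack success rate (%)
    fasr: float    # first-try attack success rate (%)
    wer: float     # word error rate induced on speech models, if applicable

# Toy placeholder values, not results from any cited paper.
report = AdversarialEvalReport(
    target_model="example-model", attack_method="example-attack", defense="none",
    asr=90.0, asrd=85.0, nasr=70.0, fasr=40.0, wer=0.55,
)
print(json.dumps(asdict(report), indent=2))
```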
Ultimately, high ASR across modalities signals not only critical vulnerabilities in deployed AI systems but also the pressing need for principled, context-aware, and adaptive defense strategies. As attack methods diversify (spanning transfer learning, context construction, structure transformation, and feature disentanglement), ASR will remain the central metric for reliably benchmarking system robustness and adversarial progress.