
NASR: Metric for Non-detectable Attacks

Updated 1 February 2026
  • NASR is a metric that evaluates both attack success and stealth, requiring that adversarial or backdoor attacks remain undetected by human observers or algorithmic defenses.
  • Empirical studies in vision and NLP show NASR’s dependence on poisoning rates and illustrate trade-offs between raw attack success and detection evasion.
  • NASR advances security analysis by guiding detection-aware attack design and optimizing methods for stealth in practical, real-world settings.

Non-detectable Attack Success Rate (NASR) is a metric designed to quantify the practical efficacy of adversarial and backdoor attacks in the presence of detection or filtering defenses. Unlike the standard Attack Success Rate (ASR), which measures only whether an attack accomplishes its objective (e.g., misclassification or targeted prediction), NASR further requires that successful attacks evade detection by human observation or algorithmic defenses. The concept has been independently formalized in computer vision and natural language processing as a key measure of real-world attack stealth and utility across settings such as neural network backdoors (Xia et al., 2022, Wang et al., 2021, Zhou et al., 2023) and adversarial NLP (Wang et al., 2023).

1. Formal Definition and Distinction from ASR

The Non-detectable Attack Success Rate is defined as the fraction of test instances for which the attack achieves its target and the perturbed (or poisoned) input remains undetected by a specified detection or filtering mechanism. Let $x$ be an original input, $x'$ or $x_{\text{adv}}$ the attacked/poisoned input, $y$ the true label, $y_t$ the attacker's chosen target, and $C(\cdot)$ or $f_{\text{victim}}(\cdot)$ the model prediction. The classic ASR is

$$\mathrm{ASR} = \frac{1}{N} \sum_{x \in X} \mathbf{1}\{C(x') = y_t\}, \qquad N = |X|$$

NASR augments this with a non-detectability constraint. In practice, this manifests in two principal forms:

  • Label-consistency (image backdoors): Require that each $x'$ remains visually (or instrumentally) indistinguishable from a clean image, as verified by a human or a perceptual model (Xia et al., 2022, Zhou et al., 2023).
  • Defense-aware (general, including NLP): Require that $x'$ passes undetected by an automated detector $d_k(x')$ (e.g., OOD score, backdoor anomaly index), so that the attack is not flagged (Wang et al., 2021, Wang et al., 2023).

In mathematical terms, for detection rate $\mathrm{DR}$ (the probability that $x'$ is flagged), the most common combination is

$$\mathrm{NASR} = \mathrm{ASR} \times (1 - \mathrm{DR})$$

or, in the presence of individual test-wise detection,

$$\mathrm{NASR}_k = \frac{1}{N} \sum_{x \in X} \mathbf{1}\left[\, C(x') = y_t \;\wedge\; d_k(x') = 0 \,\right]$$

where $k$ indexes the detector and $d_k(x') = 1$ if $x'$ is detected.

NASR thus captures the actual risk posed by stealthy attacks in a deployment where observable or detectable attacks would be suppressed.
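As a concrete illustration of the two forms above, the following sketch (with hypothetical per-example outcomes, not data from the cited papers) computes both the aggregate product and the per-instance joint count. Note that the product form implicitly assumes success and detection are independent, so the two can diverge:

```python
# Hypothetical per-example outcomes for five attacked test inputs.
success = [True, True, True, False, True]      # C(x') == y_t per input
detected = [False, True, False, False, False]  # d_k(x') == 1 per input

n = len(success)
asr = sum(success) / n       # fraction of attacks that hit the target
dr = sum(detected) / n       # fraction of attacked inputs that are flagged

# Aggregate form: NASR = ASR * (1 - DR), assumes independence.
nasr_aggregate = asr * (1 - dr)

# Per-instance joint form: count attacks that succeed AND evade detection.
nasr_joint = sum(s and not d for s, d in zip(success, detected)) / n

print(nasr_aggregate)  # → 0.64
print(nasr_joint)      # → 0.6
```

Here the joint form is lower because the one detected example was also a successful attack, a correlation the product form cannot see.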

2. Methodologies for Measuring NASR

The computation of NASR necessitates an attack generation protocol, a set of detection criteria, and an evaluation routine. The precise methodology varies by domain:

  • Backdoor Image Attacks (Xia et al., 2022, Wang et al., 2021, Zhou et al., 2023):
    • Generate poisoned training samples $x'$ under a low poisoning rate $\rho = |D_{\text{poisoned}}| / |D_{\text{train}}|$, using methods such as feature-space transformation, blind watermarking, frequency-domain perturbation, or spatial-attention triggers.
    • Retrain the model and evaluate on a held-out clean test set with triggers injected.
    • For each test sample, report a “success” only if the triggered $x'$ is (a) mapped to $y_t$ and (b) not visually flagged or, in some works, not signaled by explicit defense methods (e.g., Neural Cleanse anomaly index, AEVA thresholds, STRIP, ABS).
    • Use quantifiable perceptual metrics (PSNR, SSIM, LPIPS, MSE) to support or automate the invisibility constraint (Zhou et al., 2023, Wang et al., 2021).
  • Adversarial NLP Attacks (Wang et al., 2023):
    • Generate $x_{\text{adv}} = \operatorname{Attack}(x)$ for each test input $x$.
    • Evaluate the model prediction $y_{\text{adv}} = f_{\text{victim}}(x_{\text{adv}})$.
    • Employ OOD detectors (e.g., Maximum Softmax Probability (MSP), Mahalanobis Distance (MD)), flagging $x_{\text{adv}}$ as detected if its OOD score crosses a threshold calibrated on the training distribution.
    • NASR is the proportion of attacked texts where the label is successfully flipped and the detector does not fire.
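The perceptual-metric check used in the vision works can be automated in a few lines; a minimal sketch computing MSE and PSNR over hypothetical 8-bit grayscale pixel lists (SSIM and LPIPS require dedicated libraries and are omitted):

```python
import math

def mse(clean, poisoned):
    """Mean squared error between two equal-length pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(clean, poisoned)) / len(clean)

def psnr(clean, poisoned, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means a less visible change."""
    err = mse(clean, poisoned)
    if err == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / err)

# Illustrative values, not from the cited papers: a trigger that
# perturbs each pixel by exactly 1.
clean = [100, 120, 130, 140]
poisoned = [101, 119, 131, 139]

print(mse(clean, poisoned))             # → 1.0
print(round(psnr(clean, poisoned), 1))  # → 48.1
```

A PSNR of about 48 dB comfortably clears the >36 dB invisibility bars reported for attacks such as SATBA.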

Algorithmically, this involves iterating over the test set, counting the number of attack successes that also evade detection, and normalizing by $N$.
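That counting routine can be sketched as follows. The names `attack`, `victim_predict`, and `msp_score` are hypothetical stand-ins for the attack method, victim model, and an MSP-style OOD detector, not APIs from the cited works:

```python
def nasr(test_set, target, attack, victim_predict, msp_score, threshold):
    """Fraction of test inputs whose attacked version hits the target
    label AND is not flagged by the detector."""
    hits = 0
    for x in test_set:
        x_adv = attack(x)
        succeeded = victim_predict(x_adv) == target
        # MSP-style rule: low maximum softmax probability => flagged as OOD.
        detected = msp_score(x_adv) < threshold
        if succeeded and not detected:
            hits += 1
    return hits / len(test_set)

# Toy illustration with stub components.
rate = nasr(
    test_set=[0, 1, 2, 3],
    target=1,
    attack=lambda x: x + 10,                     # perturb the input
    victim_predict=lambda x: 1,                  # always fooled here
    msp_score=lambda x: 0.9 if x < 12 else 0.3,  # confident only on small inputs
    threshold=0.5,
)
print(rate)  # → 0.5 (two of four successful attacks evade the detector)
```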

3. Empirical Behavior and Functional Dependence

NASR is sensitive to both the poisoning/perturbation rate and the attack design:

  • Image Backdoors (Xia et al., 2022):
    • Empirically, NASR follows a sigmoidal (logistic) dependence on the poisoning rate $\rho$, rising sharply as $\rho$ increases from very low values, with the specific growth rate depending on dataset and attack method.
    • For the FRIB attack, MNIST achieves NASR $\approx 1.0$ at $\rho = 1.16\%$, whereas CIFAR10 requires higher rates to approach similar NASR.
    • Compared to baseline attacks (BadNets, Hidden), FRIB achieves a given NASR at far lower $\rho$, as summarized in the table below:
| Dataset | Attack  | $\rho$ (%) | NASR |
|---------|---------|------------|------|
| MNIST   | FRIB    | 1.16       | 1.00 |
| MNIST   | Hidden  | 1.16       | 0.93 |
| MNIST   | BadNets | 1.16       | 0.89 |
| CIFAR10 | FRIB    | 5.66       | 0.92 |
| CIFAR10 | Hidden  | 5.66       | 0.88 |
| CIFAR10 | BadNets | 5.66       | 0.91 |
  • Frequency-domain Backdoors (Wang et al., 2021):
    • For FTROJAN, NASR remains near ASR (NASR $\approx$ ASR, both above 0.95), as frequency triggers are not flagged by most pixel-based anomaly detectors.
    • In contrast, pixel-space triggers achieve high ASR but are almost always detected (DR $\approx 1$), so NASR $\approx 0$.
  • NLP Attacks (Wang et al., 2023):
    • NASR provides a clear evaluation of the combined strength and stealth of attacks. For instance, on SST-2, TextFooler achieves an ASR of 95.2%, but its NASR drops to 53.5% under MSP detection.
    • The proposed DALA method attains a higher NASR; e.g., on MRPC, DALA achieves NASR$_{\text{MSP}} = 74.9\%$ and NASR$_{\text{MD}} = 93.3\%$.
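The sigmoidal dependence of NASR on poisoning rate can be expressed with a simple logistic model. The slope $k$ and midpoint $\rho_0$ below are illustrative choices, not fitted values from the papers:

```python
import math

def nasr_logistic(rho, k=8.0, rho0=0.6):
    """Logistic model of NASR vs. poisoning rate rho (in percent):
    NASR rises sharply once rho passes the midpoint rho0.
    k and rho0 are illustrative, not fitted to published data."""
    return 1.0 / (1.0 + math.exp(-k * (rho - rho0)))

for rho in (0.1, 0.5, 1.0, 2.0):
    print(f"rho={rho:.1f}%  NASR≈{nasr_logistic(rho):.3f}")
```

Under these parameters NASR is near zero below about $\rho = 0.3\%$ and near one above about $\rho = 1\%$, mirroring the sharp transition reported for FRIB on MNIST.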

4. Relationship to Attack Design and Stealth

NASR is maximized when attacks are optimized for both effectiveness and stealth. Key observations across attack variants:

  • Feature Repair and Watermarking (Xia et al., 2022): Embedding triggers in low-frequency DWT coefficients (blind watermarking) repairs the loss of trigger features common in feature-space attacks, yielding high NASR at low $\rho$.
  • Frequency-Domain Attacks (Wang et al., 2021): Dispersed, low-amplitude frequency triggers elude pixel-based and naive statistical detectors, maximizing NASR.
  • Spatial Attention (SATBA) (Zhou et al., 2023): Spatial-attention-driven, U-net-embedded sample-specific triggers produce poisoned samples with high image similarity (PSNR $> 36$ dB, LPIPS $< 0.01$), passing both human and automated defense thresholds (Neural Cleanse MAD $< 2$, AEVA $< 4$). These indicators are directly linked to non-detectable attack criteria.
  • Distribution-aligned Adversarial NLP (Wang et al., 2023): Standard adversarial methods (e.g., TextFooler) typically produce detectable distribution shift, sharply lowering NASR. Distribution-aware training (e.g., DALA) trades off a slight drop in ASR for much larger gains in NASR by aligning perturbed examples to the in-distribution manifold.

A plausible implication is that attack methods designed with explicit consideration of detector mechanisms (either via feature repair, domain-specific perturbation, or distributional alignment) can substantially increase NASR over naive approaches, underscoring NASR’s value as an end-to-end security metric.

5. Comparative Results and Benchmarks

Empirical results validate NASR’s discriminatory utility. Select highlights:

  • Image domain (Xia et al., 2022, Wang et al., 2021, Zhou et al., 2023):
    • FRIB, FTROJAN, and SATBA all achieve high NASR ($\geq 90\%$) at lower poisoning rates than older pixel- or feature-space-only methods.
    • Baseline attacks can have high ASR but near-zero NASR if triggers are obvious to detection frameworks.
    • SATBA achieves ASR $\approx 1$ and eludes both Neural Cleanse and AEVA, with high imperceptibility (e.g., MNIST: MSE = 1.47, PSNR = 47.3 dB, LPIPS = 0.0001).
  • NLP domain (Wang et al., 2023):
    • DALA achieves both high ASR and the highest NASR across datasets and detectors, narrowing the ASR–NASR gap compared to TextFooler or BERT-Attack.

Tables in the referenced works report both ASR and NASR for a variety of models, attack methods, datasets, and detectors, demonstrating that NASR can diverge greatly from ASR depending on attack stealth.

6. Limitations and Considerations

  • Detector dependence: NASR is computed relative to specific detection methods and threshold choices (e.g., MSP, Mahalanobis, Neural Cleanse, AEVA), which vary in sensitivity and power (Wang et al., 2023). NASR may be higher for an attack when detectors are weak or miscalibrated.
  • Binary vs. continuous notion of detectability: In most works, detection is treated as a binary outcome (flagged or not). Future efforts may incorporate continuous detection scores.
  • Domain specificity: In vision, human visual indistinguishability is sometimes substituted for, or augmented by, statistical anomaly indices; in NLP, OOD detection primarily governs NASR.
  • Attack trade-offs: Methods tuned for maximal NASR may incur slight decreases in raw ASR as they reduce perturbation magnitude or avoid high-salience distribution shifts.
  • Weighting and operational risk: Current NASR is typically uniform over all test examples; real-world evaluation may require risk-weighted variants for security-critical applications (Wang et al., 2023).

7. Significance and Future Directions

NASR has emerged as a critical metric for principled evaluation of adversarial and backdoor attacks in both image and language domains. By combining success and stealth, it reveals gaps in conventional ASR-centric measures and exposes the operationally meaningful risk posed by sophisticated, non-detectable threats. Ongoing research seeks better alignment of NASR with practical security objectives, integration with improved or ensemble detection methods, continuous/differentiable stealth losses for training, and adaptation to new modalities and defense frameworks (Xia et al., 2022, Wang et al., 2021, Zhou et al., 2023, Wang et al., 2023).
