
Decision-Based Black-Box Adversarial Attacks

Updated 15 April 2026
  • Decision-based black-box adversarial attacks are defined by using only discrete output labels to identify minimal perturbations for misclassification.
  • Techniques such as boundary, evolutionary, and RL-based methods enable efficient exploration of adversarial directions under strict query constraints.
  • Empirical results demonstrate high success rates and sparsity improvements, questioning the presumed robustness of output-restricted models.

Decision-based black-box adversarial attacks are a class of adversarial machine learning attacks in which the adversary is limited to querying a model and observing only the model’s final output (typically, the top-1 predicted label), without access to logits, confidence scores, or model internals. These attacks pose significant threats to deployed systems such as ML-as-a-service APIs and vision, trajectory-prediction, and text models, where exposing only a discrete decision is often assumed to improve robustness. Contrary to this assumption, decision-based attacks have demonstrated the ability to efficiently craft imperceptible or highly sparse adversarial perturbations under strict information constraints, raising serious concerns for real-world AI security.

1. Formal Problem Definition and Threat Model

A decision-based black-box adversarial attack operates under a strictly limited information regime: the adversary can query an input $x'$ and observe only the top-1 label $f(x') \in \{1, \ldots, K\}$ of a classifier $f$, with no access to scores, probabilities, or gradients. The canonical task is, given a clean input $x$ with true label $y = f(x)$, to find a minimally perturbed $x'$ such that $f(x') \neq y$ (untargeted) or $f(x') = \tilde{y}$ (targeted), typically under a norm constraint (e.g., $\ell_0$, $\ell_2$, or $\ell_\infty$):

$$\min_{x'} \; \|x' - x\|_p \quad \text{subject to} \quad f(x') \neq y,$$

or, in the targeted case, subject to $f(x') = \tilde{y}$, where $\|\cdot\|_p$ is the norm of interest.

This setting is strictly harder than white-box or score-based black-box attacks, as the adversary cannot exploit gradient information or continuous outputs to efficiently find adversarial directions. Instead, the search is combinatorial and often NP-hard (especially for $\ell_0$ sparse attacks), requiring black-box optimization or heuristic search strategies (Vo et al., 2022).
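Concretely, the interface available to the adversary can be sketched in a few lines; the two-class linear model and all names below are illustrative stand-ins, not any deployed system:

```python
import numpy as np

# Toy stand-in for a deployed model: a 2-class linear classifier.
# The attacker never sees these weights -- only the label from query().
_W = np.array([[1.0, -1.0], [-1.0, 1.0]])

def query(x):
    """Hard-label oracle: returns only the top-1 class index."""
    return int(np.argmax(_W @ x))

def is_adversarial(x_adv, y_true):
    """Decision-based success test: the label changed; nothing else is observed."""
    return query(x_adv) != y_true

x = np.array([2.0, 0.0])        # clean input
y = query(x)                     # attacker may query the clean point
x_adv = np.array([0.0, 2.0])     # candidate perturbed input
print(y, is_adversarial(x_adv, y))
```

Everything an attack algorithm does must be expressed through repeated calls to a `query`-style oracle like this one.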

2. Core Methodologies and Algorithmic Strategies

2.1 Boundary Attack and Sampling-based Methods

The prototypical decision-based attack is the Boundary Attack (Brendel et al., 2017): it starts from a large adversarial perturbation, then performs a random walk along the decision boundary, iteratively reducing the perturbation while keeping the input misclassified. At each step:

  • An orthogonal step explores new directions along the boundary.
  • A small step is taken toward the clean input to minimize distance.
  • Proposals that cross back to the original class are rejected, keeping the trajectory on the adversarial side.

Proposals are drawn from i.i.d. normal or domain-informed distributions (see biased sampling below), and the attack dynamically adapts its step sizes to maintain efficiency near the boundary.
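The walk described above can be sketched as follows; the toy oracle, fixed step sizes, and stopping rule are illustrative assumptions rather than the original Brendel et al. implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def query(x):
    """Toy hard-label oracle: class 1 iff x[0] > 1 (boundary is the plane x0 = 1)."""
    return int(x[0] > 1.0)

def boundary_attack(x_clean, y_true, x_adv_init, steps=500, delta=0.1, eps=0.1):
    """Simplified Boundary Attack: random orthogonal exploration plus a
    contraction toward the clean input, rejecting non-adversarial moves."""
    x_adv = x_adv_init.copy()
    for _ in range(steps):
        direction = x_clean - x_adv
        dist = np.linalg.norm(direction)
        if dist < 1e-6:
            break
        # 1. Orthogonal step: random proposal projected off the source direction.
        noise = rng.normal(size=x_adv.shape)
        noise -= noise @ direction / dist**2 * direction
        candidate = x_adv + delta * dist * noise / (np.linalg.norm(noise) + 1e-12)
        # 2. Small step toward the clean input to shrink the perturbation.
        candidate = candidate + eps * (x_clean - candidate)
        # 3. Rejection: keep only proposals that stay misclassified.
        if query(candidate) != y_true:
            x_adv = candidate
    return x_adv

x_clean = np.array([0.0, 0.0])                  # classified as class 0
x_adv = boundary_attack(x_clean, 0, np.array([5.0, 3.0]))
print(np.linalg.norm(x_adv - x_clean))          # shrinks toward the boundary distance (~1)
```

With a hard-label oracle only, the rejection test in step 3 is the entire feedback signal; everything else is geometry.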

2.2 Evolutionary and Combinatorial Search

For discrete or combinatorial constraints (notably sparse $\ell_0$ attacks), evolutionary algorithms have been applied. For example, SpaEvoAtt (Vo et al., 2022) models a candidate perturbation as a binary mask over pixel locations and uses binary differential recombination, targeted mutation, and a fitness function that incorporates attack success and perturbation size. This approach transforms the NP-hard $\ell_0$ search into a tractable binary-vector optimization, yielding state-of-the-art sparsity and query efficiency on both CNNs and Vision Transformers.
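A minimal sketch of the binary-mask idea follows; a plain bit-flip random search stands in for SpaEvoAtt's recombination and targeted mutation, and the toy oracle and parameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def query(x):
    """Toy hard-label oracle: class 1 iff total intensity exceeds 8."""
    return int(x.sum() > 8.0)

def apply_mask(mask, x_clean, x_start):
    """Perturb only the coordinates selected by the binary mask."""
    return np.where(mask, x_start, x_clean)

def sparse_evo_attack(x_clean, y_true, x_start, pop=8, gens=200, flip_p=0.1):
    """Evolve binary masks: keep only adversarial candidates, prefer sparser ones."""
    n = x_clean.size
    best = np.ones(n, dtype=bool)           # start fully perturbed (adversarial)
    for _ in range(gens):
        for _ in range(pop):
            child = best ^ (rng.random(n) < flip_p)      # bit-flip mutation
            x_adv = apply_mask(child, x_clean, x_start)
            # Fitness: adversarial AND strictly sparser than the incumbent.
            if query(x_adv) != y_true and child.sum() < best.sum():
                best = child
    return best

x_clean = np.zeros(16)           # class 0
x_start = np.ones(16)            # class 1 when fully substituted
mask = sparse_evo_attack(x_clean, 0, x_start)
print(int(mask.sum()))           # near the minimum of 9 perturbed coordinates
```

The fitness test mirrors the key constraint of the $\ell_0$ setting: only label flips count, so sparsity can only be traded against maintained misclassification.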

Similar population-based heuristics have been adopted in other domains, including face recognition (Dong et al., 2019) and NLP, where attacks proceed via synonym substitution guided by semantic similarity and genetic optimization (Maheshwary et al., 2020).

2.3 Biased Sampling and Priors

Query efficiency can be significantly increased by injecting structured priors into the proposal distribution (Brunner et al., 2018):

  • Image-frequency prior (e.g., Perlin noise): Focuses proposals on low-frequency, “image-like” directions, improving naturalness and transferability.
  • Regional masking: Concentrates perturbations in spatial regions where source and adversarial images differ most.
  • Surrogate-gradient prior: Leverages gradients from a surrogate model to guide sampling, even in the label-only regime.

These biased sampling techniques can substantially reduce query complexity and, when combined, outperform vanilla random-walk methods by a wide margin (Brunner et al., 2018).
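The image-frequency prior can be approximated cheaply by upsampling coarse Gaussian noise, a low-frequency stand-in for true Perlin noise; the sizes and the smoothness measure below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def low_freq_proposal(shape, coarse=4):
    """Biased proposal: sample on a coarse grid, upsample by nearest-neighbour
    repetition, concentrating energy in low spatial frequencies."""
    h, w = shape
    coarse_noise = rng.normal(size=(coarse, coarse))
    up = np.repeat(np.repeat(coarse_noise, h // coarse, axis=0), w // coarse, axis=1)
    return up / np.linalg.norm(up)

def white_noise_proposal(shape):
    """Unbiased i.i.d. Gaussian proposal, for comparison."""
    noise = rng.normal(size=shape)
    return noise / np.linalg.norm(noise)

def high_freq_energy(img):
    """Crude measure of high-frequency content: adjacent-pixel differences."""
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

biased = low_freq_proposal((32, 32))
plain = white_noise_proposal((32, 32))
print(high_freq_energy(biased) < high_freq_energy(plain))  # True: smoother proposals
```

Both proposals have unit norm, so the only difference the attack sees is where that norm budget is spent in frequency space.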

2.4 Gradient Estimation and Distributional Approaches

Recent advances employ zeroth-order or distributional optimization, sometimes reframing attack generation as policy learning: DBAR (Huang et al., 2022), for instance, optimizes a distribution over perturbations with policy-gradient (PPO) methods, attaining high success rates and transferability.

DBA-GP (Liu et al., 2023) demonstrates that leveraging data-dependent priors (bilateral smoothing for edge preservation) and time-dependent priors (re-using correlated historical gradient estimates) can further accelerate convergence, particularly near the decision boundary.
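A representative zeroth-order primitive in this family, in the style of HopSkipJump-type estimators (the toy oracle and probe count are assumptions), recovers a boundary-normal direction from ±1 label indicators alone:

```python
import numpy as np

rng = np.random.default_rng(3)

def query(x):
    """Toy oracle: class 1 iff x[0] > 1, so the true boundary normal is e_0."""
    return int(x[0] > 1.0)

def estimate_gradient(x_boundary, y_true, n_probes=500, sigma=0.05):
    """Monte-Carlo estimate of the boundary normal from hard labels only:
    probe random directions, weight each by +1 (adversarial) or -1 (benign)."""
    dim = x_boundary.size
    grad = np.zeros(dim)
    for _ in range(n_probes):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)
        sign = 1.0 if query(x_boundary + sigma * u) != y_true else -1.0
        grad += sign * u
    return grad / np.linalg.norm(grad)

x_b = np.array([1.0, 0.0, 0.0])       # a point on the decision boundary
g = estimate_gradient(x_b, y_true=0)
print(np.round(g, 2))                  # points roughly along e_0
```

Each probe costs one query, which is why the priors discussed above (data-dependent, time-dependent) matter: they shrink the number of probes needed per useful direction.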

2.5 Specialized and Automated Strategies

Automated program synthesis approaches (AutoDA (Fu et al., 2021)) search a space of low-level geometric vector operations to discover update rules, showing that even simple linear combinations of projected Gaussian noise and boundary-normal directions can recover or outperform expert-crafted strategies. Other domain extensions include attacks on trajectory prediction (Li et al., 27 Mar 2026) and semantic segmentation using proxy-guided, discrete structured perturbation search (Chen et al., 2024).

3. Empirical Evaluation and Query Efficiency

Decision-based attacks are empirically evaluated on benchmarks spanning standard vision (CIFAR-10, ImageNet), face recognition (LFW, MegaFace), trajectory prediction, NLP, and segmentation targets. The primary metrics are:

  • Attack Success Rate (ASR): Fraction of adversarial examples that cause misclassification.
  • Query Budget: Number of queries needed to reach a target distortion or ASR.
  • Perturbation Magnitude: Measured by the $\ell_0$, $\ell_2$, or $\ell_\infty$ norm; sparsity is especially notable for SpaEvoAtt, which perturbs well under 1% of pixels at the median on ImageNet within a few thousand queries (Vo et al., 2022).
  • Runtime: Computation per query is typically negligible relative to model evaluation.
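Given per-example attack logs, these metrics reduce to simple aggregates (the record layout below is hypothetical):

```python
import statistics

# Hypothetical per-example attack records: (success flag, queries used, l0 sparsity).
results = [
    (True, 1200, 0.0009),
    (True, 4100, 0.0014),
    (False, 5000, None),
    (True, 900, 0.0007),
]

# Attack Success Rate: fraction of examples whose label flipped within budget.
asr = sum(ok for ok, _, _ in results) / len(results)
# Query budget actually consumed, over all examples.
median_queries = statistics.median(q for _, q, _ in results)
# Perturbation magnitude, over successful examples only.
median_sparsity = statistics.median(s for ok, _, s in results if ok)

print(asr, median_queries, median_sparsity)  # 0.75 2650.0 0.0009
```

Reporting the sparsity median over successes only (as here) versus all attempts changes the numbers; published tables should state which convention they use.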

Table: Empirical comparison for SpaEvoAtt vs. Pointwise (Vo et al., 2022):

Setting              Attack     Median Sparsity  ASR          Queries
Untargeted/ImageNet  Pointwise  0.0012           77% @ 0.001  5,000
Untargeted/ImageNet  SpaEvoAtt  0.0008           99% @ 0.001  5,000
Targeted/ImageNet    Pointwise  1.0 (fail)       --           20,000
Targeted/ImageNet    SpaEvoAtt  0.0076           99% @ 0.01   20,000

Population-size and mutation-rate ablations show that SpaEvoAtt is robust to hyperparameter choices, with convergence on ImageNet largely insensitive to the exact settings (Vo et al., 2022).

Other studies compare against white-box baselines (e.g., PGD, C&W) and hard-label optimization methods (Brunner et al., 2018, Hogan et al., 2018), finding that decision-based attacks can approach or sometimes surpass white-box performance under similar query budgets.

Query efficiency is highly sensitive to the attack algorithm, the use of domain priors, and the presence of system-level invariances (e.g., input preprocessing), with preprocessing-unaware attacks suffering severalfold degraded efficiency (Sitawarin et al., 2022).

4. Domain Extensions and Vulnerabilities

4.1 Structured Output and Specialized Domains

Decision-based attacks have been adapted to domains beyond standard image classification:

  • Vision Transformers: Patch-wise Adversarial Removal (PAR (Shi et al., 2021)) exploits the non-overlapping patch structure of ViTs to compress adversarial noise more efficiently, achieving lower median perturbation magnitude at a fixed query budget, especially when used as initialization for other attacks.
  • Trajectory Prediction: DTP-Attack (Li et al., 27 Mar 2026) applies a boundary-walking algorithm in trajectory space, perturbing historical agent positions to cause intention misclassification or trajectory deviation, achieving high ASR with sub-meter perturbations.
  • Semantic Segmentation: Discrete Linear Attack (DLA (Chen et al., 2024)) attacks per-pixel decision maps, using proxy-guided search over discrete, structured noise patterns to dramatically reduce mIoU on PSPNet/Cityscapes within a modest query budget.
  • Natural Language Processing: Hard-label black-box attacks combine synonym substitution, search-space reduction, and genetic optimization to produce adversarial texts with minimal semantic drift and high success rates (>90%) (Maheshwary et al., 2020).
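The synonym-substitution loop can be sketched with toy lexical resources; a plain random search stands in for the genetic optimization of Maheshwary et al., and the oracle and synonym sets are illustrative:

```python
import random

random.seed(4)

# Toy lexical resources; a real attack would use embedding-based synonym sets.
SYNONYMS = {"good": ["fine", "decent"], "great": ["solid", "okay"],
            "love": ["like", "enjoy"]}
POSITIVE = {"good", "great", "love"}    # words the toy oracle keys on

def query(tokens):
    """Hard-label sentiment oracle: positive (1) iff >= 2 lexicon hits."""
    return int(sum(t in POSITIVE for t in tokens) >= 2)

def hard_label_text_attack(tokens, y_true, max_iters=50):
    """Random-search sketch of synonym substitution: swap one substitutable
    word at a time until the observed label flips."""
    cand = list(tokens)
    for _ in range(max_iters):
        if query(cand) != y_true:
            return cand                  # adversarial text found
        idxs = [i for i, t in enumerate(cand) if t in SYNONYMS]
        if not idxs:
            return None                  # nothing left to substitute
        i = random.choice(idxs)
        cand[i] = random.choice(SYNONYMS[cand[i]])
    return None

text = "i love this good movie".split()
adv = hard_label_text_attack(text, query(text))
print(adv, sum(a != b for a, b in zip(adv, text)))
```

Constraining substitutions to synonyms plays the role of the perturbation-norm constraint in the vision setting: it bounds semantic drift rather than pixel distance.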

Universal (image-agnostic) perturbations have also been demonstrated in this regime (Hogan et al., 2018, Yu et al., 2023), raising the risk of attacks that generalize across samples and systems.

4.2 System-level Obstacles

Preprocessing modules preceding the classifier (e.g., cropping, resizing, quantizing) introduce invariances that can degrade the effectiveness of decision-based attacks by several fold if unaccounted for (Sitawarin et al., 2022). Preprocessor-aware strategies can recover full efficacy, emphasizing the need for adversaries to model the entire computation pipeline.

Cost-sensitive threat models, such as in content moderation or malware detection, penalize “flagged” queries more heavily. Stealthy attack variants minimize such flagged queries by leveraging egg-dropping line search and early stopping, trading more non-flagged queries for reduced detection risk (Debenedetti et al., 2023).
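The query accounting involved can be illustrated with a plain bisection along the clean-to-adversarial segment that counts flagged (adversarial-side) queries separately; the egg-dropping search of Debenedetti et al. further reduces the flagged count, and the oracle and flag criterion below are assumptions:

```python
import numpy as np

def query(x):
    """Toy oracle: the query is 'flagged'/adversarial iff x[0] > 1."""
    return int(x[0] > 1.0)

def bisect_boundary(x_clean, x_adv, y_true, tol=1e-3):
    """Binary search for the boundary crossing on [x_clean, x_adv], counting
    flagged (adversarial-side) queries separately from benign ones."""
    lo, hi = 0.0, 1.0          # lo: benign side, hi: adversarial side
    flagged = benign = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_clean + mid * x_adv
        if query(x_mid) != y_true:
            flagged += 1
            hi = mid           # still adversarial: shrink from above
        else:
            benign += 1
            lo = mid
    return (1 - hi) * x_clean + hi * x_adv, flagged, benign

x_clean = np.array([0.0, 0.0])
x_adv0 = np.array([4.0, 0.0])
x_b, flagged, benign = bisect_boundary(x_clean, x_adv0, y_true=0)
print(flagged, benign, round(x_b[0], 2))  # 9 1 1.0
```

Plain bisection spends most of its budget on the flagged side here; a stealthy variant would accept more benign probes to avoid exactly those queries.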

5. Limitations, Defensive Implications, and Open Questions

Despite significant progress, decision-based black-box attacks remain constrained by:

  • Query Budget: Crafting imperceptible or highly sparse adversarial examples can require tens of thousands of queries for difficult targets or robust models.
  • Optimization Hardness: Many objectives (e.g., $\ell_0$ minimization under hard-label constraints) are NP-hard even in the white-box setting (Vo et al., 2022).
  • System Invariances: Unmodeled preprocessing can waste queries and obscure gradients, substantially reducing attack efficiency unless reversed (Sitawarin et al., 2022).
  • Robustness to Defenses: Defenses relying solely on output discretization are inadequate; adaptive, stateful, or randomized strategies may improve resilience but ultimately can be circumvented by adaptive adversarial policies (Tsingenopoulos et al., 2023).

Viable defense approaches now include:

  • Certified $\ell_p$-robust architectures, input filtering, and query-rate limiting, each with their own cost and coverage tradeoffs.
  • Adversarial training augmented with decision-based or universal perturbations, which can partially restore robust accuracy (Maheshwary et al., 2020).

Emerging research directions are taken up with the open questions in Section 7.

6. Summary Table: Core Advances and Empirical Benchmarks

Class of Attack                     Key Reference(s)                                               Principal Mechanism             Empirical Gains
Boundary Attack                     (Brendel et al., 2017)                                         Random-walk, orthogonal steps   Scalable to ImageNet
Biased Sampling                     (Brunner et al., 2018)                                         Perlin, mask, surrogate priors  Large query speedup
Evolutionary (Sparse, Faces, NLP)   (Vo et al., 2022; Dong et al., 2019; Maheshwary et al., 2020)  Binary vector, CMA-ES, GA       5–20× fewer queries
Automated Search (AutoDA)           (Fu et al., 2021)                                              Program synthesis               Best-in-class under budget
RL-based Distributional (DBAR)      (Huang et al., 2022)                                           Policy gradient PPO             High ASR, transferability
Universal Perturbations             (Hogan et al., 2018; Yu et al., 2023)                          Zeroth-order (RGF, SPSA)        90%+ fooling rates; 10⁶ queries
Patch-wise/Proxy-guided (ViT, Seg)  (Shi et al., 2021; Chen et al., 2024)                          Patch removal, mIoU proxy       Order-of-magnitude noise reduction
Preprocessor-aware                  (Sitawarin et al., 2022)                                       Reverse-engineer preprocessing  Severalfold speedup
Stealthy/Cost-sensitive             (Debenedetti et al., 2023)                                     Minimize flagged queries        Far fewer flagged queries

Each of these works is closely tailored to the technical aspects of decision-based black-box optimization and demonstrates that, even in the strictest information setting, deep models can be exploited with practical query budgets.

7. Broader Security and Research Implications

Decision-based black-box adversarial attacks fundamentally challenge the assumption that restricting model outputs to discrete decisions is a sufficient defense. Even when only label information is exposed, adversaries can efficiently craft imperceptible or highly sparse adversarial inputs, and these attacks extend to complex, structured-output domains and sequential models.

This realization exposes vulnerabilities in a wide range of deployed machine learning systems and shifts the security paradigm toward:

  • Comprehensive modeling of system-level invariances and preprocessing.
  • Defensive strategies that incorporate adaptive (possibly RL-based) responses.
  • Theoretical characterization of query complexity and worst-case evaluation under adaptive threat models.
  • Practical evaluation frameworks that reflect realistic cost models (including flagged queries and system responses).

Open questions remain regarding optimally query-efficient heuristics in the decision-only regime, extension to further domains, formal transferability characterizations, robust and certified defense mechanisms, and the co-evolution (“arms-race”) of attack and defense policies in adversarial settings (Tsingenopoulos et al., 2023).
