Black-Box Adversarial Attack Research
- A black-box adversarial attack crafts subtle input perturbations that mislead deep neural networks using only output queries.
- It employs optimization techniques such as NES, CMA-ES, and reinforcement learning to achieve high query efficiency and imperceptibility.
- The approach is applied across image, video, and time-series tasks, highlighting challenges in transferability and robustness under limited model information.
A black-box adversarial attack is a technique for generating input perturbations that induce misclassification in machine learning models, particularly deep neural networks, under limited knowledge and restricted access. In the black-box setting, the adversary lacks access to the model’s parameters and gradients and is often constrained to querying the model with inputs and observing only output probabilities, logits, or even just class labels. Black-box adversarial attacks are of central relevance in practical security scenarios, such as cloud-hosted vision APIs or deployed devices, where “white-box” (full access) assumptions are untenable. Recent years have seen the development of diverse black-box attack methodologies optimizing for query efficiency, perturbation imperceptibility, and transferability across models (Qiu et al., 2021, Costa et al., 1 Oct 2025, Wang et al., 24 Jan 2025, Djilani et al., 2024, Moon et al., 2019, Oe et al., 29 Nov 2025, Williams et al., 2022, Pomponi et al., 2022, Zhou et al., 2020, Xia et al., 2022).
1. Problem Formulation and Black-Box Threat Models
The black-box adversarial scenario formalizes the adversary’s goal as finding a perturbation $\delta$ such that, for a given model $f$, clean input $x$, and true label $y$, the model misclassifies $x + \delta$, subject to a norm constraint $\|\delta\|_p \le \epsilon$. The adversary interacts with $f$ through queries:
- Score-based: receives logits or class probabilities $f(x')$ for queried inputs $x'$.
- Decision-based (hard-label): receives only the predicted class label $\arg\max_i f_i(x')$.
Formally, untargeted attacks seek to solve
$$\max_{\|\delta\|_p \le \epsilon} \mathcal{L}(f(x+\delta),\, y),$$
or, for targeted attacks with target class $y_t$,
$$\min_{\|\delta\|_p \le \epsilon} \mathcal{L}(f(x+\delta),\, y_t).$$
Each evaluation of $f$ counts as one black-box query; gradients are inaccessible (Qiu et al., 2021, Djilani et al., 2024).
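The two oracle types above can be made concrete with a minimal sketch. The linear "model" and all names below are hypothetical stand-ins for a remote API, used only to make the score-based/decision-based distinction runnable:

```python
import numpy as np

# Toy stand-in for a remote model: the attacker sees only its outputs.
# In a real black-box setting these functions would be API calls; the
# weights below are hypothetical and exist only to make this runnable.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))              # 3 classes, 8 input features

def query_scores(x):
    """Score-based oracle: class probabilities (softmax of the logits)."""
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()

def query_label(x):
    """Decision-based (hard-label) oracle: only the predicted class."""
    return int(np.argmax(W @ x))

def untargeted_margin(x, y):
    """Attack objective: becomes positive once the model misclassifies x."""
    p = query_scores(x)
    return float(np.max(np.delete(p, y)) - p[y])

x0 = rng.normal(size=8)
y0 = query_label(x0)                     # clean prediction
```

Every call to either oracle counts against the query budget; score-based attacks can optimize `untargeted_margin` directly, while hard-label attacks must work from `query_label` alone.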
Two primary black-box attack paradigms are established:
- Transfer-based: Perturbations are crafted on surrogate models and transferred to the target without adaptation.
- Query-based: Optimization is performed by iteratively querying the target model, using only observed outputs.
This framework extends across tasks such as image classification (Qiu et al., 2021), video recognition (Jiang et al., 2019), and time series analysis (Huang et al., 2022).
2. Core Attack Methodologies
2.1 Zeroth-Order Optimization and Evolution Strategies
Derivative-free optimization is central to query-based attacks. Evolution Strategies (ES) such as (1+1)-ES, NES (Natural Evolution Strategies), and CMA-ES are canonical:
- (1+1)-ES: Maintains a single candidate , samples mutations, and applies step-size control via the 1/5 success rule.
- NES: Estimates the gradient of expected “fitness” under a search distribution by sampling perturbations and weighting by their fitness scores; supports population-based updates.
- CMA-ES: Adapts a full covariance matrix, using population information to learn anisotropic search directions, achieving strong performance with high-dimensional, nonconvex objectives (Qiu et al., 2021).
Other effective gradient-free algorithms include combinatorial local search over the hypercube (Moon et al., 2019), Bayesian optimization in low-dimensional subspaces (Shukla et al., 2019), and projection- and probability-driven random walks in DCT space (Li et al., 2020).
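The NES gradient estimate described above can be sketched in a few lines. The antithetic-sampling form below is a common variant, checked here against a toy quadratic rather than a real classifier:

```python
import numpy as np

def nes_gradient(loss, x, sigma=0.1, pop=50, rng=None):
    """Antithetic NES gradient estimate at x, using only function
    evaluations: 2 * pop black-box queries, no backpropagation."""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(pop):
        u = rng.normal(size=x.shape)             # Gaussian search direction
        g += (loss(x + sigma * u) - loss(x - sigma * u)) * u
    return g / (2.0 * sigma * pop)

# Sanity check on a known objective: the gradient of ||v||^2 is 2v.
x = np.array([1.0, -2.0, 0.5])
est = nes_gradient(lambda v: float(v @ v), x, sigma=0.01, pop=4000)
```

In an attack loop, `loss` would be the adversarial margin obtained from model queries, and the estimate would feed a projected gradient step back onto the $\epsilon$-ball.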
2.2 Greedy and Structured Search
- GreedyPixel: Performs fine-grained, per-pixel brute-force search guided by a surrogate-based priority map. Queries are concentrated on prioritized pixels to minimize adversarial loss rapidly (Wang et al., 24 Jan 2025).
- Superpixel Attack: Segments the image into compact superpixels (SLIC), flipping signs of perturbations over regions iteratively, using a versatile boundary search for efficient high-dimensional exploration (Oe et al., 29 Nov 2025).
- Pixle: Executes $\ell_0$-bounded attacks by rearranging a small number of existing pixels, exploiting neural networks’ spatial sensitivity, with fewer than 200 queries for near-100% success on multiple datasets (Pomponi et al., 2022).
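A GreedyPixel-style loop can be sketched as follows. The priority ordering, which the method derives from a surrogate saliency map, is supplied externally here as an assumption of the sketch, and `loss` is an adversarial margin to be maximized:

```python
import numpy as np

def greedy_coordinate_attack(loss, x, eps, priority, budget):
    """GreedyPixel-style sketch: try +/-eps on coordinates in priority
    order, keeping any change that increases the adversarial loss.
    Each loss call is one black-box query."""
    adv, best, used = x.copy(), loss(x), 0
    for i in priority:
        if used >= budget:
            break
        for sign in (+eps, -eps):
            cand = adv.copy()
            cand[i] = np.clip(x[i] + sign, 0.0, 1.0)  # stay in the eps-ball of x
            used += 1
            if loss(cand) > best:
                adv, best = cand, loss(cand)
                break                                  # next priority pixel
    return adv, best, used

# Demo on a toy linear "margin": each coordinate is pushed toward the
# sign of its weight, one pixel at a time.
w = np.array([1.0, -1.0, 2.0])
adv, best, used = greedy_coordinate_attack(
    lambda v: float(w @ v), np.full(3, 0.5), eps=0.5,
    priority=[0, 1, 2], budget=100)
```

Concentrating the budget on high-priority coordinates is what lets this family reach low query counts: unpromising pixels are simply never tried.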
2.3 Transfer-Based and Data-Free Methods
Transfer-based attacks generate perturbations using a surrogate model; efficacy is governed by the “robustness alignment” between surrogate and target loss landscapes (Djilani et al., 2024). Data-free approaches enable attacks in the absence of training data, relying on universal perturbations generated without sample access (Huan et al., 2020).
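A minimal transfer-attack sketch, assuming tiny hypothetical linear models so the surrogate gradient is analytic (real attacks use deep surrogates and autograd):

```python
import numpy as np

# The perturbation is crafted white-box on a surrogate and applied to an
# unseen target with zero target queries. Both weight matrices below are
# hypothetical; the target is deliberately similar to the surrogate.
W_sur = np.array([[1.0, -1.0, 2.0, 0.5],
                  [-1.0, 1.0, -2.0, 0.5]])   # surrogate (known to attacker)
W_tgt = np.array([[1.1, -0.9, 1.9, 0.4],
                  [-0.9, 1.1, -1.9, 0.6]])   # target (never queried)

def fgsm_on_surrogate(x, y, eps):
    """One-step FGSM on the surrogate margin logit[1-y] - logit[y]."""
    grad = W_sur[1 - y] - W_sur[y]           # analytic margin gradient
    return x + eps * np.sign(grad)

x = np.array([1.0, 0.0, 1.0, 0.0])
y = int(np.argmax(W_sur @ x))                # clean surrogate label
adv = fgsm_on_surrogate(x, y, eps=1.0)
transferred = int(np.argmax(W_tgt @ adv)) != int(np.argmax(W_tgt @ x))
```

Transfer succeeds here because the two loss landscapes are closely aligned; as the target drifts away from the surrogate (different architecture, robust training), `transferred` increasingly comes up false.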
2.4 Zero-Query Transfer Attacks
Recent developments (e.g., ZQBA) require zero queries during the attack phase. Rather than adapting perturbations online, ZQBA extracts guided-backpropagation feature maps offline from a surrogate, storing these as “perturbation templates.” These are scaled and added to new images, achieving significant cross-architecture and cross-domain transferability while preserving high perceptual fidelity (high SSIM) (Costa et al., 1 Oct 2025).
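The attack-time step reduces to template reuse. The sketch below is loose and illustrative, not the paper's exact procedure; `apply_template`, the normalization, and `alpha` are assumptions introduced here:

```python
import numpy as np

def apply_template(x, template, alpha, lo=0.0, hi=1.0):
    """ZQBA-style zero-query step (illustrative sketch): scale a
    precomputed surrogate feature map and add it to a fresh image,
    with no target queries at attack time."""
    t = template / (np.abs(template).max() + 1e-12)  # normalize template
    return np.clip(x + alpha * t, lo, hi)

# Demo with a stand-in "feature map" and a 2-pixel image.
template = np.array([2.0, -4.0])
adv = apply_template(np.array([0.5, 0.5]), template, alpha=0.2)
```

All of the expensive work (guided backpropagation on the surrogate) happens offline, which is why the per-image attack cost is zero queries.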
2.5 Decision-Based and Reinforcement-Learning Attacks
Hard-label black-box settings—where only predicted classes are revealed—demand specialized search, often using surrogate guidance augmented with randomized, zeroth-order exploration (e.g., SQBA (Park et al., 2024)). Reinforcement learning approaches (e.g., DBAR (Huang et al., 2022)) model the attack as a policy search, training an actor-critic to output distributions over perturbations optimized for maximal attack success and transferability under norm constraints.
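One reusable hard-label primitive behind decision-based attacks such as HSJA is boundary bisection using only label queries; a minimal sketch on a 1-D toy oracle:

```python
def boundary_binary_search(predict, x_clean, x_adv, y, iters=40):
    """Bisect along the segment from an adversarial point toward the
    clean input, using one hard-label query per step, to land just on
    the adversarial side of the decision boundary."""
    lo, hi = 0.0, 1.0                      # fraction of the way to x_clean
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        point = (1 - mid) * x_adv + mid * x_clean
        # Still adversarial: safe to move closer to the clean input.
        lo, hi = (mid, hi) if predict(point) != y else (lo, mid)
    return (1 - lo) * x_adv + lo * x_clean

# 1-D toy oracle: class 0 below 0.3, class 1 at or above.
predict = lambda v: 0 if v < 0.3 else 1
adv = boundary_binary_search(predict, x_clean=0.0, x_adv=1.0, y=0)
```

Decision-based attacks alternate this projection step with an exploration step (random, surrogate-guided, or learned) that proposes new adversarial directions along the boundary.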
2.6 Latent-space and Unrestricted Attacks
Some black-box methods (e.g., Latent-HSJA) move the search into a generative manifold (GAN latent space), enabling highly effective unrestricted attacks under decision-based restrictions (hard-label queries only). Traversing the boundary in GAN space yields perceptually coherent but adversarial examples with practical query efficiency (Na et al., 2022).
2.7 Embedding-Based Optimization
Embedding methods (e.g., TREMBA (Huang et al., 2019)) perform black-box search within a low-dimensional adversarial embedding, constructed via an encoder-decoder trained on a source model. Running NES in this space dramatically reduces the number of queries required to attain high attack success rates, especially in targeted and robust-model scenarios.
3. Comparative Performance and Query Efficiency
Empirical studies demonstrate that query efficiency, success rate, and stealth of attacks vary substantially by method. Key results include:
| Attack | Success (untargeted, ImageNet ε=0.05) | Mean Queries (success only) | Notes |
|---|---|---|---|
| CMA-ES | 99.9% (ResNet-50) | Few 100s | Robust under small ε, targeted |
| NES | 89.8% | 800–1500 | Degrades under small ε |
| Parsimonious | 98.5% | 722 | Combinatorial, hyperparam-light |
| GreedyPixel | 84.7% (224×224, ε=4/255) | 6662 (budget ≤ 20,000) | High imperceptibility; ASR=100% on CIFAR-10 (Wang et al., 24 Jan 2025) |
| Superpixel Attack | +2.10% over next best (T=1000) | – | Compact region search (Oe et al., 29 Nov 2025) |
| ZQBA (Zero-query) | Reduces accuracy of target by >20% | 0 | Transferability; high SSIM (Costa et al., 1 Oct 2025) |
| SQBA | 5× higher ASR than HSJA at ≤ 250 queries | ~40–90 (CIFAR-10, ρ=0.1) | Hard-label, surrogate-guided (Park et al., 2024) |
| DBAR (RL, label only) | ASR=1.0 (CIFAR-10) | <20,000 | High transferability (Huang et al., 2022) |
Only population-based ES with covariance adaptation (CMA-ES), combinatorial local search, and advanced RL methods achieve near-perfect untargeted success within tight query budgets, especially in difficult regimes (e.g., small $\epsilon$, robust targets, or label-only access) (Qiu et al., 2021, Wang et al., 24 Jan 2025, Oe et al., 29 Nov 2025, Huang et al., 2022).
4. Transferability, Robustness, and Defensive Challenges
Adversarial transferability is intimately linked to alignment between the surrogate's and the target's adversarial landscapes. Robust surrogates only improve transfer against robust targets; vanilla surrogates maximize transfer on vanilla targets. Even simple adversarial training (e.g., Madry-style PGD) can reduce advanced black-box attack success rates on ImageNet models from ~90% to <4%, regardless of query or transfer-based approach. Defenses that combine robust and vanilla models may leak transferability and thus remain vulnerable to targeted attacks (Djilani et al., 2024).
Advanced defenses (AutoAttack-optimized models, pretraining/data augmentation, ensemble policies) reduce black-box attack success to the low single digits, underscoring the need for both attack and defense research to address robust models, not just standard architectures.
5. Extensions: Videos, Non-Additive Attacks, and Smoothing-Based Approaches
Black-box attacks have been extended to high-dimensional domains such as video (V-BAD) by exploiting transferable perturbations from image models and partition-based NES rectification. Efficient targeted attacks in the video domain achieve >93% success at query budgets comparable to static images, despite the much higher input dimensionality (Jiang et al., 2019).
Non-additive, $\ell_0$-bounded permutation-based (Pixle) or visually motivated (Art-Attack) approaches demonstrate that spatial rearrangement or local, structured perturbations can be more query-efficient and visually subtle than classical additive noise (Pomponi et al., 2022, Williams et al., 2022). Texture-smoothing methods such as AdvSmo blur linear texture components via Gabor filters, exploiting CNNs’ texture bias and showing exceptional transferability and resistance to standard image-space pre-processing defenses (Xia et al., 2022).
6. Limitations, Pitfalls, and Future Directions
Black-box adversarial research must confront several persistent challenges:
- Scalability: Query cost scales poorly with input dimensionality; even advanced methods can exhaust budgets on high-resolution or temporal data.
- Transferability vs. Adaptation: Zero-query or transfer attacks degrade rapidly with input domain or architectural mismatch; adaptive attacks require careful surrogate selection and/or dynamic adjustment (Costa et al., 1 Oct 2025, Djilani et al., 2024).
- Robustness of Defenses: Modern robust architectures generalize their white-box resilience to black-box settings, severely limiting attack efficacy (Djilani et al., 2024).
- Low-Resource Regimes: Hard-label and extremely low-query scenarios (e.g., <100 queries) require hybrid architectures or surrogates to achieve nontrivial success (Park et al., 2024, Shukla et al., 2020).
- Interpretability and Perceptual Metrics: Maintaining imperceptibility (e.g., SSIM, LPIPS constraints) while maximizing transfer or attack success remains an unresolved tension.
- Real-World Constraints: Defenses based on input transformations, randomization, and query monitoring can mitigate some black-box attack vectors but are not comprehensive.
Active research continues on adaptive block-sizing, region-aware optimization, robust prior selection, RL-based search policies, and defense strategies against generative or non-additive attacks.
7. Key References and Research Groups
Black-box adversarial attack research is highly active, with major contributions from:
- Population-based and gradient-free optimization: “Black-box adversarial attacks using Evolution Strategies” (Qiu et al., 2021), “Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization” (Moon et al., 2019).
- Surrogate and transfer attacks: “RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses” (Djilani et al., 2024), “TREMBA: Transferable Representation-based Black-box Attack” (Huang et al., 2019).
- Structured, sparse, and local attacks: “Superpixel Attack” (Oe et al., 29 Nov 2025), “GreedyPixel” (Wang et al., 24 Jan 2025), “Pixle” (Pomponi et al., 2022).
- Decision-based and hard-label attacks: “Hard-label based Small Query Black-box Adversarial Attack” (Park et al., 2024), “Universal Distributional Decision-based Black-box Adversarial Attack with Reinforcement Learning” (Huang et al., 2022).
- Zero-query, data-free, and universal attacks: “ZQBA: Zero Query Black-box Adversarial Attack” (Costa et al., 1 Oct 2025), “Data-Free Adversarial Perturbations for Practical Black-Box Attack” (Huan et al., 2020).
The continual evolution of black-box methodologies is pushing both offense and defense toward higher levels of sophistication and realism. The interplay between query efficiency, transferability, robust target models, and imperceptibility sets the frontier of current research.