
Sparse Adversarial Attacks

Updated 10 April 2026
  • Sparse adversarial attacks alter a minimal set of input features, typically by optimizing under an ℓ0-norm objective or constraint, to induce misclassification.
  • They employ techniques like iterative projections, differentiable sparsity surrogates, and structured/group sparsity to craft subtle yet potent perturbations.
  • These attacks reveal critical weaknesses in diverse domains—including image classification, object detection, speech recognition, and reinforcement learning—challenging conventional defense methods.

A sparse adversarial attack is a form of adversarial perturbation that aims to alter as few input components (e.g., pixels, time steps, features) as possible, usually measured by the $\ell_0$ (pseudo-)norm, in order to induce erroneous predictions from a machine learning model. Unlike dense attacks that spread small perturbations over all input dimensions, sparse attacks concentrate modifications on a minimal subset, while satisfying imperceptibility or problem-specific constraints. Such attacks highlight a distinct axis of vulnerability in deep neural networks: the possibility that extremely localized or structured changes, even at a vanishing fraction of the input, can have catastrophic effects on model inference, system robustness, and deployment security.

1. Mathematical Formulation and Core Objectives

Sparse adversarial attacks primarily target classifiers or other predictive models by minimizing the $\ell_0$ norm of a perturbation $\delta$, subject to keeping the input within a valid domain (e.g., image pixel range) and achieving a prescribed adversarial goal:

  • Image classification (generic form):

$$\min_{\delta} \|\delta\|_0 \quad \text{s.t.} \quad f(x+\delta) \neq y \; (\text{untargeted}), \quad x+\delta \in [0,1]^d$$

or, for attacks bounded in magnitude:

$$\min_{\delta} \|\delta\|_0 \quad \text{s.t.} \quad f(x+\delta) = y_{\text{adv}}, \; \|\delta\|_\infty \leq \epsilon, \; x+\delta \in [0,1]^d$$

as in (He et al., 2021, Imtiaz et al., 2022, Zhu et al., 2021, Lin et al., 8 Jun 2025).

  • Multi-agent reinforcement learning: the attacker is constrained to altering the actions of $m \ll n$ agents, or to acting at $N \ll T$ timesteps:

$$\min_{a_{k_i,t}} \mathbb{E}[\text{TeamReward}] \quad \text{s.t.} \quad \#\{\text{nontrivial modifications}\} \leq N$$

as formalized in (Hu et al., 2022).

  • Structured/group sparsity:

Replacing the $\ell_0$ norm with a group-wise surrogate (e.g., nuclear group norms, block quasinorms) to achieve interpretability:

$$\min_{\delta} \mathcal{L}(\mathcal{C}(x+\delta),\, t) + \lambda\, R(\delta)$$

where the regularizer $R(\delta)$ counts nonzero groups or relaxes that count with a group-norm surrogate (Sadiku et al., 2023, Heshmati et al., 18 Oct 2025).

  • Feature- or layer-level sparsity:

Minimal modification in intermediate representations (Che et al., 2019, Kuvshinova et al., 2024).

These optimization programs are inherently combinatorial and nonconvex. In practical methods, differentiable surrogates, homotopy schemes, and end-to-end generator architectures enable efficient or scalable attack construction.
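To make the constraints in the formulations above concrete, the following minimal sketch counts the $\ell_0$ norm of a perturbation and checks the box, $\ell_\infty$, and misclassification constraints. The names `l0_norm`, `is_feasible`, and `toy_predict` are hypothetical and chosen only for illustration; the toy classifier is an assumption and does not come from any cited paper.

```python
from typing import Callable, List, Optional, Sequence


def l0_norm(delta: Sequence[float], tol: float = 0.0) -> int:
    """Count the entries of delta whose magnitude exceeds tol."""
    return sum(1 for d in delta if abs(d) > tol)


def is_feasible(
    x: Sequence[float],
    delta: Sequence[float],
    predict: Callable[[List[float]], int],
    y_true: int,
    eps: Optional[float] = None,
    y_target: Optional[int] = None,
) -> bool:
    """Check the constraints of the sparse-attack programs for one candidate delta."""
    x_adv = [xi + di for xi, di in zip(x, delta)]
    # Box constraint: x + delta must stay in [0, 1]^d.
    if any(v < 0.0 or v > 1.0 for v in x_adv):
        return False
    # Optional magnitude bound: ||delta||_inf <= eps.
    if eps is not None and max(abs(d) for d in delta) > eps:
        return False
    label = predict(x_adv)
    if y_target is not None:
        return label == y_target  # targeted goal
    return label != y_true        # untargeted goal


def toy_predict(x: List[float]) -> int:
    """Hypothetical classifier: label 1 if the mean intensity exceeds 0.5."""
    return int(sum(x) / len(x) > 0.5)


x = [0.4, 0.5, 0.5, 0.5]        # mean 0.475, so the clean label is 0
delta = [0.0, 0.0, 0.0, 0.5]    # a single modified component
print(l0_norm(delta), is_feasible(x, delta, toy_predict, y_true=0))
```

In this toy setting a single changed component flips the label, while the same perturbation becomes infeasible once a tight `eps` bound is imposed, which is the tension between sparsity and magnitude that the bounded formulation encodes.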

2. Algorithmic Techniques and Model Classes

Sparse adversarial attacks span a diversity of algorithmic paradigms, corresponding to both white-box and black-box access, different domains (images, audio, video, RL), and varying forms of sparsity:

| Approach class | Representative methods | Domain(s) |
| --- | --- | --- |
| Iterative PGD + $\ell_0$ projection | $\ell_0$-PGD, PGD-($\ell_0$, $\ell_\infty$) (Croce et al., 2019) | image, general |
| Proximal/homotopy | nmAPG with $\ell_0$ regularization, homotopy (Zhu et al., 2021) | image, general |
| Structured/groupwise | GSE (Sadiku et al., 2023), ATOS (Heshmati et al., 18 Oct 2025) | image/classification |
| End-to-end mask/generator | AutoAdversary (Li et al., 2022), TSAA (He et al., 2021), STAA-Net (Chang et al., 2024) | image, audio |
| Stochastic greedy | VFGA/SSAA (Césaire et al., 2020), CornerSearch (Croce et al., 2019) | image/vision |
| Score-based black-box | BruSLeAttack (Vo et al., 2024), Sparse-RS | image (black-box) |
| Frank-Wolfe/conditional gradient | SAIF (Imtiaz et al., 2022) | image |
| Feature-space/hidden-layer | Sparse feature adversarial (Che et al., 2019), Jacobian-based universal vectors (Kuvshinova et al., 2024) | saliency, general |
| Multi-agent RL control | QMIX-based adversarial policies (Hu et al., 2022) | RL/MARL |
| Video & spatial | DeepSAVA (Mu et al., 2021) | video |

Notably, recent advances have prioritized computational scalability, transferability, and the interpretability of the attack patterns via group or structural sparsity surrogates (Heshmati et al., 18 Oct 2025, Sadiku et al., 2023, Lin et al., 8 Jun 2025).
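The iterative "PGD plus projection" class listed in the table can be sketched on a toy linear classifier. This is an illustrative simplification, not the algorithm of Croce et al.: the projection here clips to the box first and then keeps the `k` largest-magnitude entries, and all names (`score`, `project_l0`, `sparse_pgd_linear`) as well as the weights and input are assumptions.

```python
from typing import List, Sequence


def score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear decision score; the predicted label is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def project_l0(delta: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of delta and zero the rest."""
    if k <= 0:
        return [0.0] * len(delta)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def sparse_pgd_linear(x, y, w, b, k, step=0.1, iters=50) -> List[float]:
    """Untargeted attack on label y in {+1, -1} with at most k changed entries."""
    delta = [0.0] * len(x)
    for _ in range(iters):
        # The gradient of the score w.r.t. x is w, so moving along -y*w
        # lowers the margin y * score(x + delta).
        delta = [d - step * y * wi for d, wi in zip(delta, w)]
        # Clip so that x + delta stays inside [0, 1].
        delta = [min(1.0, max(0.0, xi + d)) - xi for xi, d in zip(x, delta)]
        # Enforce the sparsity budget.
        delta = project_l0(delta, k)
        x_adv = [xi + d for xi, d in zip(x, delta)]
        if y * score(x_adv, w, b) < 0:
            break  # label flipped
    return delta


w, b = [2.0, -1.0, 0.5, 0.1], -0.5
x = [0.6, 0.2, 0.5, 0.5]
y = 1 if score(x, w, b) > 0 else -1
delta = sparse_pgd_linear(x, y, w, b, k=1)
x_adv = [xi + d for xi, d in zip(x, delta)]
print(delta, score(x_adv, w, b))
```

Zeroing entries after clipping keeps the iterate feasible because the clean input already lies in the box; a joint projection onto the sparsity and box constraints would be a tighter choice in practice.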

3. Sparse Regularization Mechanisms and Optimization

Fundamental to sparse attack design is the enforcement and control of sparsity. Approaches include:

  • Hard $\ell_0$ constraints/budgets: Direct restriction to a $k$-pixel (or feature) support, often addressed with combinatorial search (Croce et al., 2019, Césaire et al., 2020) or conditional gradient methods (Imtiaz et al., 2022); a projection after each gradient step ensures exactly $k$ nonzero positions.
  • Differentiable sparsity surrogates: Smoothed relaxations (e.g., the Overlapping Smoothed $\ell_0$ (OSL0) (Heshmati et al., 18 Oct 2025), $\ell_p$-quasinorm proximal operators (Sadiku et al., 2023)), or soft-thresholded masks (Lin et al., 8 Jun 2025, Li et al., 2022). These allow end-to-end training and plug directly into DNN backpropagation.
  • Structured/group sparsity: Imposed via patch-based or semantic grouping in loss or penalty terms (nuclear group norm, quasinorms) (Sadiku et al., 2023, Heshmati et al., 18 Oct 2025), yielding spatially coherent, interpretable perturbation patterns.
  • Generator-based decoupling: Explicit separation of amplitude and mask, as in TSAA and STAA-Net, permitting high transferability at lower $\ell_0$ cost (He et al., 2021, Chang et al., 2024).
  • Homotopy continuation: Gradual reduction of regularization weight to trace a solution path from dense to maximally sparse (Zhu et al., 2021).
  • Bayesian selection: Active learning of mask importance in black-box queries (Vo et al., 2024).
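As an illustration of the differentiable sparsity surrogates in the list above, the sketch below uses the classic Gaussian smoothed-$\ell_0$ function $F_\sigma(\delta) = \sum_i \bigl(1 - e^{-\delta_i^2 / (2\sigma^2)}\bigr)$, which approaches $\|\delta\|_0$ as $\sigma \to 0$. This is a generic surrogate swapped in for illustration; it is not the overlapping variant (OSL0) of the cited work, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def smoothed_l0(delta: Sequence[float], sigma: float) -> float:
    """Gaussian smoothed-l0 surrogate: tends to the nonzero count as sigma -> 0."""
    return sum(1.0 - math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in delta)


def smoothed_l0_grad(delta: Sequence[float], sigma: float) -> List[float]:
    """Gradient of the surrogate with respect to each entry of delta."""
    s2 = sigma * sigma
    return [(d / s2) * math.exp(-(d * d) / (2.0 * s2)) for d in delta]


delta = [0.0, 0.5, 0.0, -0.2]  # true l0 norm is 2
# A decreasing sigma schedule mimics a continuation from a smooth,
# easy-to-optimize penalty toward the exact count.
for sigma in (1.0, 0.3, 0.1, 0.01):
    print(sigma, round(smoothed_l0(delta, sigma), 4))
```

Because the gradient is available in closed form, a penalty of this kind can be added to an attack loss and optimized with ordinary backpropagation, with `sigma` annealed over iterations.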

Each mechanism is coupled with imperceptibility constraints ($\ell_\infty$ bounds, learned local bounds) to keep perturbations unnoticeable to humans or compatible with the input domain (Zhu et al., 2021, Croce et al., 2019, Imtiaz et al., 2022).
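The structured/group sparsity mechanism listed above can be illustrated with block soft-thresholding, the proximal operator of the group-lasso penalty $\lambda \sum_g \|\delta_g\|_2$. This is a simple convex stand-in chosen for the sketch, not the nuclear group norm or quasinorm penalties of the cited works; the grouping into fixed index blocks and the names `group_l0` and `block_soft_threshold` are assumptions.

```python
import math
from typing import List, Sequence


def group_l0(delta: Sequence[float], groups: Sequence[Sequence[int]], tol: float = 0.0) -> int:
    """Number of groups that contain at least one nonzero entry."""
    return sum(1 for g in groups if any(abs(delta[i]) > tol for i in g))


def block_soft_threshold(
    delta: Sequence[float], groups: Sequence[Sequence[int]], lam: float
) -> List[float]:
    """Shrink each group toward zero; groups with l2 norm <= lam vanish entirely."""
    out = list(delta)
    for g in groups:
        norm = math.sqrt(sum(delta[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * delta[i]
    return out


# Eight entries split into four "patches" of two entries each.
groups = [[0, 1], [2, 3], [4, 5], [6, 7]]
delta = [0.05, -0.02, 0.6, 0.8, 0.0, 0.0, 0.01, 0.03]
shrunk = block_soft_threshold(delta, groups, lam=0.1)
print(group_l0(delta, groups), group_l0(shrunk, groups))
```

Whole patches are switched off together rather than individual entries, which is what produces the spatially coherent supports described above.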

4. Application Domains and Model Vulnerabilities

Sparse adversarial perturbations have demonstrated profound vulnerabilities across a range of architectures and domains.

  • Image classifiers: CNNs, ResNets, and Transformers are highly sensitive; modifying only a very small fraction of pixels suffices for high attack success rates, measured as a full prediction flip (Kuvshinova et al., 2024, Lin et al., 8 Jun 2025, Imtiaz et al., 2022).
  • Object detection: Sparse attacks (e.g., center-line masks) can erase all detections in YOLOv4, Faster R-CNN; the attack’s transferability persists across unseen detection heads (Bao, 2020).
  • Saliency and segmentation: Feature-space attacks at select hidden layers yield even sparser and more indiscernible perturbations, with high transfer to final outputs (Che et al., 2019, Kuvshinova et al., 2024).
  • Speech/audio and time-series: WAV-based attacks (e.g., STAA-Net) perturb only a small fraction of frames, remain imperceptible at a high signal-to-noise ratio (SNR), and achieve high fooling rates on speech emotion recognition (SER) models (Chang et al., 2024).
  • RL/control: Adversarially overriding the actions of a minimal subset of agents (or timesteps) in cMARL can collapse team performance with infrequent deviations (Hu et al., 2022).
  • Video models: DeepSAVA shows that adversarial spatial transformation of a single frame in a long video leads to near-total failure in action recognition (Mu et al., 2021).

A consistent finding is that attack effectiveness is only marginally degraded under additional imperceptibility constraints (e.g., component-wise local variation, bounded SNR, structural similarity in video) (Zhu et al., 2021, Croce et al., 2019, Imtiaz et al., 2022, Mu et al., 2021).
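For the audio setting, the two quantities mentioned above (the fraction of perturbed frames and the signal-to-noise ratio) can be computed as in the following sketch, which uses the standard definition $\mathrm{SNR} = 10 \log_{10}\bigl(\sum_n x_n^2 / \sum_n \delta_n^2\bigr)$ dB. The synthetic sine signal, the frame length, and the function names are assumptions made for illustration and are unrelated to the data used in the cited work.

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], perturbation: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB between a clean signal and an additive perturbation."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum(p * p for p in perturbation)
    if p_noise == 0.0:
        return math.inf
    return 10.0 * math.log10(p_signal / p_noise)


def perturbed_frame_fraction(perturbation: Sequence[float], frame_len: int, tol: float = 0.0) -> float:
    """Fraction of non-overlapping frames that contain any nonzero perturbation."""
    frames = [perturbation[i:i + frame_len] for i in range(0, len(perturbation), frame_len)]
    hit = sum(1 for f in frames if any(abs(v) > tol for v in f))
    return hit / len(frames)


# Synthetic example: 1000 samples of a sine wave, perturbed in ten samples only.
signal = [math.sin(2.0 * math.pi * 5.0 * n / 1000.0) for n in range(1000)]
perturbation = [0.0] * 1000
for n in range(200, 210):
    perturbation[n] = 0.1

print(round(snr_db(signal, perturbation), 2), perturbed_frame_fraction(perturbation, 100))
```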

5. Transferability, Black-Box, and Efficiency

A core challenge for sparse attacks has been black-box transferability and query efficiency. Recent methods address this via:

  • Generator-based attacks yield highly transferable sparse perturbations that apply across architectures and inputs (He et al., 2021, Chang et al., 2024). TSAA generates perturbations substantially faster than optimization-based sparse attacks and reports dramatically higher black-box fooling rates than optimization baselines.
  • Universal sparse perturbations (image-agnostic, batch-trained), constructed via truncated power iteration over model Jacobians, use pixel support on the order of 5% yet maintain a high fooling rate across unseen models (Kuvshinova et al., 2024).
  • Bayesian black-box queries: BruSLeAttack leverages a Dirichlet prior over pixel selection, updating mask probabilities based on model score responses, and achieves state-of-the-art query efficiency on ImageNet (< 1% pixel changes, outperforming Sparse-RS and boundary attacks) (Vo et al., 2024).
  • Stochastic/greedy policies: Methods such as VFGA, CornerSearch, and score-based iterative attacks, paired with efficient mask ranking or voting (Césaire et al., 2020, Croce et al., 2019).
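The score-based, query-only setting described in the list above can be sketched as a random search over pixel masks whose sampling weights are drawn from a Dirichlet distribution and reinforced when a query lowers the margin. This is a deliberately simplified illustration loosely inspired by the Bayesian idea, not an implementation of BruSLeAttack or Sparse-RS; the update rule, the pixel-flipping scheme, the toy margin function, and all names are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def black_box_sparse_attack(
    x: Sequence[float],
    margin_fn: Callable[[Sequence[float]], float],
    k: int,
    max_queries: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float, int]:
    """Search for at most k pixel flips that drive the queried margin below zero."""
    rng = random.Random(seed)
    d = len(x)
    alpha = [1.0] * d                      # uniform prior over pixel importance
    best_x, best_margin = list(x), margin_fn(x)
    queries = 1
    while queries < max_queries and best_margin >= 0.0:
        probs = sample_dirichlet(alpha, rng)
        chosen = sorted(range(d), key=lambda i: probs[i], reverse=True)[:k]
        candidate = list(x)
        for i in chosen:
            # Push each chosen pixel to the far end of the [0, 1] range.
            candidate[i] = 0.0 if x[i] >= 0.5 else 1.0
        m = margin_fn(candidate)
        queries += 1
        if m < best_margin:
            best_x, best_margin = candidate, m
            for i in chosen:
                alpha[i] += 1.0            # reinforce pixels that helped
    return best_x, best_margin, queries


# Hypothetical score oracle: a linear margin dominated by two pixels.
w = [0.05] * 20
w[3], w[11] = 1.0, 0.8


def margin(x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) - 1.5


x = [0.8] * 20
x_adv, final_margin, used = black_box_sparse_attack(x, margin, k=2)
print(final_margin, used)
```

Only margin values are queried, never gradients, and every candidate is built from the clean input, so the number of changed pixels never exceeds `k`.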

Transferability remains highest when the mask-generation or universal vector is trained on diverse data and with loss structures that avoid severe overfitting to a specific substitute model (He et al., 2021, Lin et al., 8 Jun 2025).
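The idea of building a universal sparse direction by truncated power iteration over Jacobians can be sketched with the standard $\ell_2$ truncated power method: repeatedly apply $J^\top J$, keep the `k` largest-magnitude coordinates, and renormalize. This is a generic version assumed for illustration and may differ from the exact norms and truncation used by Kuvshinova et al.; the small matrix `J` stands in for Jacobians stacked over a batch and is entirely made up.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(A: Matrix, v: Sequence[float]) -> List[float]:
    """Compute A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def rmatvec(A: Matrix, u: Sequence[float]) -> List[float]:
    """Compute A^T u."""
    return [sum(A[r][j] * u[r] for r in range(len(A))) for j in range(len(A[0]))]


def truncate_top_k(v: Sequence[float], k: int) -> List[float]:
    """Zero all but the k largest-magnitude entries."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(v)]


def truncated_power_iteration(J: Matrix, k: int, iters: int = 50) -> List[float]:
    """Approximate a k-sparse unit vector that maximizes ||J v||_2."""
    d = len(J[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = rmatvec(J, matvec(J, v))       # power step with J^T J
        v = truncate_top_k(v, k)           # enforce sparsity
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            break
        v = [c / norm for c in v]          # renormalize to unit length
    return v


# Hypothetical stacked Jacobian: columns 1 and 4 dominate the output sensitivity.
J = [
    [0.1, 2.0, 0.0, 0.1, 1.5, 0.0],
    [0.0, 1.8, 0.1, 0.0, 1.6, 0.1],
    [0.1, 2.1, 0.0, 0.1, 1.4, 0.0],
]
v = truncated_power_iteration(J, k=2)
print([round(c, 3) for c in v])
```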

6. Interpretability, Explainability, and Structural Analysis

Several frameworks explicitly address the interpretability and semantic grounding of sparse attacks.

Interpretability is evaluated quantitatively via interpretability scores, which measure the mask's overlap with various saliency and attribution methods, and qualitatively through visualization.
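One simple way to turn "overlap with a saliency map" into a number is the fraction of perturbed positions that fall among the most salient positions. The sketch below implements that hypothetical score; the exact interpretability scores used in the cited papers may be defined differently, and `overlap_score`, the saliency values, and the `top_n` cutoff are assumptions.

```python
from typing import Sequence


def overlap_score(
    delta: Sequence[float], saliency: Sequence[float], top_n: int, tol: float = 0.0
) -> float:
    """Fraction of perturbed positions that lie in the top_n most salient positions."""
    support = {i for i, d in enumerate(delta) if abs(d) > tol}
    if not support:
        return 0.0
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    top = set(ranked[:top_n])
    return len(support & top) / len(support)


# Hypothetical saliency map over ten positions; positions 0, 2 and 6 rank highest.
saliency = [0.9, 0.1, 0.8, 0.05, 0.3, 0.02, 0.7, 0.01, 0.2, 0.04]
delta = [0.5, 0.0, -0.4, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0]  # perturbs 0, 2 and 5
print(overlap_score(delta, saliency, top_n=3))
```

A score near 1 indicates that the attack concentrates on regions the attribution method already marks as important, while a low score suggests the perturbation exploits features the saliency map does not highlight.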

7. Defense Strategies and Open Challenges

Sparse attacks pose fundamentally different challenges than dense $\ell_2$- or $\ell_\infty$-bounded attacks, rendering many classic defenses suboptimal:

  • Adversarial training: Directly incorporating sparse perturbations in adversarial training yields higher $\ell_0$-robustness than standard $\ell_2$ or $\ell_\infty$ training (Croce et al., 2019, Croce et al., 2021).
  • Certified $\ell_0$ defenses: Randomized smoothing and statistical detection of extremely sparse anomalies are proposed as directions (Imtiaz et al., 2022, Vo et al., 2024).
  • Input/randomization defenses: Conventional approaches such as quantization, thresholding, or JPEG compression only marginally reduce attack effectiveness unless they also substantially impair overall accuracy (Krithivasan et al., 2020, Imtiaz et al., 2022).
  • Activation monitoring: Detection modules based on abrupt changes in internal activation sparsity or runtime energy profile offer exploratory protection against efficiency-focused sparsity attacks (Krithivasan et al., 2020).
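One intuition for why input quantization helps little against sparse attacks is sketched below, under the assumption that the sparse perturbation uses large per-pixel magnitudes while a dense one uses small ones: rounding to a coarse grid erases the small dense changes but leaves a large single-pixel change intact. The `quantize` helper, the grid of five levels, and the toy inputs are hypothetical and serve only to illustrate the point made in the list above.

```python
from typing import List, Sequence


def quantize(x: Sequence[float], levels: int) -> List[float]:
    """Round each value in [0, 1] to the nearest of `levels` evenly spaced values."""
    return [round(v * (levels - 1)) / (levels - 1) for v in x]


x = [0.25] * 8                          # clean input
dense_adv = [v + 0.02 for v in x]       # small change on every pixel
sparse_adv = list(x)
sparse_adv[3] = 1.0                     # one large change on a single pixel

clean_q = quantize(x, 5)
dense_q = quantize(dense_adv, 5)
sparse_q = quantize(sparse_adv, 5)

# Count how many positions still differ from the quantized clean input.
print(sum(1 for a, b in zip(dense_q, clean_q) if a != b))
print(sum(1 for a, b in zip(sparse_q, clean_q) if a != b))
```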

Robust training for group- or structurally-sparse attacks and the integration of counterfactual explanations as part of a model’s introspection and defense suite represent ongoing research frontiers.


Sparse adversarial attacks thus embody a uniquely potent threat vector: by targeting a vanishing fraction of input or feature space, they consistently induce misclassification, undermine system safety, and challenge the interpretability and defendability of modern neural architectures. Their study connects optimization, geometric analysis, interpretability, and robust learning, and continues to motivate the design of physically realizable, explainable and application-specific adversarial evaluation protocols across domains (He et al., 2021, Kuvshinova et al., 2024, Imtiaz et al., 2022, Heshmati et al., 18 Oct 2025, Zhu et al., 2021, Hu et al., 2022, Vo et al., 2024).
