Adversarial ML Taxonomy
- Adversarial Machine Learning Taxonomy is a structured framework that categorizes attack strategies and defense methods across the ML lifecycle by formalizing threat models, data tampering, and robustness metrics.
- It outlines diverse attack paradigms such as evasion, poisoning, and model extraction, and connects them with defenses including adversarial training, randomized smoothing, and certified verification.
- The taxonomy emphasizes the dynamic arms race between adaptive attacks and evolving defenses, spotlighting open challenges like scalability, verification, and real-world application under nonstationary environments.
Adversarial Machine Learning (AML) investigates the vulnerability of machine learning systems to manipulation of their inputs or training data, formalizing attacks and defenses in a unified, hierarchical framework. AML encompasses settings where predictions or learned models can be intentionally perturbed by adversaries to achieve malicious objectives, under various knowledge, capability, and domain constraints. A rigorous taxonomy identifies the compositional structure of attacks, classified by stage of the ML life cycle, threat model, perturbation bound, and attacker goal, and connects these to a spectrum of defensive methodologies ranging from empirical tactics to provable robustness and formal verification. The field is characterized by a persistent arms race between increasingly adaptive attacks and more sophisticated defenses, with key open challenges centered on certified robustness, scalability, and adaptation to nonstationary real-world environments (Jha, 8 Feb 2025, Wu et al., 2023).
1. Formal Threat Models and Unifying Notation
At the core of AML taxonomy lies the formal definition of threat models delineated by adversary knowledge $\mathcal{K}$, capabilities $\mathcal{C}$, and goals $\mathcal{G}$:
- Knowledge $\mathcal{K}$:
- White-box: adversary has complete access to model parameters $\theta$, architecture, gradients $\nabla_x \mathcal{L}$, and training data.
- Black-box: adversary may only query or obtain loss outputs, lacking direct access to model internals (Jha, 8 Feb 2025, Li et al., 2018, Wu et al., 2023).
- Capabilities $\mathcal{C}$:
- Evasion: perturbation of inputs at inference time such that $\|x' - x\|_p \le \epsilon$ and $f_\theta(x') \ne f_\theta(x)$.
- Poisoning: injection or modification of up to $m$ training samples, possibly under constraints (feature or problem space).
- Model Extraction/Inference: query-based reconstruction or privacy inference (membership, data extraction) (Jha, 8 Feb 2025, Ibitoye et al., 2019, Rosenberg et al., 2020).
- Goals $\mathcal{G}$:
- Untargeted: induce any misclassification $f_\theta(x') \ne y$.
- Targeted: force classification to a specific class $y_t$, i.e. $f_\theta(x') = y_t$.
- Backdoor/Trigger: produce targeted misclassification only when an embedded trigger is present.
- Confidentiality/Privacy: extract internal details or verify membership of data (Jha, 8 Feb 2025, Wu et al., 2023, Habler et al., 2022).
The attack space is further stratified by labeling attacks as feature-space (direct manipulation of input vectors) or problem-space (domain-constrained modifications to raw data objects) (Ibitoye et al., 2019).
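To keep these axes explicit in experiments, a threat model can be written down as a small structured object rather than scattered prose. The following Python sketch is one minimal, hypothetical encoding of the (knowledge, capability, goal) triple plus a perturbation budget; the class and field names are illustrative choices, not a standard API.

```python
from dataclasses import dataclass
from enum import Enum


class Knowledge(Enum):
    WHITE_BOX = "white-box"    # full access to parameters, gradients, training data
    BLACK_BOX = "black-box"    # query/loss access only


class Capability(Enum):
    EVASION = "evasion"        # perturb inputs at inference time
    POISONING = "poisoning"    # tamper with training samples
    EXTRACTION = "extraction"  # steal the model or infer membership/data


class Goal(Enum):
    UNTARGETED = "untargeted"
    TARGETED = "targeted"
    BACKDOOR = "backdoor"
    PRIVACY = "privacy"


@dataclass(frozen=True)
class ThreatModel:
    """One point in the (knowledge, capability, goal) taxonomy."""
    knowledge: Knowledge
    capability: Capability
    goal: Goal
    norm: str = "linf"         # perturbation norm for evasion settings
    epsilon: float = 8 / 255   # perturbation budget


# Example: the classic white-box, untargeted, l_inf evasion setting.
setting = ThreatModel(Knowledge.WHITE_BOX, Capability.EVASION, Goal.UNTARGETED)
print(setting)
```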
2. Taxonomy of AML Attacks Across the ML Lifecycle
A comprehensive AML taxonomy emerges from analyzing attacks by their stage of occurrence and mechanism (Jha, 8 Feb 2025, Wu et al., 2023, Li et al., 2018):
| ML Lifecycle Stage | Attack Paradigm | Mechanism/Goal |
|---|---|---|
| Pre-training | Data-poisoning backdoors | Stateless poisoning with additive/static triggers; e.g. BadNets, blended triggers. Goal: map trigger-bearing inputs $x \oplus \delta$ to the target class $y_t$ (see the trigger-stamping sketch after this table). |
| In-training | Training-controllable backdoors | Dynamic optimization of jointly learned triggers and weights; input-aware, semantic, or federated triggers. |
| Post-training | Weight/Parameter attacks | White/gray-box optimization or physical bit-flip to induce malicious behavior in deployed models. |
| Deployment | Device-level parameter tampering | Rowhammer, laser fault; bit-flip attack on stored weights. |
| Inference | Evasion (adversarial examples) | White-box: direct gradient methods; black-box: transfer, finite difference, decision/score-based (Jha, 8 Feb 2025, Wu et al., 2023). Also: backdoor or weight-attack activation, model extraction, membership inference. |
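The pre-training row above corresponds to the simplest backdoor recipe: stamp a fixed trigger patch onto a small fraction of training images and relabel them to the attacker's class. The NumPy sketch below illustrates that static-trigger poisoning step under assumed array shapes, poison rate, and patch placement; it is a toy illustration, not the configuration of any particular paper.

```python
import numpy as np


def poison_dataset(images, labels, target_class, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=0):
    """Stamp a static patch in the corner of a random subset of images and
    relabel them to `target_class` (BadNets-style static-trigger poisoning).

    images: float array of shape (N, H, W, C) with values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()

    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Static additive trigger: a small solid patch in the bottom-right corner.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    labels[idx] = target_class            # backdoor mapping: trigger -> target class
    return images, labels, idx


# Toy usage: 100 random "images", poison 5% of them toward class 7.
imgs = np.random.rand(100, 32, 32, 3)
lbls = np.random.randint(0, 10, size=100)
p_imgs, p_lbls, poisoned_idx = poison_dataset(imgs, lbls, target_class=7)
print(f"poisoned {len(poisoned_idx)} of {len(imgs)} samples")
```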
Prominent attack methods include:
- FGSM: $x' = x + \epsilon\,\mathrm{sign}\!\left(\nabla_x \mathcal{L}(\theta, x, y)\right)$ (white-box, $\ell_\infty$).
- PGD: $x^{t+1} = \Pi_{\epsilon}\!\left(x^{t} + \alpha\,\mathrm{sign}\!\left(\nabla_x \mathcal{L}(\theta, x^{t}, y)\right)\right)$, where $\Pi_{\epsilon}$ projects onto the $\epsilon$-ball around $x$ (multi-step, white-box).
- Carlini–Wagner: minimize $\|\delta\|_p + c \cdot g(x+\delta)$, where $g$ penalizes correct classification of $x+\delta$.
- Finite-difference/Black-box: gradient estimates $\hat{g}_i \approx \frac{\mathcal{L}(x + h e_i, y) - \mathcal{L}(x - h e_i, y)}{2h}$ obtained from query access alone.
Evasion, poisoning, transfer (surrogate) attacks, and privacy/model extraction strategies all map to distinct positions in the hierarchical taxonomy.
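As a concrete instance of the evasion formulas above, the following sketch implements FGSM and an $\ell_\infty$ PGD loop in PyTorch against any differentiable classifier; the random start, step size, iteration count, and the toy model in the usage lines are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """Single-step l_inf attack: x' = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()


def pgd(model, x, y, eps, alpha=None, steps=10):
    """Multi-step l_inf attack with projection back into the eps-ball."""
    alpha = alpha if alpha is not None else eps / 4
    x_orig = x.clone().detach()
    x_adv = x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()           # gradient ascent step
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project to l_inf ball
            x_adv = x_adv.clamp(0, 1)                           # keep valid pixel range
    return x_adv.detach()


# Toy usage against a small random linear classifier on 28x28 inputs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
x_adv = pgd(model, x, y, eps=8 / 255)
```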
3. Defense Methodologies: Empirical to Certified Robustness
Defense taxonomy is structured by the stage of intervention and level of robustness guarantee (Jha, 8 Feb 2025, Silva et al., 2020, Wu et al., 2023).
- Input Preprocessing & Gradient Masking:
- Defensive Quantization: quantize gradients or activations to $k$ bits (e.g. $q(x) = \mathrm{round}(2^{k}x)/2^{k}$), aiming to obfuscate gradients.
- Randomized Smoothing: addition of Gaussian noise $\eta \sim \mathcal{N}(0, \sigma^{2}I)$ to the input, forming a smoothed classifier $g(x) = \arg\max_{c} \Pr_{\eta}[f(x+\eta) = c]$ with a certified $\ell_2$-robustness radius (Jha, 8 Feb 2025, Silva et al., 2020).
- Adversarial Training:
- Empirical minimization of the worst-case loss within the perturbation budget: $\min_{\theta} \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\|\delta\|_p \le \epsilon} \mathcal{L}(\theta, x+\delta, y)\right]$.
- Variations include min–max PGD training, ensemble adversarial training, logit pairing (Jha, 8 Feb 2025, Silva et al., 2020).
- Certified Robustness Techniques:
- Interval Bound Propagation (IBP): layer-wise propagation of activation bounds, guaranteeing no decision change within the $\ell_\infty$ ball $\|\delta\|_\infty \le \epsilon$.
- Randomized Smoothing: as above, with provable certificates in the $\ell_2$-norm (see the certification sketch after this list).
- Exact/LP/SDP relaxations: Reluplex (SMT), convex polytope theory, abstract interpretation for smaller models or relaxed bounds (Jha, 8 Feb 2025, Silva et al., 2020).
- Other Tactics:
- Data Sanitization: detection/removal of anomalous samples via influence functions or statistical tests.
- Differential Privacy: restriction of single-sample influence on model outputs, mitigating poisoning.
- Detection/Reactive Measures: feature squeezing, anomaly detectors, statistical discrepancy tests (Jha, 8 Feb 2025, Li et al., 2018).
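The randomized-smoothing certificate referenced above can be estimated by Monte Carlo sampling in the spirit of Cohen et al.'s procedure. The sketch below is a simplified, single-sample-set version: it votes over Gaussian perturbations, lower-bounds the top-class probability with a Clopper–Pearson interval, and converts it to an $\ell_2$ radius $R = \sigma\,\Phi^{-1}(\underline{p})$; the base classifier, sample count, and noise level are placeholders.

```python
import numpy as np
from scipy.stats import beta, norm


def certify(base_classifier, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10, seed=0):
    """Monte Carlo certificate for the smoothed classifier
    g(x) = argmax_c P[ f(x + eta) = c ],  eta ~ N(0, sigma^2 I).

    Returns (predicted class, certified l2 radius), or (None, 0.0) when the
    lower bound on the top-class probability does not exceed 1/2 (abstain).
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = x + rng.normal(scale=sigma, size=x.shape)
        counts[base_classifier(noisy)] += 1

    top = int(counts.argmax())
    k = int(counts[top])
    # One-sided Clopper-Pearson lower confidence bound on P[f(x + eta) = top].
    p_lower = beta.ppf(alpha, k, n - k + 1)
    if p_lower <= 0.5:
        return None, 0.0
    return top, float(sigma * norm.ppf(p_lower))   # R = sigma * Phi^{-1}(p_lower)


# Toy usage: a stand-in "classifier" that thresholds the mean pixel value.
def f(z):
    return int(z.mean() > 0.5)

print(certify(f, x=np.full((8, 8), 0.8), num_classes=2))
```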
4. Evaluation Metrics and Robustness Criteria
Robustness in AML is quantified via metrics that capture classifier accuracy under threat constraints:
- Robust Accuracy: $\mathrm{RA}_{\epsilon} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\!\left[f_\theta(x_i + \delta_i) = y_i\right]$,
with $\|\delta_i\|_p \le \epsilon$ (Li et al., 2018); see the evaluation sketch below.
- Certified Radius: Size of the neighborhood (as in IBP, randomized smoothing) guaranteeing unchanged prediction.
- Risk Assessment (domain-specific): a composite risk score combining attack effectiveness and threat actor attributes (Habler et al., 2022).
Best practices include clearly specified threat models (knowledge, capability, goal, perturbation norm), evaluation against adaptive adversaries, provable guarantees when possible, and reproducibility.
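Under these criteria, robust accuracy is simply accuracy measured after an attack perturbs every test point within its budget. The sketch below shows that evaluation loop; it assumes an attack callable with the signature of the pgd routine from the earlier evasion sketch and a standard PyTorch DataLoader.

```python
import torch


def robust_accuracy(model, loader, attack, **attack_kwargs):
    """Fraction of test points still classified correctly after `attack`
    perturbs them within its budget (robust accuracy under that attack)."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = attack(model, x, y, **attack_kwargs)      # adversarial counterparts
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total


# Usage (assuming `pgd` from the evasion sketch and a DataLoader `test_loader`):
# ra = robust_accuracy(model, test_loader, pgd, eps=8 / 255, steps=20)
```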
5. Hierarchical and Multi-Domain Taxonomy Structures
AML taxonomy is inherently hierarchical, accommodating multiple dimensions (Jha, 8 Feb 2025, Wu et al., 2023, Thummala et al., 14 May 2024); an illustrative encoding follows this list:
- Attack taxonomy:
- Threat model (knowledge, capability, goal)
- Evasion attacks (untargeted, targeted, optimization, transfer)
- Poisoning attacks (untargeted, targeted/backdoor)
- Model extraction and privacy inference
- Defense taxonomy:
- Input preprocessing and gradient masking
- Adversarial training, ensemble methods
- Certified defenses (IBP, randomized smoothing, LP relaxations)
- Domain-specific extensions:
- Network security: risk grid based on discriminative and directive autonomy (Ibitoye et al., 2019).
- Cyber security: seven-axis taxonomy including attacker knowledge, training-set access, targeting, and output format (Rosenberg et al., 2020).
- Spacecraft AML: axes on mission objectives, resource constraints, learning stage, storage architecture, C&DH access, model exposure (Thummala et al., 14 May 2024).
- Wireless communications: axes of influence, phase, knowledge, attack goals; mirrored in layered defense taxonomy (Adesina et al., 2020).
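One lightweight way to operationalize such a hierarchy in tooling is as a nested mapping from taxonomy axes to admissible values that benchmarks or audits can query. The dictionary below is an illustrative, non-standard encoding of the dimensions listed above.

```python
# Illustrative nested encoding of the hierarchical taxonomy above (not a standard schema).
AML_TAXONOMY = {
    "attack": {
        "threat_model": {
            "knowledge": ["white-box", "black-box"],
            "capability": ["evasion", "poisoning", "extraction", "inference"],
            "goal": ["untargeted", "targeted", "backdoor", "privacy"],
        },
        "evasion": ["untargeted", "targeted", "optimization-based", "transfer"],
        "poisoning": ["untargeted", "targeted/backdoor"],
        "extraction_privacy": ["model extraction", "membership inference"],
    },
    "defense": {
        "preprocessing": ["gradient masking", "input transformation"],
        "training": ["adversarial training", "ensembles"],
        "certified": ["IBP", "randomized smoothing", "LP relaxation"],
    },
}


def leaves(node, path=()):
    """Enumerate (path, value) pairs, e.g. for tagging entries in a benchmark."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaves(child, path + (key,))
    else:
        for value in node:
            yield path, value


for p, v in leaves(AML_TAXONOMY["defense"]):
    print("/".join(p), "->", v)
```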
6. Open Challenges and Research Directions
Key open problems in AML reflect both foundational and engineering barriers (Jha, 8 Feb 2025, Silva et al., 2020):
- Adaptive Attacks vs. Gradient Obfuscation: Defenses based on gradient masking (e.g., defensive quantization, non-differentiable layers) are commonly bypassed using BPDA or transfer-based methods. There is an unresolved tension between masking for robustness and the need for clean gradients during training and verification.
- Verification Scalability: The complexity of exact formal verification (worst-case $\mathcal{O}(2^{n})$ for $L$-layer, $n$-neuron ReLU networks) precludes its application to large real-world models, forcing a trade-off to approximation or probabilistic bounds (see the interval-propagation sketch after this list).
- Certified Robustness vs. Model Capacity: Tight bounds on robustness often require simplifying model architectures, incurring a trade-off between robustness radius, expressivity, accuracy, and computational load.
- Distribution Shift and Real-World Threats: Existing evaluations are primarily on clean digital benchmarks with $\ell_p$-bounded adversarial examples; real-world adversaries exploit physical attacks, sensor noise, and distributional shifts, challenging simplistic threat models.
- Unified Theories and Benchmarks: Lack of a unified theory of adversarial transferability or standardized, diverse benchmarks impedes reproducibility and principled progress (Wu et al., 2023, Adesina et al., 2020).
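To make the verification trade-off concrete, interval bound propagation replaces exact reasoning with a single forward pass over interval endpoints: each affine layer maps an input box to an output box via the positive and negative parts of its weights, and ReLU is applied monotonically, at the cost of looseness. The NumPy sketch below shows that propagation and a sound (but incomplete) certification check for a toy two-layer network; the architecture and $\epsilon$ are placeholders.

```python
import numpy as np


def ibp_affine(lower, upper, W, b):
    """Propagate an axis-aligned box through the affine map x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return (W_pos @ lower + W_neg @ upper + b,
            W_pos @ upper + W_neg @ lower + b)


def ibp_certified(layers, x, eps):
    """Soundly check that the clean prediction cannot change anywhere in the
    l_inf ball of radius eps around x, using interval bound propagation."""
    # Clean forward pass to fix the predicted class.
    z = x
    for i, (W, b) in enumerate(layers):
        z = W @ z + b
        if i < len(layers) - 1:
            z = np.maximum(z, 0.0)
    pred = int(np.argmax(z))

    # Interval forward pass: one bound per affine layer, ReLU is monotone.
    lower, upper = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lower, upper = ibp_affine(lower, upper, W, b)
        if i < len(layers) - 1:
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)

    # Certified iff the worst-case logit of `pred` beats every rival's best case.
    rivals = np.delete(upper, pred)
    return bool(lower[pred] > rivals.max()), pred


# Toy usage: a random 2-layer ReLU network on a 16-dimensional input.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(32, 16)), np.zeros(32)),
          (rng.normal(size=(3, 32)), np.zeros(3))]
x = rng.normal(size=16)
print(ibp_certified(layers, x, eps=0.01))
```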
7. Synthesis and Outlook
AML taxonomy provides a rigorous scaffold for understanding and combating the evolving spectrum of adversarial threats. Hierarchical, multi-dimensional frameworks—rooted in formal threat models, life-cycle analysis, and robust optimization—distinguish attack and defense paradigms, clarify trade-offs, and expose gaps. The interplay between attack sophistication (e.g., adaptive, physical, or domain-constrained perturbations) and defensive innovation (empirical, certified, combined) ensures the AML landscape remains dynamic. Scalability, real-world deployment, and standardized, attack-agnostic robustness guarantees remain central targets for ongoing and future research (Jha, 8 Feb 2025, Wu et al., 2023, Thummala et al., 14 May 2024).