ARMOR: Agentic Reasoning for Adversarial Attacks
- The paper introduces ARMOR, which dynamically orchestrates multiple attack primitives via AI agents to overcome static ensemble limitations.
- It leverages Vision–Language and Large Language Models for real-time semantic reasoning and adaptive hyperparameter reparameterization.
- Empirical results show ARMOR's superior attack success rates, outperforming conventional methods in transfer-based black-box scenarios.
Agentic Reasoning for Methods Orchestration and Reparameterization (ARMOR) is a closed-loop adversarial attack framework that leverages agentic reasoning to dynamically orchestrate and reparameterize multiple attack primitives via a society of AI agents. ARMOR integrates Vision–Language Models (VLMs) and Large Language Models (LLMs) to achieve robust, transferable, and adaptive adversarial attacks, primarily in a transfer-based black-box scenario. By synthesizing dense, sparse, and geometric perturbations adaptively through a shared "Mixing Desk," ARMOR addresses critical limitations of static ensemble attack suites, namely their lack of semantic awareness and inability to adapt to new models or exploit image-specific vulnerabilities (Rong et al., 26 Jan 2026).
1. Motivation and Problem Setting
ARMOR is motivated by the shortcomings of traditional automated attack ensembles, which operate as static sequences or combinations of fixed hyperparameter attacks (e.g., AutoAttack). These suites lack semantic reasoning capabilities, suffer from stagnation under constrained budgets, and demonstrate limited cross-architecture transferability. ARMOR specifically addresses challenges in the transfer-based black-box setting, wherein perturbations are crafted on an ensemble of surrogate models (e.g., ResNet-50, DenseNet-121) and then applied to a blind target architecture (e.g., ViT-B/16). The core needs addressed by ARMOR include dynamic allocation of attack budgets across complementary geometries (dense, sparse, geometric), semantic targeting of image regions, and real-time hyperparameter reparameterization guided by high-level vision–language reasoning (Rong et al., 26 Jan 2026).
2. Core Architecture and Components
ARMOR orchestrates three canonical adversarial attack primitives: Carlini–Wagner (CW), Jacobian-based Saliency Map Attack (JSMA), and Spatially Transformed Attack (STA). The system is composed of specialized agents, each contributing to distinct stages of the attack orchestration:
- InfoAgent (VLM): Extracts semantically salient cues from the input image, such as region annotations and texture features, using models like Qwen2.5-VL.
- ConductorAgent (LLM): Aggregates semantic reports and baseline confidences to set global constraints, notably the perturbation budget $\epsilon$ and the minimum SSIM threshold $\tau_{\mathrm{SSIM}}$.
- AdvisorAgents (LLM): Propose hyperparameters for each attack method based on the run history $\mathcal{H}$.
- MethodAgents: Execute the CW, JSMA, and STA algorithms with the supplied hyperparameters.
- MixerAgent: Implements the Mixing Desk, optimizing a convex weight vector $w$ to synthesize the output perturbation that maximizes a custom attack score $S$.
- CritiqueAgents: Assess attack outcomes through vectors comprising black-box and surrogate confidences along with SSIM.
- StrategistAgent: Detects stagnation and adaptively relaxes global constraints to facilitate escape from local optima.
The interaction among these agents is iterative and closed-loop, enabling the system to learn from intermediate results and to reparameterize attacks in real time (Rong et al., 26 Jan 2026).
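The iterative data flow among the agents can be sketched as a single closed-loop step. All interfaces below (agent callables, their arguments, and return shapes) are illustrative assumptions, not the paper's API:

```python
def armor_iteration(image, history, eps, ssim_floor,
                    info_agent, conductor, advisors, methods, mixer,
                    critics, strategist):
    """One hypothetical ARMOR closed-loop iteration (all interfaces assumed)."""
    cues = info_agent(image)                         # VLM: semantic regions/textures
    eps, ssim_floor = conductor(cues, history, eps, ssim_floor)
    deltas = {}
    for name, attack in methods.items():             # CW / JSMA / STA in parallel
        params = advisors[name](history)             # LLM-proposed hyperparameters
        deltas[name] = attack(image, params, eps)
    delta, weights = mixer(deltas)                   # convex combination (Mixing Desk)
    report = critics(image, delta)                   # black-box/surrogate conf. + SSIM
    history.append(report)
    eps, ssim_floor = strategist(history, eps, ssim_floor)  # relax on stagnation
    return delta, history, eps, ssim_floor
```

Each agent is a plain callable here, which makes the loop easy to stub out and test; the real system would back these with LLM/VLM queries and the attack implementations.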
3. Formalization of Attack Primitives
3.1 Carlini–Wagner (CW) Attack
CW adversarial examples are generated by solving $\min_{\delta} \|\delta\|_2^2 + c \cdot f(x+\delta)$, where $f$ is the CW margin loss and $c$ trades off distortion against misclassification, subject to $\|\delta\|_\infty \le \epsilon$ and $x+\delta \in [0,1]^n$, with projection performed at each step: $\delta \leftarrow \mathrm{clip}(\delta, -\epsilon, \epsilon)$.
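A minimal numpy sketch of this projected CW-style update on a linear surrogate classifier (the linear model, step size, and loop budget are illustrative assumptions; gradients are analytic here, whereas a real attack would backpropagate through a network):

```python
import numpy as np

def cw_linear(x, W, y_true, eps=0.15, c=1.0, lr=0.05, steps=100):
    """CW-style attack on a linear classifier with logits = W @ x.
    Descends ||delta||_2^2 + c * max(margin, 0), projecting delta onto the
    L_inf ball of radius eps and keeping x + delta inside [0, 1] each step."""
    delta = np.zeros_like(x, dtype=float)
    for _ in range(steps):
        logits = W @ (x + delta)
        others = np.delete(logits, y_true)
        margin = logits[y_true] - np.max(others)   # > 0 means still classified correctly
        grad = 2.0 * delta                         # gradient of ||delta||_2^2
        if margin > 0:                             # margin loss is active
            j = int(np.argmax(others))
            j = j if j < y_true else j + 1         # index back into full logit vector
            grad = grad + c * (W[y_true] - W[j])   # analytic gradient of the margin
        delta = np.clip(delta - lr * grad, -eps, eps)   # L_inf projection
        delta = np.clip(x + delta, 0.0, 1.0) - x        # keep x + delta valid
    return delta
```

On a toy two-class problem this drives the true-class margin toward zero while respecting both constraints.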
3.2 Jacobian-based Saliency Map Attack (JSMA)
At every iteration, JSMA selects a pixel pair $(p, q)$ that maximizes adversarial saliency: $(p^*, q^*) = \arg\max_{(p,q)} \alpha_{pq} \cdot |\beta_{pq}|$, where $\alpha_{pq} = \sum_{i \in \{p,q\}} \partial f_t(x)/\partial x_i$ and $\beta_{pq} = \sum_{i \in \{p,q\}} \sum_{j \neq t} \partial f_j(x)/\partial x_i$, subject to $\alpha_{pq} > 0$, $\beta_{pq} < 0$. Pixels are perturbed and projected to satisfy the sparsity budget $\|\delta\|_0 \le \kappa$ and $x+\delta \in [0,1]^n$.
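The pair-selection rule above can be sketched directly from a precomputed Jacobian. The function name and the dense `J` layout (classes × pixels) are assumptions for illustration:

```python
import numpy as np

def jsma_pick_pair(J, target):
    """Select the pixel pair maximizing JSMA saliency alpha * |beta|.
    J has shape [num_classes, num_pixels]; alpha sums target-class gradients
    over the pair, beta sums the gradients of all other classes."""
    n = J.shape[1]
    alpha_1 = J[target]                    # per-pixel target-class gradient
    beta_1 = J.sum(axis=0) - J[target]     # per-pixel sum over the other classes
    best, best_score = None, -np.inf
    for p in range(n):
        for q in range(p + 1, n):
            alpha = alpha_1[p] + alpha_1[q]
            beta = beta_1[p] + beta_1[q]
            if alpha > 0 and beta < 0:     # admissibility constraints
                score = alpha * abs(beta)
                if score > best_score:
                    best, best_score = (p, q), score
    return best
```

The quadratic pair scan is the textbook formulation; practical implementations prune the candidate set for speed.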
3.3 Spatially Transformed Attack (STA)
STA parameterizes a per-pixel flow field $f = (\Delta u, \Delta v)$ and applies a differentiable warping: each output pixel $x_{\mathrm{adv}}^{(u,v)}$ is bilinearly interpolated from $x$ at location $(u + \Delta u, v + \Delta v)$. Optimizing: $\min_f \mathcal{L}_{\mathrm{adv}}(x_{\mathrm{adv}}) + \tau \mathcal{L}_{\mathrm{flow}}(f)$, where $\mathcal{L}_{\mathrm{flow}}$ encourages smoothness of neighboring flow vectors and $\tau$ is a distortion trade-off.
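The warp and the smoothness penalty can be sketched in numpy as follows (a naive per-pixel loop for clarity, assuming a grayscale `H x W` image and an `H x W x 2` flow field; real implementations vectorize this, e.g. with a grid-sampling op):

```python
import numpy as np

def warp(x, flow):
    """Bilinear warp of image x (H x W) by flow (H x W x 2): each output
    pixel samples x at (i + du, j + dv) via bilinear interpolation."""
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            u = min(max(i + flow[i, j, 0], 0.0), H - 1.0)
            v = min(max(j + flow[i, j, 1], 0.0), W - 1.0)
            i0, j0 = int(np.floor(u)), int(np.floor(v))
            i1, j1 = min(i0 + 1, H - 1), min(j0 + 1, W - 1)
            a, b = u - i0, v - j0
            out[i, j] = ((1 - a) * (1 - b) * x[i0, j0] + (1 - a) * b * x[i0, j1]
                         + a * (1 - b) * x[i1, j0] + a * b * x[i1, j1])
    return out

def flow_loss(flow):
    """Smoothness penalty: L2 norms of neighboring flow-vector differences."""
    d_i = np.diff(flow, axis=0)
    d_j = np.diff(flow, axis=1)
    return (np.sqrt((d_i ** 2).sum(-1) + 1e-12).sum()
            + np.sqrt((d_j ** 2).sum(-1) + 1e-12).sum())
```

A zero flow reproduces the input exactly, and a constant flow incurs (near-)zero smoothness loss, matching the stated regularization intent.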
4. Agentic Closed-Loop Orchestration
ARMOR proceeds in discrete iterations $t = 1, \dots, T$, with the following stages:
- Reconnaissance: InfoAgent analyzes the input image and extracts semantic cues. Baseline confidences (for surrogate and black-box) are evaluated.
- Objective Formulation: ConductorAgent sets the budget $\epsilon$ and the SSIM floor $\tau_{\mathrm{SSIM}}$ according to semantic features and baseline confidences.
- Parallel Perturbation Generation: For each method $m \in \{\mathrm{CW}, \mathrm{JSMA}, \mathrm{STA}\}$, AdvisorAgents propose method-specific hyperparameters $\theta_m$, and MethodAgents generate candidate perturbations $\delta_m$.
- Adaptive Perturbation Ensemble: MixerAgent performs randomized hill-climbing over the probability simplex $\{w : w_m \ge 0, \sum_m w_m = 1\}$ to maximize the attack score $S$. Composite perturbation: $\delta = \sum_m w_m \delta_m$.
- Critique and Strategic Adaptation: CritiqueAgents append performance vectors (black-box and surrogate confidences, SSIM) to the run history $\mathcal{H}$. StrategistAgent detects stagnation using sliding-window statistics. If necessary, constraints are relaxed: $\epsilon$ is increased and $\tau_{\mathrm{SSIM}}$ is lowered.
The process repeats until the attack succeeds or the resource budget is exhausted.
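The MixerAgent's randomized hill-climbing over the simplex can be sketched as follows; the Gaussian proposal, renormalization-based projection, and `score_fn` placeholder for the critique signal are all assumptions:

```python
import numpy as np

def mix_hill_climb(deltas, score_fn, iters=200, step=0.1, seed=0):
    """Randomized hill-climbing over the probability simplex: perturb the
    convex weight vector w, project back onto the simplex, and keep the
    move only if the scored composite perturbation sum_m w_m * delta_m
    improves. score_fn stands in for the attack score S."""
    rng = np.random.default_rng(seed)
    M = len(deltas)
    w = np.full(M, 1.0 / M)                       # start from the uniform mix
    best = score_fn(sum(wm * d for wm, d in zip(w, deltas)))
    for _ in range(iters):
        cand = np.clip(w + rng.normal(0.0, step, M), 0.0, None)
        if cand.sum() == 0:
            continue
        cand = cand / cand.sum()                  # back onto the simplex
        s = score_fn(sum(wm * d for wm, d in zip(cand, deltas)))
        if s > best:                              # greedy accept
            w, best = cand, s
    return w, best
```

Because accepted moves are strictly improving, the returned score is never worse than the uniform-mix starting point.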
5. Reparameterization via Advisor Agents
AdvisorAgents implement a gradient-free search over method-specific and global hyperparameters:
- CW: e.g., the distortion/misclassification trade-off constant $c$, step size, and iteration count.
- JSMA: e.g., the per-step perturbation magnitude and sparsity budget.
- STA: e.g., the flow-regularization weight $\tau$ and optimization steps.
- Global: the perturbation budget $\epsilon$ and SSIM floor $\tau_{\mathrm{SSIM}}$.
Given the historical success/failure vectors, AdvisorAgents sample local modifications, predict their impact using learned or heuristic rules, and select those improving the expected attack score $S$. The critique vector supplies the reinforcement signal, incentivizing increases in black-box confidence and SSIM. This mechanism is analogous to hill-climbing or evolutionary strategies, albeit directed by both textual and numeric signals (Rong et al., 26 Jan 2026).
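One local-modification step of this gradient-free search can be sketched as follows; the multiplicative proposal range, candidate count, and `score_fn` (standing in for the critique signal) are illustrative assumptions:

```python
import random

def advisor_step(params, history_score, score_fn, samples=8, rng=None):
    """Hedged sketch of an AdvisorAgent update: sample local multiplicative
    modifications of the hyperparameter dict, score each candidate, and keep
    the best one only if it beats the historical score."""
    rng = rng or random.Random(0)
    best_params, best_score = params, history_score
    for _ in range(samples):                             # small local neighborhood
        cand = {k: v * rng.uniform(0.8, 1.25) for k, v in params.items()}
        s = score_fn(cand)
        if s > best_score:                               # greedy, monotone accept
            best_params, best_score = cand, s
    return best_params, best_score
```

In ARMOR the score would come from the CritiqueAgents' confidence/SSIM vectors rather than a closed-form function; the monotone-accept rule mirrors the hill-climbing analogy in the text.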
6. Empirical Evaluation
ARMOR was evaluated on the AADD-LQ dataset (710 low-quality fake images), attacking a blind ViT-B/16 through surrogates ResNet-50 and DenseNet-121, under a fixed perturbation budget and a maximum of 10,000 queries per image. Key metrics include Attack Success Rate (ASR), SSIM-weighted ASR (wASR), and surrogate-to-black-box transfer indicators.
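A plausible reading of the wASR metric is the per-image success indicator weighted by the SSIM of the adversarial example; the exact definition below is an assumption, not taken from the paper:

```python
def wasr(successes, ssims):
    """SSIM-weighted attack success rate (definition assumed): average of
    success_i * ssim_i over all images, with SSIM values in [0, 1]."""
    assert len(successes) == len(ssims)
    return sum(s * w for s, w in zip(successes, ssims)) / len(successes)
```

Under this definition, wASR is upper-bounded by the plain ASR and penalizes successes obtained at low perceptual fidelity.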
Results:
- On the surrogates, ARMOR achieved a perfect ASR of $1.000$.
- On the blind ViT-B/16, ARMOR achieved ASR $= 0.396$ (wASR $= 0.280$, SSIM $= 0.701$), more than double the ASR of the next-best ensemble, AutoAttack-PGD. Transfer-based attacks such as MI-FGSM and DI-FGSM dropped to markedly lower ASR.
- Transferability: ARMOR attained the highest surrogate-to-black-box transfer, outperforming TI-FGSM and AutoAttack-PGD.
Ablations showed that removing the InfoAgent, or removing agentic orchestration entirely, reduced transfer ASR to nearly zero, establishing the necessity of agentic multi-agent reasoning and hyperparameter adaptation (Rong et al., 26 Jan 2026).
7. Limitations and Future Directions
ARMOR incurs significant computational overhead due to the parallel execution of three gradient-based attacks, real-time LLM/VLM queries, and ensemble hill-climbing, resulting in high GPU and multi-node costs. Generalization to domains beyond low-quality fake images (e.g., natural images or video) has yet to be established. Several future research directions are noted:
- Incorporation of perceptual similarity metrics beyond SSIM, such as LPIPS, to further enhance fidelity-performance trade-offs.
- Extension to additional attack modalities, including generative-model-based and patch attacks.
- Exploration of defenses purpose-built to detect the coordination signals emergent from multi-agent attack orchestration (Rong et al., 26 Jan 2026).
ARMOR reconceptualizes adversarial attack generation as a collaborative, multi-agent reasoning process, yielding enhanced transferability and robustness relative to static attack ensembles.