ARMOR: Agentic Reasoning for Adversarial Attacks

Updated 2 February 2026
  • The paper introduces ARMOR, which dynamically orchestrates multiple attack primitives via AI agents to overcome static ensemble limitations.
  • It leverages Vision–Language and Large Language Models for real-time semantic reasoning and adaptive hyperparameter reparameterization.
  • Empirical results show ARMOR's superior attack success rates, outperforming conventional methods in transfer-based black-box scenarios.

Agentic Reasoning for Methods Orchestration and Reparameterization (ARMOR) is a closed-loop adversarial attack framework that uses agentic reasoning to dynamically orchestrate and reparameterize multiple attack primitives via a society of AI agents. ARMOR integrates Vision–Language Models (VLMs) and Large Language Models (LLMs) to achieve robust, transferable, and adaptive adversarial attacks, primarily in a transfer-based black-box scenario. By adaptively synthesizing dense, sparse, and geometric perturbations through a shared "Mixing Desk," ARMOR addresses two critical limitations of static ensemble attack suites: their lack of semantic awareness and their inability to adapt to new models or exploit image-specific vulnerabilities (Rong et al., 26 Jan 2026).

1. Motivation and Problem Setting

ARMOR is motivated by the shortcomings of traditional automated attack ensembles, which operate as static sequences or combinations of fixed-hyperparameter attacks (e.g., AutoAttack). These suites lack semantic reasoning capabilities, stagnate under constrained budgets, and transfer poorly across architectures. ARMOR specifically targets the transfer-based black-box setting, in which perturbations are crafted on an ensemble of surrogate models (e.g., ResNet-50, DenseNet-121) and then applied to a blind target architecture (e.g., ViT-B/16). The core needs addressed by ARMOR include dynamic allocation of attack budgets across complementary geometries (dense, sparse, geometric), semantic targeting of image regions, and real-time hyperparameter reparameterization guided by high-level vision–language reasoning (Rong et al., 26 Jan 2026).

2. Core Architecture and Components

ARMOR orchestrates three canonical adversarial attack primitives: Carlini–Wagner (CW), Jacobian-based Saliency Map Attack (JSMA), and Spatially Transformed Attack (STA). The system is composed of specialized agents, each contributing to distinct stages of the attack orchestration:

  • InfoAgent (VLM): Extracts semantically salient cues from the input image, such as region annotations and texture features, using models like Qwen2.5-VL.
  • ConductorAgent (LLM): Aggregates semantic reports and baseline confidences to set global constraints, notably the $\ell_\infty$ budget $\epsilon$ and minimum SSIM threshold $\tau$.
  • AdvisorAgents (LLM): Propose hyperparameters for each attack method based on the run history $\mathcal{H}$.
  • MethodAgents: Execute the CW, JSMA, and STA algorithms with the supplied hyperparameters.
  • MixerAgent: Implements the Mixing Desk, optimizing a convex weight vector $w\in\Delta^3$ to synthesize the output perturbation that maximizes a custom score $S(w)$.
  • CritiqueAgents: Assess attack outcomes through vectors comprising black-box and surrogate confidences along with SSIM.
  • StrategistAgent: Detects stagnation and adaptively relaxes global constraints $(\epsilon, \tau)$ to facilitate escape from local optima.

The interaction among these agents is iterative and closed-loop, enabling the system to learn from intermediate results and to reparameterize attacks in real time (Rong et al., 26 Jan 2026).

3. Formalization of Attack Primitives

3.1 Carlini–Wagner (CW) Attack

CW adversarial examples are generated by solving

$$\min_{\delta}\; J(\delta) = \|\delta\|_2^2 + c\cdot\mathcal{L}_{\mathrm{adv}}(Z(x+\delta), t)$$

subject to $x+\delta\in[0,1]^d$ and $\|\delta\|_\infty\leq\epsilon$, with projection performed at each step:

$$\delta\leftarrow\Pi_{\infty,\epsilon}\bigl(\delta - \eta\nabla_\delta J(\delta)\bigr)$$
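The projected update above can be illustrated with a minimal sketch in plain Python. It assumes the image and perturbation are flat lists of pixel values in $[0,1]$ and that the gradient of $\mathcal{L}_{\mathrm{adv}}$ with respect to $\delta$ is supplied by the caller (here the hypothetical argument `grad_adv`). The function names are illustrative and are not taken from the paper; folding the $[0,1]$ box constraint into the projection is also an assumption of this sketch.

```python
def project_linf(delta, x, eps):
    """Clip delta into the l_inf ball of radius eps, then keep x + delta in [0, 1]."""
    out = []
    for d, xi in zip(delta, x):
        d = max(-eps, min(eps, d))        # l_inf projection (Pi_{inf, eps})
        d = max(-xi, min(1.0 - xi, d))    # box constraint on x + delta (assumed here)
        out.append(d)
    return out


def cw_step(delta, x, grad_adv, c, eta, eps):
    """One projected gradient step on J(delta) = ||delta||_2^2 + c * L_adv."""
    # The gradient of ||delta||_2^2 is 2 * delta; grad_adv is dL_adv / d(delta).
    grad_j = [2.0 * d + c * g for d, g in zip(delta, grad_adv)]
    stepped = [d - eta * g for d, g in zip(delta, grad_j)]
    return project_linf(stepped, x, eps)
```

In this sketch a larger $c$ weights the adversarial term more heavily against the $\ell_2$ penalty, which is why $(c, \eta)$ are natural candidates for reparameterization.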

3.2 Jacobian-based Saliency Map Attack (JSMA)

At every iteration, JSMA selects a pixel pair $(p^*,q^*)$ that maximizes adversarial saliency:

$$(p^*,q^*) = \arg\max_{(p,q)} \bigl[-\alpha_{pq}\cdot\beta_{pq}\bigr]$$

where $\alpha_{pq} = \sum_{i\in\{p,q\}} \partial Z_t(x)/\partial x_i$ and $\beta_{pq} = \sum_{i\in\{p,q\}} \sum_{j\ne t} \partial Z_j(x)/\partial x_i$, subject to $\alpha_{pq}>0$ and $\beta_{pq}<0$. Pixels are perturbed and projected to satisfy $\|\delta\|_\infty\leq\epsilon$ and $x+\delta\in[0,1]$.
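The pair-selection rule can be sketched as a brute-force search, assuming the per-pixel gradients have already been computed. The inputs `grad_target` (holding $\partial Z_t/\partial x_i$) and `grad_others` (holding $\sum_{j\ne t}\partial Z_j/\partial x_i$) are hypothetical names for illustration. The exhaustive loop over all pairs is only practical for small inputs and is not claimed to be how ARMOR's MethodAgent implements the search.

```python
from itertools import combinations


def select_pixel_pair(grad_target, grad_others):
    """Return the pair (p, q) maximizing -alpha_pq * beta_pq with alpha > 0, beta < 0.

    grad_target[i] = dZ_t/dx_i; grad_others[i] = sum over j != t of dZ_j/dx_i.
    Returns None when no pair satisfies both sign conditions.
    """
    best_pair, best_score = None, 0.0
    for p, q in combinations(range(len(grad_target)), 2):
        alpha = grad_target[p] + grad_target[q]
        beta = grad_others[p] + grad_others[q]
        if alpha > 0 and beta < 0:
            score = -alpha * beta  # positive whenever the pair is admissible
            if score > best_score:
                best_pair, best_score = (p, q), score
    return best_pair
```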

3.3 Spatially Transformed Attack (STA)

STA parameterizes a flow field $f$ and applies a differentiable warping:

$$x_{\mathrm{adv}}(f) = \mathcal{W}(x; f)$$

The flow is obtained by optimizing

$$\min_f\; \mathcal{L}_{\mathrm{adv}}(x_{\mathrm{adv}}(f), t) + \theta\cdot\mathcal{L}_{\mathrm{flow}}(f)$$

where $\mathcal{L}_{\mathrm{flow}}(f)$ encourages smoothness and $\theta$ is a distortion trade-off.
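The source does not spell out the form of $\mathcal{L}_{\mathrm{flow}}$. As a hedged illustration, the sketch below assumes a total-variation-style penalty: the sum of Euclidean distances between the displacement vectors of 4-connected neighbouring pixels. The flow is represented as a grid of hypothetical `(du, dv)` tuples, and the adversarial loss is passed in as a precomputed number.

```python
import math


def flow_smoothness(flow):
    """Sum of Euclidean distances between flow vectors of 4-connected neighbours.

    flow is a 2D grid (list of rows) of (du, dv) displacement tuples.
    This particular penalty is an assumption, not the paper's stated definition.
    """
    rows, cols = len(flow), len(flow[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            du, dv = flow[r][c]
            # Compare with right and bottom neighbours so each pair is counted once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    nu, nv = flow[nr][nc]
                    total += math.hypot(du - nu, dv - nv)
    return total


def sta_objective(adv_loss, flow, theta):
    """Adversarial loss plus the theta-weighted smoothness penalty."""
    return adv_loss + theta * flow_smoothness(flow)
```

A spatially constant flow (a pure translation) incurs zero penalty under this choice, so $\theta$ only penalizes local distortion.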

4. Agentic Closed-Loop Orchestration

ARMOR proceeds in discrete iterations $k=1,2,\ldots$, with the following stages:

  1. Reconnaissance: InfoAgent analyzes the input and semantic cues. Baseline confidences $p_{\text{base}}$ (for surrogate and black-box) are evaluated.
  2. Objective Formulation: ConductorAgent sets $(\epsilon,\tau)$ according to semantic features and $p_{\text{base}}$.
  3. Parallel Perturbation Generation: For each method $m\in\{\mathrm{CW, JSMA, STA}\}$, AdvisorAgents propose method-specific hyperparameters $\varphi_m$, and MethodAgents generate $\delta_m$.
  4. Adaptive Perturbation Ensemble: MixerAgent performs randomized hill-climbing over the simplex $\Delta^3$ to maximize

$$S(w) = \lambda\, p_{bb}(t\mid x_{\mathrm{master}}(w)) + (1-\lambda)\,\bar{p}_{\mathrm{surr}}(t\mid x_{\mathrm{master}}(w)) - \mu\cdot\max\bigl(0,\tau-\mathrm{SSIM}(x,x_{\mathrm{master}}(w))\bigr)$$

The composite perturbation is $\delta_{\mathrm{mix}}(w) = w_1\delta_{\mathrm{CW}} + w_2\delta_{\mathrm{JSMA}} + w_3\delta_{\mathrm{STA}}$ with $\sum_i w_i=1$.

  5. Critique and Strategic Adaptation: CritiqueAgents append performance vectors to $\mathcal{H}$. StrategistAgent detects stagnation using sliding window statistics. If necessary, constraints are relaxed: $\tau\gets \max(\tau_{\min}, \tau-\Delta_\tau)$, $\epsilon\gets \min(\epsilon_{\max}, \epsilon + \Delta_\epsilon)$.
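The Mixing Desk stage (step 4) can be sketched as follows. The score and the convex combination follow the formulas given in that step. The hill-climbing details (uniform jitter, clamping, renormalization, step size, iteration count) are assumptions chosen for illustration, and `score` is a caller-supplied callable that in a real run would wrap model queries and an SSIM computation.

```python
import random


def mixing_score(p_bb, p_surr_mean, ssim, lam, mu, tau):
    """S(w) for one candidate: weighted confidences minus the SSIM shortfall penalty."""
    return lam * p_bb + (1.0 - lam) * p_surr_mean - mu * max(0.0, tau - ssim)


def mix(w, deltas):
    """delta_mix(w) = w1*delta_CW + w2*delta_JSMA + w3*delta_STA, elementwise."""
    return [sum(wi * d[i] for wi, d in zip(w, deltas)) for i in range(len(deltas[0]))]


def hill_climb_simplex(score, n_steps=200, step=0.1, seed=0):
    """Randomized hill-climbing for a 3-way convex weight vector (illustrative)."""
    rng = random.Random(seed)
    w = [1.0 / 3.0] * 3
    best = score(w)
    for _ in range(n_steps):
        # Perturb, clamp to non-negative, and renormalize to stay on the simplex.
        cand = [max(0.0, wi + rng.uniform(-step, step)) for wi in w]
        total = sum(cand)
        if total == 0.0:
            continue
        cand = [ci / total for ci in cand]
        s = score(cand)
        if s > best:  # accept only strict improvements
            w, best = cand, s
    return w, best
```

Because each evaluation of `score` costs at least one black-box query, the number of hill-climbing steps trades directly against the per-image query budget.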

The process repeats until the attack succeeds or resource constraints are met.
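The constraint relaxation in the final stage can be written directly from the update rules given there. The stagnation test below is only one possible reading of "sliding window statistics": it assumes a scalar score per iteration and flags stagnation when the most recent window fails to improve on the earlier best by a margin. The `window` and `min_gain` parameters are hypothetical.

```python
def is_stagnant(scores, window=3, min_gain=1e-3):
    """True if the best score in the last `window` iterations did not beat earlier ones."""
    if len(scores) <= window:
        return False  # not enough history to compare against
    recent_best = max(scores[-window:])
    earlier_best = max(scores[:-window])
    return recent_best - earlier_best < min_gain


def relax_constraints(eps, tau, d_eps, d_tau, eps_max, tau_min):
    """Apply eps <- min(eps_max, eps + d_eps) and tau <- max(tau_min, tau - d_tau)."""
    return min(eps_max, eps + d_eps), max(tau_min, tau - d_tau)
```

The clamps mean that repeated relaxation saturates at $(\epsilon_{\max}, \tau_{\min})$, so the attack never leaves the overall budget.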

5. Reparameterization via Advisor Agents

AdvisorAgents implement a gradient-free search over method-specific and global hyperparameters:

  • CW: $\varphi_{\mathrm{CW}}=(c, \eta)$
  • JSMA: $\varphi_{\mathrm{JSMA}}=(\alpha, k)$
  • STA: $\varphi_{\mathrm{STA}}=(\gamma, T_{sta}, S_{min}, S_{max}, \theta)$
  • Global: $(\epsilon, \tau)$

Given the historical success/failure vectors, AdvisorAgents sample local modifications, predict impact using learned/heuristic rules, and select those improving the expected $S(w^*)$. The critique vector $c^{(k)}$ supplies the reinforcement signal, incentivizing increments in black-box confidence $p_{bb}$ and SSIM. This mechanism is analogous to hill-climbing or evolutionary strategies, albeit directed by both textual and numeric signals (Rong et al., 26 Jan 2026).
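To make the hill-climbing analogy concrete, here is a minimal numeric stand-in for the "sample local modifications" step. It replaces the LLM's reasoning with random log-normal jitter around the best configuration in the history, so it illustrates only the search pattern, not ARMOR's actual advisor logic. The function name, the history layout, and the `scale` parameter are assumptions.

```python
import math
import random


def propose_hyperparameters(history, n_candidates=8, scale=0.3, seed=0):
    """Sample local modifications around the best configuration seen so far.

    history: non-empty list of (params, score) pairs, where params maps
    hyperparameter names to positive floats. Returns candidate dicts; the
    caller evaluates them and appends the results to the history.
    """
    rng = random.Random(seed)
    best_params, _ = max(history, key=lambda item: item[1])
    candidates = []
    for _ in range(n_candidates):
        # Multiplicative log-normal jitter keeps every value strictly positive.
        cand = {k: v * math.exp(rng.gauss(0.0, scale)) for k, v in best_params.items()}
        candidates.append(cand)
    return candidates
```

For example, a history of CW runs would hold entries such as `({"c": 5.0, "eta": 0.02}, 0.35)`, with the second element being the score attained by that configuration.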

6. Empirical Evaluation

ARMOR was evaluated on the AADD-LQ dataset (710 low-quality fake images at $224\times224$ resolution), attacking a blind ViT-B/16 with surrogates (ResNet-50, DenseNet-121), constrained to $\ell_\infty\leq 8/255$ and a maximum of 10,000 queries per image. Key metrics include Attack Success Rate (ASR), SSIM-weighted ASR (wASR), and transfer indicators $p_{\mathrm{surr}}$, $p_{\mathrm{tgt}}$, $p_{\mathrm{cond}}$, with $\Delta\mathrm{ASR}=p_{\mathrm{surr}}-p_{\mathrm{tgt}}$.

Results:

  • On surrogates, ARMOR achieved perfect ASR $=1.000$, with wASR $\approx 0.98$.
  • On blind ViT-B/16, ARMOR achieved ASR $=0.396$ (wASR $=0.280$, SSIM $=0.701$), over double the next-best ensemble (AutoAttack-PGD, ASR $\approx 0.196$). Transfer-based attacks such as MI-FGSM and DI-FGSM dropped to ASR $<0.07$.
  • Transferability: ARMOR attained $p_{\mathrm{surr}}=1.000$, $p_{\mathrm{tgt}}=0.396$, $p_{\mathrm{cond}}=0.396$, and $\Delta\mathrm{ASR}=0.604$, outperforming TI-FGSM ($p_{\mathrm{cond}}=0.165$) and AutoAttack-PGD ($p_{\mathrm{cond}}=0.196$).
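The transfer indicators can be computed from per-image outcomes as sketched below. Only $\Delta\mathrm{ASR}=p_{\mathrm{surr}}-p_{\mathrm{tgt}}$ is defined in the text. This sketch assumes that $p_{\mathrm{cond}}$ is the target success rate conditioned on surrogate success and that wASR averages SSIM over successful target attacks, with failures contributing zero. Both readings are assumptions, although the first is consistent with the reported $p_{\mathrm{cond}}=p_{\mathrm{tgt}}$ when $p_{\mathrm{surr}}=1$.

```python
def transfer_metrics(surr_success, tgt_success, ssim):
    """Compute transfer indicators from per-image outcomes.

    surr_success, tgt_success: lists of 0/1 flags per image.
    ssim: list of SSIM values between each original and adversarial image.
    """
    n = len(surr_success)
    n_surr = sum(surr_success)
    p_surr = n_surr / n
    p_tgt = sum(tgt_success) / n
    both = sum(1 for s, t in zip(surr_success, tgt_success) if s and t)
    # Assumed definition: P(target fooled | surrogate fooled).
    p_cond = both / n_surr if n_surr else 0.0
    # Assumed definition: successes weighted by SSIM, failures count as zero.
    w_asr = sum(q for t, q in zip(tgt_success, ssim) if t) / n
    return {
        "p_surr": p_surr,
        "p_tgt": p_tgt,
        "p_cond": p_cond,
        "delta_asr": p_surr - p_tgt,
        "w_asr": w_asr,
    }
```

With the reported values, $\Delta\mathrm{ASR} = 1.000 - 0.396 = 0.604$, which matches the figure in the list above.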

Ablations showed that removing the InfoAgent, or removing agentic orchestration entirely, reduced transfer ASR to nearly zero (ASR $\approx 0.006$ to $0.01$), establishing the necessity of multi-agent reasoning and hyperparameter adaptation (Rong et al., 26 Jan 2026).

7. Limitations and Future Directions

ARMOR incurs significant computational overhead due to the parallel execution of three gradient-based attacks, real-time LLM/VLM queries, and ensemble hill-climbing, resulting in high GPU and multi-node costs. Generalization to domains beyond low-quality fake images (e.g., natural images or video) has yet to be established. Several future research directions are noted:

  • Incorporation of perceptual similarity metrics beyond SSIM, such as LPIPS, to further enhance fidelity-performance trade-offs.
  • Extension to additional attack modalities, including generative-model-based and patch attacks.
  • Exploration of defenses purpose-built to detect the coordination signals emergent from multi-agent attack orchestration (Rong et al., 26 Jan 2026).

ARMOR reconceptualizes adversarial attack generation as a collaborative, multi-agent reasoning process, yielding enhanced transferability and robustness relative to static attack ensembles.

References (1)