Agentic & Dynamic Red-Teaming Overview
- Agentic and dynamic red-teaming is an adaptive framework that uses automated, multi-stage adversarial evaluations to assess vulnerabilities in complex LLM systems.
- It systematically synthesizes and escalates adversarial attacks by exploiting persistent memory, multi-agent workflows, and tool integrations across dynamic execution graphs.
- Empirical benchmarks from frameworks like LAAF, DREAM, and AgenticRed highlight high success rates and underscore the need for lifecycle-aware defenses and continuous feedback integration.
Agentic and Dynamic Red-Teaming
Agentic and dynamic red-teaming denotes a class of automated, adaptive security evaluation methods targeting complex LLM systems, particularly those with persistent memory, multi-stage workflows, tool integrations, or agentic execution. These frameworks systematically synthesize, escalate, and refine adversarial attacks to expose vulnerabilities that static or single-turn red teaming fails to detect. They characterize and exploit system behaviors that emerge only through long-horizon interactions, cross-component orchestration, or environmental context, providing a rigorous methodology for uncovering advanced failure modes and informing robust defenses.
1. Foundations: Conceptual Scope and Distinction from Model-Level Red Teaming
The shift from static LLM red-teaming toward agentic, dynamic techniques is driven by the complexity of agentic LLM deployments. Traditional model-level red teaming treats an LLM as a function from a prompt to a response, seeking prompts that directly elicit unsafe responses in a single exchange. This approach is agnostic to context, tool calls, persistent memory, and multi-agent workflows, and therefore significantly underestimates the adversarial surface available in deployed systems.
Agentic red teaming, by contrast, examines an LLM as a component within a dynamic, interactive agent loop with a structured execution graph, where nodes correspond to tool calls, memory accesses, or inter-agent messages, and edges encode information/control flow. The adversary injects at one or more nodes to induce harmful global behavior by leveraging not only the core LLM but also its interactions with retrieval, code execution, tool outputs, and evolving context states (Wicaksono et al., 21 Sep 2025).
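The execution-graph view can be made concrete with a small data structure. The following is a minimal sketch, not the representation used by any cited framework: the class `ExecutionGraph`, the node names, and the node kinds are all hypothetical. It shows how an injection at one node exposes every node reachable from it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecutionGraph:
    """Directed graph of agent actions (hypothetical, for illustration only)."""
    kinds: dict[str, str] = field(default_factory=dict)        # node -> kind
    edges: dict[str, list[str]] = field(default_factory=dict)  # node -> successors

    def add_node(self, name: str, kind: str) -> None:
        self.kinds[name] = kind
        self.edges.setdefault(name, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def downstream(self, injected: str) -> set[str]:
        """Return every node reachable from an injection point (breadth-first)."""
        seen: set[str] = set()
        queue = deque([injected])
        while queue:
            node = queue.popleft()
            for nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


graph = ExecutionGraph()
for name, kind in [
    ("user_prompt", "message"),
    ("web_search", "tool_call"),
    ("memory_write", "memory_access"),
    ("planner_to_coder", "inter_agent_message"),
    ("code_exec", "tool_call"),
]:
    graph.add_node(name, kind)

graph.add_edge("user_prompt", "planner_to_coder")
graph.add_edge("web_search", "memory_write")
graph.add_edge("memory_write", "planner_to_coder")
graph.add_edge("planner_to_coder", "code_exec")

# Content planted in a search result can reach code execution via memory.
exposed = graph.downstream("web_search")
```

In this toy graph the search tool never talks to the code executor directly, yet `exposed` contains `code_exec`, which is the kind of indirect path a single-turn model-level test does not exercise.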
Dynamic red teaming further extends this by systematically generating and mutating attack strategies in multi-stage workflows, exploiting state persistence, iterative search, and automated feedback adaptation. This approach captures numerous vulnerabilities (e.g., logic-layer prompt injection, cross-session payload persistence, tool-chain attacks) that remain unobservable in isolated model-level tests (Atta et al., 18 Mar 2026, Lu et al., 22 Dec 2025).
2. Taxonomies and Attack Surfaces: Logic-Layer and Multi-Stage Vulnerability Models
A key element of agentic red teaming is the construction of comprehensive taxonomies and stateful attack models reflecting system lifecycle and context propagation. For instance, the Logic-layer Prompt Control Injection (LPCI) taxonomy in LAAF comprises 49 techniques across six categories: Encoding (11), Structural (8), Semantic (8), Layered (5), Trigger/Timing (12), and Exfiltration (5). Each technique is parameterized over five variants, 1,920 instruction contexts, and six system lifecycle stages, yielding a payload space exceeding 2.8 million unique attacks (Atta et al., 18 Mar 2026).
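The quoted payload-space size can be checked with a few lines of arithmetic. This sketch assumes the space is the full Cartesian product of techniques, variants, instruction contexts, and lifecycle stages; the source gives the factors and the total but does not spell out the combination rule, so treat that as an assumption.

```python
from math import prod

# Technique counts per LPCI category, as listed in the text.
techniques = {
    "Encoding": 11,
    "Structural": 8,
    "Semantic": 8,
    "Layered": 5,
    "Trigger/Timing": 12,
    "Exfiltration": 5,
}
VARIANTS = 5
INSTRUCTION_CONTEXTS = 1_920
LIFECYCLE_STAGES = 6

total_techniques = sum(techniques.values())

# Assumption: every technique combines with every variant, context, and stage.
payload_space = prod(
    [total_techniques, VARIANTS, INSTRUCTION_CONTEXTS, LIFECYCLE_STAGES]
)

print(total_techniques, payload_space)  # 49 2822400
```

Under that assumption the product is 2,822,400, consistent with the "exceeding 2.8 million" figure.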
Lifecycle stages considered in LAAF and related frameworks are:
- S1 Reconnaissance
- S2 Logic-Layer Injection
- S3 Trigger Execution
- S4 Persistence/Reuse
- S5 Evasion/Obfuscation
- S6 Trace Tampering
Attackers may inject at any stage, and payloads are dynamically mutated and escalated between stages. This models realistic adversarial campaigns, where success at one stage seeds more sophisticated attacks in subsequent phases, exploiting evolving security posture, memory persistence, and context carryover.
Dynamic frameworks such as DREAM generalize this notion by representing attack-relevant facts, entities, and actions in a Cross-Environment Adversarial Knowledge Graph (CE-AKG): a stateful, environment-bridging graph informing multi-step, multi-environment attack chain construction via Contextualized Guided Policy Search (C-GPS) (Lu et al., 22 Dec 2025).
3. Adaptive Algorithms: Persistent Stage Breaker, Guided Policy Search, and Agentic Loops
Agentic and dynamic red-teaming methods are characterized by adaptive, feedback-driven search protocols, in contrast to static test suites. Key innovations include:
a) Persistent Stage Breaker (PSB) in LAAF
The PSB orchestrates stagewise attack escalation by:
- Searching for an EXEC-class (breakthrough) payload at each stage
- Mutating the winning payload (via seed variation, encoding mutation, or compound mutation) based on consecutive block count
- Seeding the next stage's search with a mutated batch, preserving memory persistence and adversarial context
- Interspersing random samples with some probability to avoid local minima
The full PSB algorithm is given in (Atta et al., 18 Mar 2026).
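The escalation steps listed above can be sketched as a small search loop. This is an illustrative sketch under stated assumptions, not the published PSB algorithm: the function names, the block-count thresholds (3 and 6), the batch size of 4, the default exploration probability, and the `judge` callback are all hypothetical, and the payloads are inert placeholder strings.

```python
import random
from typing import Callable, Optional

STAGES = ["S1", "S2", "S3", "S4", "S5", "S6"]


def mutate(payload: str, blocked_streak: int, rng: random.Random) -> str:
    """Choose a mutation from the consecutive-block count (thresholds assumed)."""
    if blocked_streak < 3:
        return f"{payload}|seed{rng.randrange(1000)}"              # seed variation
    if blocked_streak < 6:
        return payload.encode().hex()                               # encoding mutation
    return f"{payload.encode().hex()}|seed{rng.randrange(1000)}"   # compound mutation


def persistent_stage_breaker(
    judge: Callable[[str, str], bool],  # (stage, payload) -> True on an EXEC-class outcome
    seeds: list[str],
    explore_prob: float = 0.1,
    max_attempts: int = 50,
    rng: Optional[random.Random] = None,
) -> dict[str, Optional[str]]:
    """Search each stage in order, seeding the next stage from the last winner."""
    rng = rng or random.Random(0)
    batch = list(seeds)
    winners: dict[str, Optional[str]] = {}
    for stage in STAGES:
        blocked_streak = 0
        winner = None
        for _ in range(max_attempts):
            if rng.random() < explore_prob:
                candidate = rng.choice(seeds)  # random sample to escape local minima
            else:
                candidate = mutate(rng.choice(batch), blocked_streak, rng)
            if judge(stage, candidate):
                winner = candidate
                break
            blocked_streak += 1
        winners[stage] = winner
        if winner is not None:
            # Carry the breakthrough forward as a mutated batch for the next stage.
            batch = [mutate(winner, 0, rng) for _ in range(4)]
    return winners


def toy_judge(stage: str, payload: str) -> bool:
    """Stand-in for a real target plus outcome classifier (purely illustrative)."""
    return "|seed" in payload


result = persistent_stage_breaker(toy_judge, ["placeholder-a", "placeholder-b"])
```

In a real harness `judge` would submit the payload to the system under test and classify the response; here it is a stub so the control flow can be read and run in isolation.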
b) Multi-Agent and Iterative Reasoning Loops
Frameworks such as Co-RedTeam decompose the attack process into multi-agent orchestrated stages: analysis, critique, planning, validation, execution, and evaluation, with each agent’s outputs feeding subsequent reasoning and action, grounded in continuous execution feedback and long-term memory (He et al., 2 Feb 2026). Co-RedTeam and similar pipelines explicitly model discovery→exploitation cycles, structured plan refinement, and adaptive memory-based retrieval.
c) Policy Search and Evolution
Dynamic attack chain construction in DREAM is formulated as a partially observable Markov decision process (POMDP) over the CE-AKG, with policies that rank atomic actions by intrinsic risk, information exploitation, and strategic advancement, dynamically backtracking and branching as attacks unfold across environments (Lu et al., 22 Dec 2025). AgenticRed introduces full system-level evolutionary search over agentic red-teaming workflows themselves, iteratively composing, testing, and optimizing entire attack loops for maximal attack success rate (ASR) (Yuan et al., 20 Jan 2026).
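Ranking candidate actions by several criteria can be illustrated with a simple scoring function. The sketch below assumes a weighted sum over the three criteria named above; the source does not state how DREAM combines them, so the `Action` fields, the weights, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A candidate atomic action with per-criterion scores in [0, 1] (assumed scale)."""
    name: str
    intrinsic_risk: float
    info_exploitation: float
    strategic_advancement: float


def rank_actions(
    actions: list[Action],
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> list[Action]:
    """Sort actions by a weighted sum of the three criteria, best first."""
    w_risk, w_info, w_adv = weights

    def score(a: Action) -> float:
        return (
            w_risk * a.intrinsic_risk
            + w_info * a.info_exploitation
            + w_adv * a.strategic_advancement
        )

    return sorted(actions, key=score, reverse=True)


candidates = [
    Action("read_public_doc", 0.25, 0.25, 0.25),
    Action("query_internal_tool", 0.5, 0.25, 0.25),
    Action("pivot_to_second_env", 0.75, 0.5, 0.5),
]
ranked = rank_actions(candidates)
```

A search procedure would expand the top-ranked action first and fall back to lower-ranked ones when it needs to backtrack.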
4. Empirical Benchmarks: Effectiveness and Systematic Evaluation
Agentic and dynamic frameworks deliver superior coverage and success rates compared to prior static or manual baselines, as quantified by explicit benchmarks:
| Framework | Target Model(s) | Attack Class (Summary) | Mean/Peak ASR | Notable Insights |
|---|---|---|---|---|
| LAAF | Gemini, Claude, ChatGPT, LLaMA3, Mixtral | LPCI, multi-category, 6-stage | 84% aggregate (up to 94% platform) | Layered/semantic attacks most effective; layered outperforms encoding (Atta et al., 18 Mar 2026) |
| DREAM | 12 SOTA LLMs | Multi-env, C-GPS, >70% agents broken | >70% per-step (24% best) | Cross-environment pivots sharply amplify risk (Lu et al., 22 Dec 2025) |
| PSB, Co-RedTeam | Gemini-3, open-source models | Multi-agent, iterative, code execution | Up to 65% (exploit tasks), 10–20% gain over SOTA | Feedback and structured interaction drives success (He et al., 2 Feb 2026) |
| AJAR | LLMs + tool use | Petri-net audit loop, stateful backtracking | 82% (text), 68% (tool), shifted by agentic gap | Code injection emerges with tool use, persona attack resistance rises (Dou et al., 16 Jan 2026) |
| AgenticRed | Llama-2/3, GPT-3.5/4o-mini, Claude 3.5 | System-level workflow evolution | 96–100% open, 60% closed | Automated architectural search outperforms human-designed (Yuan et al., 20 Jan 2026) |
These results underscore that dynamic, stage-aware, and agentic approaches not only discover more vulnerabilities but also reveal new classes of agentic-only or cross-environmental exploits that static red teaming will systematically miss.
5. State-of-the-Art Gaps, Attack Categories, and Agentic-Only Vulnerabilities
Empirical studies directly comparing model-level and agentic-level red teaming (e.g., AgentSeer) establish a clear “agentic gap”:
- Agentic-only vulnerabilities: Certain attack goals (e.g., objective 0) are unattainable via isolated model-level iterative attacks but succeed in agentic runs, especially those involving tool-calling subgraphs or multi-agent handoffs (e.g., up to 67% ASR in agentic-only cases for specific objectives) (Wicaksono et al., 21 Sep 2025).
- Vulnerability transfer failure: Prompts effective at model-level may fail at agentic-level due to agent wrappers, API normalization, or tool sanitization.
- Dynamic strategy emergence: Multi-stage search and iterative refinement uncover nontrivial strategies (e.g., timing-based triggers, layered obfuscation, adversarial context pivots) that defeat static or prompt-injection-focused defenses (Atta et al., 18 Mar 2026, Lu et al., 22 Dec 2025).
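The first two observations above amount to comparing per-objective outcomes at the two levels. The following sketch shows one way to do that bookkeeping; the objective labels and outcome lists are made-up illustrative data, not results from any cited study.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attempts that succeeded; 0.0 when there were no attempts."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def agentic_only(
    model_level: dict[str, list[bool]],
    agentic_level: dict[str, list[bool]],
) -> list[str]:
    """Objectives that succeed at least once in agentic runs but never at model level."""
    return sorted(
        obj
        for obj, runs in agentic_level.items()
        if any(runs) and not any(model_level.get(obj, []))
    )


def transfer_failures(
    model_level: dict[str, list[bool]],
    agentic_level: dict[str, list[bool]],
) -> list[str]:
    """Objectives that succeed at model level but never in agentic runs."""
    return sorted(
        obj
        for obj, runs in model_level.items()
        if any(runs) and not any(agentic_level.get(obj, []))
    )


# Hypothetical outcomes: one boolean per attack attempt.
model_runs = {"obj_a": [False, False, False], "obj_b": [True, False]}
agentic_runs = {"obj_a": [True, True, False], "obj_b": [False, False]}

gap = agentic_only(model_runs, agentic_runs)
lost = transfer_failures(model_runs, agentic_runs)
asr_obj_a = attack_success_rate(agentic_runs["obj_a"])
```

Reporting both lists side by side makes it clear that neither evaluation level subsumes the other.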
These observations generalize to domains beyond text (e.g., code security, economic interactions, policy adherence) and underpin recommendations for defense-in-depth validation and runtime logic monitoring as opposed to static prompt blocking.
6. Implications for Defense: Multi-Stage, Contextual, and Policy-Aware Hardening
Dynamic and agentic red-teaming uncovers limitations in static, prompt-engineered, or single-phase defensive measures:
- Static prompt filters are inadequate to defend against evolving, context-carrying, or layered attacks; semantic reframing and obfuscation drastically decrease filtering effectiveness (Atta et al., 18 Mar 2026, Lu et al., 22 Dec 2025).
- Lifecycle-aware defense: Effective mitigation must instrument each lifecycle phase (recon, injection, trigger, persistence, etc.), apply cross-session/rehydration detection, and leverage runtime consistency checks (e.g., metadata, access provenance, audit logs), not just static output filtering.
- Policy-aware checking: Policy-adherent agent red teaming (CRAFT) shows that multi-turn, deception-aware, and avoidance-aware adversaries circumvent hierarchical or fragment reminder defenses; stateful cross-turn policy enforcement is imperative (Nakash et al., 11 Jun 2025).
- Continuous feedback integration: Lifelong attack integration and feedback-guided exploration, as in AutoRedTeamer and AgenticRed, are critical for keeping defensive coverage current with the growing and evolving attack landscape (Zhou et al., 20 Mar 2025, Yuan et al., 20 Jan 2026).
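Stateful cross-turn enforcement, as opposed to per-message filtering, can be illustrated with a monitor that remembers what has happened earlier in a conversation. This is a minimal sketch of the general idea, not the mechanism of CRAFT or any other cited framework; the policy rule and the tool names (`verify_identity`, `issue_refund`, `lookup_order`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CrossTurnPolicyMonitor:
    """Tracks conversation state so that a rule can depend on earlier turns."""
    verified: bool = False
    violations: list[str] = field(default_factory=list)

    def observe(self, turn: int, tool: str) -> bool:
        """Return True if the tool call is allowed given the state so far."""
        if tool == "verify_identity":
            self.verified = True
            return True
        if tool == "issue_refund" and not self.verified:
            # Hypothetical rule: refunds require verification in an earlier turn.
            self.violations.append(
                f"turn {turn}: issue_refund before verify_identity"
            )
            return False
        return True


monitor = CrossTurnPolicyMonitor()
decisions = [
    monitor.observe(1, "lookup_order"),
    monitor.observe(2, "issue_refund"),     # blocked: no verification yet
    monitor.observe(3, "verify_identity"),
    monitor.observe(4, "issue_refund"),     # allowed: verified in turn 3
]
```

Because the decision depends on accumulated state rather than on the wording of a single message, rephrasing the turn-2 request does not change the outcome.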
7. Limitations, Open Problems, and Future Directions
Despite major advances, agentic and dynamic red-teaming frameworks face practical challenges:
- Compute cost and query efficiency: Automated system evolution (e.g., AgenticRed) entails large numbers of intermediate model and judge queries per successful attack; future work proposes multi-objective optimization incorporating testing budget as a constraint (Yuan et al., 20 Jan 2026).
- Mode collapse and coverage: Strong selection or feedback strategies may bias toward a few successful attack archetypes; both novelty bonuses and diversity-aware scoring have been suggested to promote broader exploration (Zhou et al., 20 Mar 2025, Yuan et al., 20 Jan 2026).
- Cross-domain and cross-framework generality: Most frameworks (e.g., LAAF, Co-RedTeam) are currently evaluated in a finite set of applied domains or orchestration stacks; broader generalization, multi-agent adversary simulation, and cross-platform observability are recognized needs (Atta et al., 18 Mar 2026, Wicaksono et al., 21 Sep 2025, Lu et al., 22 Dec 2025).
- Formal guarantees: Few agentic approaches currently provide formal regret bounds or statistical confidence intervals on discovered vulnerabilities or coverage; this remains an open question for theoretical foundations and certification (Chen et al., 2 Apr 2025).
- Integration into deployment pipelines: Robust CI/CD integration and live monitoring of agentic vulnerabilities (e.g., via LAAF, DREAM) are in early stages; operationalizing continuous, automated posture evaluation is an explicit future direction (Atta et al., 18 Mar 2026, Lu et al., 22 Dec 2025).
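On the point about statistical confidence, one standard way to attach an interval to a measured attack success rate is the Wilson score interval for a binomial proportion. The sketch below is a generic statistical helper, not something the cited frameworks are stated to use; it assumes independent attempts, and the counts are hypothetical.

```python
from math import sqrt


def wilson_interval(
    successes: int, trials: int, z: float = 1.96
) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical campaign: 42 successful attacks out of 50 attempts.
low, high = wilson_interval(42, 50)
```

Reporting the interval alongside the point estimate makes it easier to tell whether a difference between two frameworks or two defenses is larger than sampling noise.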
In summary, agentic and dynamic red-teaming systematically operationalizes scalable, lifecycle-aware, and adaptive vulnerability discovery workflows for complex LLM systems, exposing entire attack surfaces that static methods overlook, and imposing new requirements for defense that match the evolving sophistication of agentic adversaries. Foundational results and frameworks define the state of the art for both research and practical deployment (Atta et al., 18 Mar 2026, Wicaksono et al., 21 Sep 2025, Lu et al., 22 Dec 2025, He et al., 2 Feb 2026, Dou et al., 16 Jan 2026, Yuan et al., 20 Jan 2026).