
Agentic-Only Vulnerabilities in Autonomous AI

Updated 28 September 2025
  • Agentic-only vulnerabilities are emergent flaws in autonomous AI systems that enable exploits not present in traditional, stateless models.
  • They encompass risks such as memory poisoning, Logic-layer Prompt Control Injection, and TOCTOU exploits, revealing method-specific attack pathways.
  • Mitigation requires layered, context-aware controls including advanced input/output validation, persistent memory integrity, and cryptographic safeguards.

Agentic-only vulnerabilities are security and privacy risks that arise uniquely when LLMs or similar AI capabilities are embedded as autonomous agents with the ability to reason, remember, interact with tools, and execute actions, as opposed to traditional, stateless, single-turn systems. These vulnerabilities are linked to the architectural, operational, and contextual characteristics that emerge only in agentic deployments—such as persistent memory, dynamic tool calling, planning, feedback loops, and adaptive goal-seeking. Unlike conventional model-level weaknesses, agentic-only vulnerabilities exclusively manifest due to the complexity and autonomy afforded to agentic AI, resulting in new classes of exploitable attack surfaces, privilege escalation paths, and failure modes that necessitate specialized threat models and mitigation strategies.

1. Defining Agentic-Only Vulnerabilities

Agentic-only vulnerabilities are security flaws or exploitable behaviors that are present exclusively in agentic AI systems—those which operate autonomously across multi-step processes, interact with external data sources or tools, persist long-term memory, and adapt behaviorally based on dynamic context. These vulnerabilities are not apparent or exploitable in the underlying LLM or ML model when evaluated in isolation. Their defining characteristics include:

  • Emergent from System Context: They appear only due to the orchestration of components such as tool interfaces, persistent memory handlers, workflow managers, and inter-agent communication protocols (Wicaksono et al., 5 Sep 2025, Wicaksono et al., 21 Sep 2025).
  • Absent in Standalone Models: Attacks that fail against standalone LLMs may succeed when the same model is integrated within an agentic workflow, especially where tool-calling, memory, or feedback mechanisms are involved.
  • Context- and Sequence-Dependent: Vulnerabilities may depend on the order and temporal separation of checks and actions (e.g., TOCTOU patterns (Lilienthal et al., 23 Aug 2025)) or arise only during interaction with dynamic environments, such as memory stores or role-shifting in multi-agent systems (Atta et al., 14 Jul 2025).
  • Exploitable by Adaptive and Delayed Attacks: The complex, often non-deterministic behaviors of agentic systems can be manipulated via sophisticated, nested, or delayed adversarial strategies—such as Logic-layer Prompt Control Injection (LPCI) and feedback-based manipulations (Atta et al., 14 Jul 2025, Ming et al., 3 Jun 2025).
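
The delayed, conditionally triggered character of attacks such as LPCI can be illustrated with a toy sketch. Everything here — the filter, the memory structure, the encoded payload — is a hypothetical simplification for illustration, not code from any of the cited papers:

```python
import base64

# Toy LPCI-style payload: the stored memory entry looks benign to a
# surface-level filter, but decodes to an instruction that activates only
# when a later context condition is met.

def surface_filter(text: str) -> bool:
    """A naive keyword filter, standing in for input-time validation."""
    return "ignore previous instructions" not in text.lower()

# Attacker plants an encoded, conditional payload during an earlier session.
payload = base64.b64encode(b"IF topic == 'billing': escalate_privileges()").decode()
memory = [{"role": "note", "content": f"ref:{payload}"}]

# Input-time filtering sees nothing suspicious in the stored entry.
assert all(surface_filter(m["content"]) for m in memory)

# Turns later, an agent that naively decodes and acts on memory references
# would trigger the payload only in a 'billing' context -- the delayed,
# conditional activation that makes LPCI hard to catch at injection time.
decoded = base64.b64decode(memory[0]["content"][4:]).decode()
print(decoded)  # -> IF topic == 'billing': escalate_privileges()
```

The point of the sketch is the gap between the two moments: the filter runs at write time, while the payload's semantics only materialize at a later, contextually triggered read.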

The conceptual distinction is critical: agentic-only vulnerabilities are an intrinsic product of agency—dynamic goal pursuit, self-directed tool use, persistent memory, or multi-agent coordination—rather than pure LLM architecture flaws or dataset issues.

2. Categories and Representative Examples

Agentic-only vulnerabilities encompass several recurring categories identified across contemporary research:

| Vulnerability | Context/Trigger | Unique Impact in Agentic Systems |
| --- | --- | --- |
| Direct Database/Tool Access | Autonomous tool/SQL/API interaction | Unauthorized retrieval, data modification, lack of audit (Khan et al., 16 Oct 2024, Wicaksono et al., 5 Sep 2025) |
| Memory/Knowledge Poisoning | Persistent context/memory usage | Self-validating belief loops, cross-session contamination (Narajala et al., 28 Apr 2025, Zambare et al., 12 Aug 2025) |
| Logic-layer Prompt Control Injection (LPCI) | Encoded payloads in memory/tool output | Delayed, conditional privilege escalation, trace evasion (Atta et al., 14 Jul 2025, Huang et al., 17 Aug 2025) |
| Feedback Manipulation | Multi-agent critique/judge feedback | Oscillatory revision, error propagation via adversarial feedback (Ming et al., 3 Jun 2025) |
| TOCTOU (Time-of-Check to Time-of-Use) | Non-atomic multi-tool/planning steps | Data races, external state manipulation between reasoning steps (Lilienthal et al., 23 Aug 2025) |
| Privilege Expansion/Delegation Attacks | Autonomous orchestration/workflow execution | Silent escalation, cross-agent impersonation, replay (Goswami, 16 Sep 2025) |
| Governance Bypass/Obfuscation | Multi-agent/cross-session logs | Auditing difficulties, attribution failures, attack source obscuration (Narajala et al., 28 Apr 2025, Zambare et al., 12 Aug 2025) |
| Coordination Failures | Multi-agent, distributed reasoning | Collusive failure, mode collapse, propagation of errors (Raza et al., 4 Jun 2025) |

Technical examples of these categories are developed in the sections that follow.
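
One of the recurring categories, memory/knowledge poisoning, reduces to a self-validating belief loop: whatever the agent retrieves gets written back to memory, so a planted claim keeps reinforcing itself. The memory structure and retrieval rule below are hypothetical simplifications, not any cited paper's implementation:

```python
from collections import Counter

# Toy self-validating belief loop: retrieval is frequency-weighted, and the
# agent re-stores whatever claim it retrieved, widening the gap each session.

memory = Counter({"The refund limit is $50": 3})    # legitimate, repeated fact
memory["The refund limit is $5000"] += 4            # attacker-planted claim

def retrieve(mem: Counter) -> str:
    """Naive retrieval: the most frequently stored claim wins."""
    return mem.most_common(1)[0][0]

for _session in range(3):
    belief = retrieve(memory)   # the planted claim is now dominant...
    memory[belief] += 1         # ...and re-storing it reinforces the poisoning

assert retrieve(memory) == "The refund limit is $5000"
assert memory["The refund limit is $5000"] == 7   # 4 seeded + 3 self-reinforced
```

Once the planted claim is seeded above the legitimate one, no further attacker action is needed: the agent's own store-on-retrieve behavior carries the contamination across sessions.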

3. Distinction from Non-Agentic and Model-Level Vulnerabilities

A core property of agentic-only vulnerabilities is their invisibility to model-level evaluation. Results from Wicaksono et al. (5 Sep 2025) and Wicaksono et al. (21 Sep 2025) show:

  • Attack success rates (ASR) for certain adversarial prompts increased by 24–60% in agentic settings involving tool-calling, even when those same prompts failed to elicit unsafe output at the model-only level.
  • Vulnerabilities specific to agent transfer tools or tool-mediated agent orchestration were not detected by model-centric red teaming, yet represented the highest-risk contexts in agentic assessments.
  • Iterative, context-aware attacks exploit the interleaved nature of agentic workflows, achieving harmful objectives that remain inert in flat, single-turn model interactions.

This boundary is further reinforced by the persistent, context-driven aspects of agency—e.g., memory poisoning and delayed execution cannot be meaningfully instantiated or observed in stateless completion scenarios (Atta et al., 14 Jul 2025, Zambare et al., 12 Aug 2025). Agentic vulnerabilities also amplify the consequences of privacy breaches, as agents often possess greater operational and data privileges than traditional application logic (Pawelek et al., 23 Sep 2025).

4. Attack Pathways and Threat Models

Agentic-only vulnerabilities are enabled via a variety of carefully characterized adversarial pathways:

  • Prompt Injection (and LPCI): Malicious actors inject payloads—obfuscated, persistent, or conditional—into agent memory or operational context, which evade immediate filtering but mutate agent logic at a later, contextually triggered point (Khan et al., 16 Oct 2024, Atta et al., 14 Jul 2025, Huang et al., 17 Aug 2025).
  • Memory Poisoning: Attackers plant erroneous data or exceptions in long-term memory, leading to self-reinforcing belief snapshots or corrupt cached context, which then propagate through chain-of-thought planning or retrieval-augmented generation cycles (Narajala et al., 28 Apr 2025, Zambare et al., 12 Aug 2025).
  • TOCTOU Exploits: Agents relying on multi-tool plans read state at time t₁, and act at t₂, with the state maliciously changed in the interval, causing actions to be executed on stale or compromised data (Lilienthal et al., 23 Aug 2025).
  • Feedback-Driven Manipulation: Inter-agent critiques, particularly when combining parametric or retrieval-augmented evidence, induce agents to switch from correct to incorrect answers or escalate behavior along a trajectory of persuasive adversarial feedback (Ming et al., 3 Jun 2025).
  • Privilege Expansion and Delegation Hijacking: Orchestrated agent frameworks may misinterpret user intent or lose the binding between intent and permissible action, especially in the presence of prompt injection or agent impersonation, resulting in silent scope escalation (Goswami, 16 Sep 2025).
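
The TOCTOU pathway above reduces to a classic check-then-use race. A minimal sketch, with a hypothetical shared-state dict standing in for external tool state (tool names are illustrative, not from the cited paper):

```python
# TOCTOU race: a guard checks shared state at t1, the agent acts at t2,
# and an attacker mutates the state in the interval between the two.

state = {"/tmp/payout.cfg": "recipient=alice"}

def check_tool(path: str) -> bool:
    return state[path] == "recipient=alice"       # t1: validation passes

def use_tool(path: str) -> str:
    return state[path]                            # t2: acts on *current* state

approved = check_tool("/tmp/payout.cfg")
state["/tmp/payout.cfg"] = "recipient=mallory"    # attacker wins the race
acted_on = use_tool("/tmp/payout.cfg")

# The check passed, yet the action executed on compromised state.
assert approved and acted_on == "recipient=mallory"

def fused_check_and_use(path: str, expected: str):
    """Tool fusing: snapshot once, so check and use see the same state."""
    snapshot = state[path]
    return snapshot if snapshot == expected else None

# The fused variant refuses to act on the tampered state.
assert fused_check_and_use("/tmp/payout.cfg", "recipient=alice") is None
```

The fused variant illustrates why merging check and use into one atomic read (the "tool fusing" mitigation discussed in Section 5) closes the window: there is no interval in which external state can diverge from what was validated.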

Across these pathways, attacker observability and architectural complexity (e.g., multi-step tool sequences, memory layer sharing, feedback-based correction) multiply the number and subtlety of potential exploit vectors (Li et al., 12 Feb 2025, Wicaksono et al., 5 Sep 2025).

5. Mitigation Strategies and Security Frameworks

Multiple papers propose specialized defense frameworks to address agentic-only risks, emphasizing the need for layered, adaptive, and context-aware controls beyond traditional ML security baselines:

  • Input and Output Validation: Advanced sanitization, prompt whitelisting (e.g., LLMZ+ (Pawelek et al., 23 Sep 2025)), and two-step validation (input and output) reduce, but do not eliminate, prompt injection and logic manipulation risk (Khan et al., 16 Oct 2024, Atta et al., 14 Jul 2025).
  • Persistent Memory Integrity: Memory isolation, tamper-evident logging (with timestamp and cryptographic hash chaining), and memory-aware validation for recalled context are required to detect and block dormant or cross-session payloads (Atta et al., 14 Jul 2025, Zambare et al., 12 Aug 2025).
  • Active, Contextual Monitoring: Continuous runtime observability frameworks (e.g., AgentSeer (Wicaksono et al., 5 Sep 2025)), audit trails reconstructed from agent action graphs, and anomaly detection in reasoning chains address the dynamic nature of agentic exploitation.
  • Cryptographic, Identity-Based Controls: Agentic JWT (Goswami, 16 Sep 2025) introduces per-agent checksums, chained delegation, and proof-of-possession keys for scopes, dramatically restricting replay and impersonation attack surfaces in API-heavy workflows.
  • Firewall and Zero-Trust Architectures: Centralized filtering (GenAI Security Firewall (Bahadur et al., 10 Jun 2025)), Zero-Trust IAM with DIDs and VCs, ephemeral Just-in-Time runtime environments, and causal chain auditing provide multi-layered defense and bounded probability of LPCI or cross-component privilege escalation (Huang et al., 17 Aug 2025).
  • Cognitive Resilience Frameworks: QSAF (Atta et al., 21 Jul 2025) offers runtime controls for detecting and mitigating cognitive degradation—such as memory starvation and planner drift—that introduce silent agentic-only failure paths, reflecting the importance of lifecycle-aware, proactive stability management.
  • Plan Validation and Atomicity Enforcement: Techniques such as state integrity monitoring, prompt rewriting for query atomicity, and tool fusing (merging check–use operations) directly lower TOCTOU attack rates (Lilienthal et al., 23 Aug 2025).
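
The tamper-evident logging control above (timestamped, cryptographically hash-chained entries) can be sketched in a few lines. The entry schema here is an assumption for illustration, not a prescribed format:

```python
import hashlib
import json
import time

# Tamper-evident memory log via hash chaining: each entry commits to its
# timestamp, content, and the previous entry's hash, so any later rewrite
# of a stored entry breaks the chain and is detectable.

def append(log: list, content: str) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"ts": time.time(), "content": content, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps({k: entry[k] for k in ("ts", "content", "prev")},
                   sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)

def verify(log: list) -> bool:
    prev = "0" * 64
    for e in log:
        expected = hashlib.sha256(
            json.dumps({"ts": e["ts"], "content": e["content"], "prev": prev},
                       sort_keys=True).encode()
        ).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

log = []
append(log, "user asked for account summary")
append(log, "tool call: fetch_balance")
assert verify(log)

log[0]["content"] = "tool call: transfer_funds"   # dormant payload rewrite
assert not verify(log)                            # tampering is detected
```

This is exactly the property needed against dormant or cross-session payloads: an attacker who rewrites an old memory entry cannot do so silently without also forging every downstream hash.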

Security frameworks consistently highlight the need for continuous adaptation: static guardrails, rule-based detection alone, and post-hoc auditing are systematically insufficient in dynamic, agentic settings, as demonstrated by the weak correlation between simple lexical metrics (e.g., n-gram diversity) and real-world vulnerability exposure (Barua et al., 23 Feb 2025, Ming et al., 3 Jun 2025).

6. Quantitative Evidence and Impact

Empirical evaluation verifies the differential risk profile of agentic-only vulnerabilities:

  • Attack success rates increase by 24–60% for tool-calling contexts in agentic deployments compared to non-tool contexts, with “agent transfer” tools consistently emerging as highest-risk (Wicaksono et al., 5 Sep 2025, Wicaksono et al., 21 Sep 2025).
  • Prompt whitelisting mechanisms (LLMZ+) achieve false positive and false negative rates of zero with large models and tuned thresholds in experiments, providing strong mitigation against jailbreak and injection attacks (Pawelek et al., 23 Sep 2025).
  • In controlled red teaming, agentic iterative attacks achieve up to 57% ASR (human-in-the-loop) in agentic loops versus 39.47% at model level; some objectives are only compromised at the agentic tier (Wicaksono et al., 5 Sep 2025, Wicaksono et al., 21 Sep 2025).
  • TOCTOU countermeasures—prompt rewriting, state monitoring, and tool fusing—jointly reduce vulnerable trajectories from 12% to 8%, and shrink the median attack window by 95% (Lilienthal et al., 23 Aug 2025).
  • Cognitive degradation detection (QSAF) identifies early internal drift and automatically restores agentic function before systemic collapse, mapping mitigation controls directly to lifecycle stages (Atta et al., 21 Jul 2025).

These results demonstrate that agentic-only vulnerabilities are neither rare nor subtle; they represent systemic, reproducible risk pathways with significant operational implications for both data confidentiality and autonomous system integrity.

7. Research Directions and Open Challenges

Despite advances in threat modeling, architecture, and runtime controls, several open challenges remain:

  • Scalability and Explainability: Deployment of granular monitoring (e.g., action graphs, causal auditing) at enterprise scale must not introduce prohibitive performance or interpretability overhead (Wicaksono et al., 5 Sep 2025, Huang et al., 17 Aug 2025).
  • Role-specific Training and Feedback Robustness: Feedback-based multi-agent workflows require specialized, adversary-aware training and role segregation to prevent systematic bias or error propagation (Ming et al., 3 Jun 2025).
  • Mitigation of Sophisticated In-Memory Attacks: Detection and response to LPCI and memory poisoning require deeper integration of cryptographic attestation and behavioral fingerprinting, with provable security bounds (Atta et al., 14 Jul 2025, Huang et al., 17 Aug 2025).
  • Continuous Adaptation and Governance: Defense-in-depth must be integrated with ModelOps practices (versioning, drift detection, continuous integration) and aligned with evolving regulatory frameworks (Raza et al., 4 Jun 2025, Narajala et al., 28 Apr 2025).
  • Zero-Trust End-to-End: The shift to zero-trust agentic environments, involving DIDs, agent-centric cryptography, and per-action attestation, demands both standardized protocols (e.g., Agentic JWT (Goswami, 16 Sep 2025)) and practical guidance for layered deployment across diverse business and regulatory domains.

A plausible implication is that future agentic systems will require runtime, context-sensitive security postures that evolve in tandem with the agent’s state and operational scope, with static and monolithic controls rendered increasingly obsolete by the operational complexity of real-world agentic workflows.


In summary, agentic-only vulnerabilities constitute a critical and expanding class of risks inherent to the structure and operation of autonomous AI systems. Addressing them requires both deep architectural changes—such as persistent memory integrity, autonomous runtime validation, and cryptographic delegation—and proactive, context-aware security frameworks capable of auditing and intervening throughout the entire agent lifecycle. This paradigm shift is essential to realize agentic AI safely within sensitive, high-stakes operational environments.
