LLM Agent-Based Attacks
- LLM Agent-Based Attacks are sophisticated multi-stage exploits targeting integrated pipelines by abusing memory, retrieval, and API interfaces.
- They involve techniques such as prompt injection, backdoor insertion, and supply chain attacks, resulting in high success rates and stealthy compromises.
- Empirical evaluations report attack success rates up to 90%, highlighting critical security challenges and the urgent need for robust mitigation strategies.
LLM Agent-Based Attacks refer to the diverse set of security and privacy threats that specifically target LLM-powered agentic systems—pipelines in which a core LLM is augmented with external memory, retrieval, web access, and tool APIs. In contrast to “prompt injection” and jailbreak methods aimed at single LLMs, agent-based attacks exploit the complex, inherently open architectures of LLM agents, impacting each stage of their multi-component pipelines. These attacks can involve data exfiltration, code execution, computational resource hijacking, system takeover, and propagation of self-replicating prompt malware, posing unique and often highly effective risks that do not manifest in isolated LLMs. The agentic design introduces novel attack surfaces, trust boundaries, and vectors—many of which are trivial to exploit but difficult to detect or mitigate with existing methods (Li et al., 12 Feb 2025).
1. Architectural Distinctions and Attack Surfaces
LLM agents, compared to isolated LLMs, integrate a pipeline of functional modules, each representing a discrete attack surface. The pipeline typically consists of:
- Core LLM Engine (e.g., GPT-4, Claude 3.5 Sonnet)
- Memory Modules: persistent storage for prior contexts, credentials, or intermediate states
- Retrieval Modules (RAG): augment context via external/internal document libraries
- Web-Access Tools: enable search, web scraping, file downloads
- Tool or API Interfaces: interface with applications, databases, or third-party APIs
Attackers can compromise the agent at every transition: injection at retrieval or memory lookup, manipulation of observed web content or API outputs, poisoning of stored memories, and interception of inter-agent messages. This rich composition yields not only additive, but multiplicative vulnerability—agents are far easier to attack than standalone LLMs (Li et al., 12 Feb 2025).
Agent Attack Surfaces
| Stage | Entry Point Example | Attack Vector Type |
|---|---|---|
| User Query | Direct prompt injection | Prompt/Jailbreak |
| Retrieval/Memory Lookup | Poisoned RAG/memory entry | Memory/Knowledge Poisoning |
| LLM Reasoning | System prompt corruption, backdoors | Backdoor, PoT, Jailbreak |
| Tool Invocation | Compromised API/web service | Tool/Observation Injection |
| LLM Interpretation of Results | Tool response manipulation | Observation Faults |
| Inter-Agent Messaging | Message hijacking in multi-agent SMA | Communication Attacks |
| Environment Output | Malicious HTML, GUI overlay | Indirect Prompt Injection |
2. Taxonomy of LLM Agent Attacks
A comprehensive classification system covers six axes (Li et al., 12 Feb 2025):
- Threat Actors:
- External (controls web/API content)
- Malicious Users (issues coercive prompts)
- Insider Threat (partial white-box knowledge)
- Objectives:
- Data exfiltration (leaking credentials, PII)
- Real-world/device harm (malicious tool use, malware downloads)
- Scientific weaponization (protocol creation)
- Entry Points:
- Environment (web, API, public datasets)
- Memory Systems (internal/external knowledge stores)
- Tool Interfaces/APIs (browser, email, databases)
- Attacker Observability:
- Black-box (sees only input/output)
- White-box (partial system information)
- Attack Strategies:
- Prompt Injection/Jailbreak
- Backdoor Insertion (parameterized/substrate)
- Memory Poisoning
- Observation/Tool Response Poisoning
- Communication Attacks (e.g., Agent-in-the-Middle, prompt infection)
- Pipeline Vulnerabilities:
- Lack of authentication, insufficient provenance, and agent overtrust in external data
This taxonomy enables formal threat modeling and targeted defense development.
3. Canonical Attack Techniques and Case Studies
Prompt Injection and Observation Injection
Malicious instructions injected via user prompts, retrieved documents, API responses, or GUI overlays can override agent policies, cause tool misuse, or hijack the agent’s internal reasoning. Even primitive jailbreaking—instructing “ignore above rules and ...”—often leads to full agent compromise, especially when injected at tool or environment boundaries (Li et al., 12 Feb 2025).
Backdoor Attacks
Parameter-efficient backdoor attacks demonstrate extreme stealth and robustness in LLM agents. For example, “BadAgent” uses adapter-based fine-tuning (e.g., AdaLoRA, QLoRA) on a fraction of agent data to implant a backdoor: as little as 20% data poisoning suffices for Attack Success Rates (ASR) >90% in web navigation and shopping tasks, with follow-step ratios (stealth) nearly indistinguishable from clean models. These backdoors survive further legitimate fine-tuning—data-centric mitigation is essentially ineffective (Wang et al., 5 Jun 2024).
Indirect and Supply Chain Attacks
Exploiting agent openness, attackers can inject adversarial triggers into web pages (DOM/HTML/GUI overlays) or supply-chain tools. Universal triggers (e.g., hidden aria-labels in accessibility trees) have been shown to induce targeted LLM agent behavior—such as credential theft or forced ad-clicks—at rates exceeding 90% across major websites (Johnson et al., 20 Jul 2025). Supply chain exploitation is also seen in “LeechHijack,” where MCP (Model Context Protocol) tool plug-ins expropriate computational resources via implicit, privilege-compliant latencies, with attack success rates up to 77% and overheads that remain statistically normal (Zhang et al., 2 Dec 2025).
Multi-Agent and Communication Attacks
Intricate sabotage becomes possible in collaborative agent systems. Agent-in-the-Middle (AiTM) attacks intercept and alter inter-agent communications; even with knowledge of only a single agent’s incoming messages, an LLM-powered adversary can induce DoS or targeted payload propagation in >90% of multi-agent topologies. Vulnerability is highest in chain-structured systems; even complete and random graphs exhibit substantial compromise rates (He et al., 20 Feb 2025). “Prompt Infection” further demonstrates worm-like, self-replicating prompt attacks that propagate across agent chains—global, self-replicating infections reach full society saturation in under 11 communication steps for 50-agent populations (Lee et al., 9 Oct 2024).
Mobile and GUI Agent Attacks
Mobile LLM agents and GUI-based agents face additional modalities of attack, including adversarial pop-ups, overlays, deep-link hijacking, and “fine-print” manipulation. Systematic studies found that low-barrier vectors such as ad injection succeed in >80% of cases across top mobile agent frameworks, and that agents are highly susceptible to embedded content attacks that are virtually invisible to human users (Du et al., 31 Oct 2025, Chen et al., 15 Apr 2025, Wu et al., 19 May 2025).
4. Empirical Severity and Evaluation Benchmarks
Research demonstrates that LLM agent vulnerabilities are not theoretical but immediate, severe, and recurring across platforms:
- Attack success rates: Mixed attacks in Agent Security Bench (ASB) reached 84.3%; direct prompt injection attained 72.7%; Plan-of-Thought backdoors, 42.1%; memory poisoning, 7.9%; and observation injection, 27.6% (Zhang et al., 3 Oct 2024).
- Practical hijacking: In “BadAgent” and “DemonAgent,” ASRs for covert, dynamically encrypted multi-backdoor attacks consistently reach ≈100%, with detection rates at 0% even by advanced safety audits (Zhu et al., 18 Feb 2025, Wang et al., 5 Jun 2024).
- Mobile domain: Advanced multi-app agents are reliably compromised by both low-barrier and advanced workflows, with malware sideload/opening success surpassing 90% for leading frameworks (Du et al., 31 Oct 2025).
- Multi-agent propagation: Prompt Infection achieves up to 65.2% ASR for self-replicating, system-wide attacks in pipelines using strong models; infection in agent societies follows logistic growth, saturating in O(log N) steps (Lee et al., 9 Oct 2024).
- Web/GUI agents: Indirect HTML accessibility attacks and fine-print injection in adversarial GUI elements subvert LLM agents at rates far surpassing naive expectations, with little difference between open-source and commercial models (Johnson et al., 20 Jul 2025, Chen et al., 15 Apr 2025).
5. Underlying Causes and Security Implications
The agentic design ethos—modular, composable, with trust at every interface—introduces several critical classes of vulnerabilities:
- Implicit Trust: Agents consume, interpret, and act on data from sources lacking cryptographic integrity or provenance, leading to “confused deputy” scenarios (Zhang et al., 2 Dec 2025).
- Open-ended Actuation: By design, agents can issue real API/tool calls, browse the web, parse GUIs, invoke system actions, or pass messages to other agents; this creates high-leverage vectors for privilege escalation, system manipulation, or lateral movement (Li et al., 12 Feb 2025, Lupinacci et al., 9 Jul 2025).
- Over-permissive Communication: Multi-agent setups rarely validate the origin or content of inter-agent messages; any agent (honest or compromised) can hijack peer behavior (He et al., 20 Feb 2025).
- Ambiguous Provenance: Agents lack rigorous policy to distinguish user-vs-agent-generated instructions, especially when prompts are composed from a mixture of system, user, memory, or environmental input (Lee et al., 9 Oct 2024, Zhang et al., 3 Oct 2024).
- Stealth, Persistence, and Evasion: Modern attacks employ code fragmentation (MBTI), dynamic encryption, or stealthy triggers (e.g., via benign-looking accessibility nodes), remaining statistically invisible to existing defense audits (Zhu et al., 18 Feb 2025, Wang et al., 5 Jun 2024).
- Capability–Vulnerability Tradeoff: More capable, generalist agents (mobile or desktop, multimodal or multi-app) are systematically more susceptible to agent-based attack, even though they outperform in intended tasks (Wu et al., 19 May 2025, Du et al., 31 Oct 2025).
- Defensive Fragility: Empirical benchmarks confirm that current defenses—paraphrasing, input delimiters, instruction hardening—are insufficient; even multi-agent defense pipelines, while effective against prompt injection, add latency and may miss indirect or multistage attacks (Hossain et al., 16 Sep 2025, Zhang et al., 3 Oct 2024).
6. Mitigation Strategies and Open Challenges
Current defense techniques fall into four main categories, all with recognized limitations:
- Pipeline Hardening: Multi-agent guard architectures (chain or hierarchy) effectively eliminate direct prompt injection with 100% mitigation against known categorical attacks, but incur significant latency and are subject to false positives or adaptive adversarial strategies (Hossain et al., 16 Sep 2025).
- Anomaly and Provenance Detection: Systems like TraceAegis mine behavioral and hierarchical execution traces to detect workflow anomalies, performing with F1 > 0.94 on benchmark data, but remain vulnerable to attacks that mimic normal tool-call sequences (Liu et al., 13 Oct 2025).
- Cryptographic Protocols and Authenticated Channels: Ensuring integrity of tool and inter-agent communication via MACs, digital signatures, or attestation passports disrupts some communication- and supply-chain-based attacks, but creates overhead and requires ecosystem support (Zhang et al., 2 Dec 2025, He et al., 20 Feb 2025).
- Input/Output Filtering and Policy Enforcement: Saliency-based parsing for GUI agents, contextual integrity checks, and on-the-fly tool call validation reduce attack surface for certain classes, but adversary adaptation is ongoing, and comprehensive solutions do not yet exist (Chen et al., 15 Apr 2025, Wu et al., 19 May 2025).
Notable Deficiencies:
- Data-centric defenses (re-fine-tuning) are ineffective against backdoors in LLM agents (Wang et al., 5 Jun 2024).
- Standard NLP backdoor detectors miss “thought-level” and observation attacks, which leave final user outputs unaltered (Yang et al., 17 Feb 2024).
- Provenance and behavioral anomaly systems must be combined with sandboxed tool execution and multi-layered cryptographic protocols to defend against emerging, multi-stage, and supply chain attacks.
7. Research Directions and Formalization Challenges
Open problems identified by the state-of-the-art include:
- Holistic verification: The need for certified robust agents with end-to-end formalized guarantees on all I/O channels, memory, and tool calls (Zhang et al., 3 Oct 2024).
- Compositional Attestation: Lightweight, scalable attestation mechanisms for tool/plugin ecosystems (MCP and beyond) (Zhang et al., 2 Dec 2025).
- Adversarial training: Systematic adversarial exposure and simulation (e.g. agentic red-teaming frameworks) to drive down effective attack rates (Zhang et al., 21 Oct 2025).
- Dynamic provenance and runtime anomaly detection: Continuous, trace-level supervision combining both hierarchical and semantic analysis (Liu et al., 13 Oct 2025).
- Multi-agent protocol design: Authentication, watermarking, and protocol compliance checking for agent-agent communication (Lee et al., 9 Oct 2024, He et al., 20 Feb 2025).
- Defensive benchmarking: Emphasis on utility-security tradeoff metrics (UST) and red-teaming benchmarks covering system prompts, memory, and observation poisoning (Zhang et al., 3 Oct 2024).
A plausible implication is that the surge in agentic LLM deployments, without these comprehensive defenses, risks broad exploitation well beyond classical “jailbreak” problems—encompassing persistent compromise, cross-application propagation, and hard-to-detect resource and data exfiltration at scale.
Key References:
- Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks (Li et al., 12 Feb 2025)
- BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents (Wang et al., 5 Jun 2024)
- LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems (Zhang et al., 2 Dec 2025)
- The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover (Lupinacci et al., 9 Jul 2025)
- Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree (Johnson et al., 20 Jul 2025)
- TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection (Liu et al., 13 Oct 2025)
- Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents (Zhang et al., 3 Oct 2024)
- Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems (Lee et al., 9 Oct 2024)
- Imprompter: Tricking LLM Agents into Improper Tool Use (Fu et al., 19 Oct 2024)
- A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks (Hossain et al., 16 Sep 2025)
- From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents (Wu et al., 19 May 2025)
- The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections (Chen et al., 15 Apr 2025)