- The paper presents a comprehensive taxonomy classifying applications, threats, and defenses in LLM-driven agents through a survey of over 150 works.
- It demonstrates empirical evaluations across offensive and defensive security, highlighting multi-agent strategies and runtime protection measures.
- The survey identifies critical gaps in robustness and standardization, urging the development of modular, provably secure architectures.
Agentic Security: Taxonomy, Threats, and Defenses in LLM-Driven Agents
Introduction
The surveyed work provides a comprehensive synthesis of the agentic security landscape, focusing on the intersection of LLM agents and cybersecurity. The authors introduce a three-pillar taxonomy—applications, threats, and defenses—structuring over 150 primary works to elucidate the operational roles of LLM agents, their unique vulnerabilities, and the spectrum of countermeasures. The survey emphasizes the shift from passive LLMs to autonomous, tool-using agents, highlighting both the expanded capabilities and the emergent attack surfaces inherent to agentic systems. The analysis is grounded in empirical findings, cross-cutting trends, and a critical evaluation of architectural, operational, and knowledge-level gaps.
Applications of LLM Agents in Security
Offensive Security (Red Teaming)
LLM agents have been operationalized for autonomous penetration testing, vulnerability discovery, fuzzing, and exploit generation. Architectures such as PentestGPT and multi-agent frameworks (e.g., PentestAgent, VulnBot) demonstrate end-to-end automation of reconnaissance, exploitation, and lateral movement, leveraging adaptive planning and feedback mechanisms. Empirical evaluations (e.g., AutoPenBench, AI-Pentest-Benchmark) reveal that multi-agent and planner-executor designs outperform monolithic LLMs in complex, multi-stage attack scenarios. In fuzzing, agentic systems like Locus and ChatAFL employ predicate synthesis and protocol grammar extraction to guide input generation, achieving deeper state coverage and higher bug discovery rates than traditional fuzzers. Exploit generation frameworks (e.g., MalGen, CVE-Genie) illustrate the capacity for polymorphic, environment-aware malware synthesis, with CVE-Genie reproducing 51% of 841 real-world exploits.
Defensive Security (Blue Teaming)
On the defensive axis, LLM agents are deployed for continuous monitoring, threat detection, incident response, threat hunting, and automated remediation. SOC frameworks (e.g., IRCopilot, CORTEX) utilize role-based and collaborative agent models to reduce false positives and automate response playbooks. Threat hunting agents (e.g., ProvSEEK, LLMCloudHunter) integrate RAG, chain-of-thought reasoning, and provenance analysis to improve detection precision and recall, with LLMCloudHunter achieving 83% precision and 99% recall in extracting cloud IoCs. Automated forensics agents (e.g., RepoAudit, CyberSleuth, GALA) leverage memory, causal inference, and graph-augmented reasoning to reconstruct attack chains and generate verifiable reports, reducing triage time by up to 40%. Autonomous patching agents (e.g., RepairAgent) achieve state-of-the-art results on Defects4J, autonomously repairing 164 bugs at a cost of $0.14 per bug.
Domain-Specific and Specialized Applications
Agentic systems are increasingly tailored to cloud, web, IoT, finance, and healthcare domains. Cloud security agents (e.g., KubeIntellect, LLMSecConfig) orchestrate subagents for log analysis, RBAC auditing, and misconfiguration repair. Web and OS security agents (e.g., MAPTA, AIOS, Progent) enforce sandboxed execution, privilege control, and policy compliance, with Progent eliminating attack success in red-team evaluations. In finance and healthcare, agents like LISA and HIPAA-compliant frameworks outperform static analyzers and embed privacy guardrails, respectively.
Threat Landscape in Agentic Systems
Expanded Attack Surface
The agentic context introduces a broader and more severe set of vulnerabilities compared to standalone LLMs. The survey categorizes threats into injection attacks, poisoning/extraction, jailbreaks, agent manipulation, and red-teaming.
- Prompt Injection: Static and predictable system prompts in agents are highly susceptible to both direct and indirect prompt injection. Benchmarks such as AgentDojo and InjecAgent reveal that all evaluated defenses can be bypassed by adaptive attacks, and that security improvements often degrade task utility.
- Poisoning and Extraction: Memory and RAG poisoning attacks (e.g., AgentPoison) can hijack agent behavior via backdoor triggers, with larger models sometimes exhibiting increased susceptibility. Model extraction attacks enable adversaries to clone agent capabilities via repeated API queries.
- Jailbreaks: Agentic wrappers significantly increase vulnerability to jailbreak attacks, with coding agents exhibiting up to 75% attack success in multi-file codebases. Simple jailbreaks designed for chatbots are highly effective against tool-using and browser agents.
- Agent Manipulation: Goal hijacking, action hijacking, and reward hacking attacks exploit the agent’s planning and reasoning modules, often subverting user intent or exploiting reward model ambiguities. Multi-agent systems are further exposed to Byzantine threats, where a single compromised agent can disrupt collective behavior.
- Red-Teaming: Automated adversarial agents can uncover emergent vulnerabilities in multi-agent systems, with frameworks like Agent-in-the-Middle and search-based simulators revealing new classes of taint-style and privacy risks.
Evaluation and Benchmarking
A diverse set of benchmarks (e.g., ASB, RAS-Eval, AgentHarm, SafeArena, JAWS-BENCH) provide standardized environments for adversarial testing. These platforms demonstrate that even state-of-the-art agents achieve low policy-compliant success rates and are highly vulnerable to prompt injection, jailbreak, and privilege escalation attacks. Execution environments such as CVE-Bench and DoomArena enable reproducible, multi-stage attack simulations, revealing persistent gaps in agent robustness.
Defense Mechanisms and Hardening Strategies
Secure-by-Design Architectures
Modular planner-executor isolation, layered verification, and intent validation are effective in reducing cross-context injection rates by over 40%. Polymorphic prompting and governance-oriented frameworks extend these principles, embedding trust calibration and threat modeling into agent design. Information-theoretic approaches (e.g., ModelGuard) constrain knowledge leakage and model extraction.
Multi-Agent Security
Zero-trust and dynamic collaboration paradigms minimize leakage and collusion risks in multi-agent systems. Debate-based collectives and randomized smoothing techniques achieve over 90% phishing detection, while provenance tracking is critical for containing LLM-to-LLM prompt infections.
Runtime Protection and Security Operations
Guardrails such as R²-Guard, AgentGuard, and AGrail combine unsafety prediction with logical reasoning, reducing jailbreak failures by up to 35%. Human-in-the-loop oversight and behavioral anomaly detection (e.g., SentinelAgent) provide interpretable, real-time monitoring. Formal verification systems (e.g., VeriPlan, IRIS) ensure behavioral correctness, while collaborative SOC frameworks (e.g., AutoBnB, CORTEX) improve alert precision and reduce fatigue.
Evaluation Frameworks
Benchmarking platforms (e.g., AgentDojo, Ï„-Bench, TurkingBench) and sandboxed environments (e.g., DoomArena, ToolFuzz, WebArena) enable systematic evaluation of agent robustness and failure modes. Defense testing studies consistently expose the fragility of current defenses under adaptive adversaries, underscoring the need for continuous red-teaming and scalable assurance.
Cross-Cutting Trends and Gaps
- Architectural Shift: The field is moving from monolithic to planner-executor and hybrid architectures, with modularization improving interpretability and debugging.
- Role Stratification: Executor and planner roles dominate, while critics/verifiers and governors/mediators are less common but increasingly important for self-regulation.
- LLM Monoculture: GPT-family models are used in 83% of studies, raising concerns about monoculture and reproducibility. Model-specific alignment differences hinder cross-model generalization.
- Knowledge Sourcing: Static pre-trained knowledge is the norm, with limited adoption of RAG, fine-tuning, or RLHF. This constrains adaptivity and resilience to evolving threats.
- Modality Coverage: Text remains the primary modality, but there is growing interest in logs, code, network traces, and images. RAG poisoning and non-text modalities are under-defended.
- Benchmark Fragmentation: The proliferation of benchmarks and testbeds complicates cross-paper comparison and reproducibility.
Implications and Future Directions
The survey highlights the urgent need for defense techniques with provable safety guarantees, robust cross-domain generalization, and scalable, adaptive evaluation pipelines. The monoculture around GPT-family models and reliance on static knowledge sources present systemic risks, both in terms of security and reproducibility. Future research should prioritize:
- Secure, provenance-verified RAG pipelines and incremental fine-tuning for dynamic threat environments.
- Formal verification and runtime assurance frameworks that scale to multi-agent and cross-domain deployments.
- Economic and operational analyses of agentic security, including cost, speed, and energy considerations.
- Standardization of benchmarks and evaluation protocols to facilitate reproducibility and cross-model comparison.
Conclusion
This survey provides a structured, holistic view of agentic security, mapping the operational landscape, threat vectors, and defense strategies for LLM-driven agents. The analysis reveals both the promise and the fragility of current agentic systems, emphasizing the need for modular, verifiable, and adaptive security architectures. Addressing the identified gaps will be critical for the safe and reliable deployment of autonomous LLM agents in real-world, high-stakes environments.