Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges (2510.23883v1)

Published 27 Oct 2025 in cs.AI

Abstract: Agentic AI systems powered by LLMs and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.

Summary

The paper introduces a taxonomy of threats for agentic AI, categorizing risks from prompt injection to multi-agent vulnerabilities.
It details defensive measures including prompt filtering, policy enforcement, and sandboxing to mitigate adversarial attacks.
The study emphasizes the need for new evaluation benchmarks and long-horizon security strategies to address emerging open challenges.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

The paper "Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges" (2510.23883) explores the emerging security landscape associated with agentic AI systems powered by LLMs. These AI agents, characterized by autonomy, goal-directed reasoning, and environmental interaction capabilities, represent a paradigm shift in AI applications across various domains. The paper discusses the potential threats, defense mechanisms, and evaluation standards pertinent to these systems, while also identifying open challenges for future exploration.

Security Threats in Agentic AI

Agentic AI systems, with their intricate abilities like planning and autonomy, inherently introduce unique security vulnerabilities. The paper provides a taxonomy of threats that can jeopardize these systems, classified into key categories such as prompt injection, autonomous exploitation, multi-agent vulnerabilities, and governance concerns.

Prompt Injection and Exploits

Prompt injection remains a critical security issue, where adversaries feed malicious inputs to manipulate agent behaviors (Figure 1). This can be direct, through inserted malicious instructions, or indirect via external data manipulation utilized by an agent. The risk extends to inadvertent injections through poorly specified prompts or contextual drift (Figure 2), highlighting a need for robust input validation and context management.

Figure 3: Taxonomy of Agentic AI Security Threats.

Autonomous Exploitation and Multi-Agent Protocol Risks

The integration of LLM agents with systems that enable code execution or network interactions poses severe risks, enabling potential autonomous cyber-attacks. Multi-agent systems, due to their inherent communication and protocol interactions, are susceptible to attacks like message tampering and spoofing (Figure 4), which can destabilize entire agent networks and workflows.

Figure 4: Visualizing different protocol-level attacks for multi-agent systems: (a) MCP-induced, (b) A2A-induced.

Defense Mechanisms

To mitigate these threats, a comprehensive defense strategy is necessary, encompassing detection, policy enforcement, and runtime monitoring.

Prompt Injection Defense

Effective defenses against prompt injection involve enhancing agent robustness through training, implementing system-level checks, and actively filtering inputs. Strategies range from simple prompt augmentations to complex agent-focused fine-tuning methods tailored to recognize and ignore adversarial inputs.

Figure 5: Some Defenses Against Prompt Injection Attacks.

Policy and Runtime Protections

Robust policy filtering and enforcement frameworks are crucial in constraining agent actions within predefined safety parameters. Policy enforcement through governance-centric and signal-centric methods ensures that agent capabilities do not lead to unintended actions (Figure 6).

Figure 6: Policy Filtering and Enforcement Defense Strategies.

Isolation via Sandboxing

Sandboxing techniques provide a controlled environment for agents to operate safely, containing any unauthorized code executions or system interactions within predefined boundaries (Figure 7).

Figure 7: An Example of Sandboxing as a Defense Strategy.

Evaluation and Benchmarking

Current agentic AI benchmarks primarily focus on the task-specific capabilities of agentic systems. The shift towards evaluating security and safety of these agents is imperative. Newly developed benchmarks aim to assess an agent's ability to defend against security threats and maintain operational reliability under adversarial conditions.

Open Challenges and Future Directions

The paper identifies several open challenges. These include developing long-horizon security measures that capture potential risks associated with extended agent interactions, enhancing multi-agent system security frameworks to handle emergent protocol threats, and advancing benchmarks to incorporate rigorous safety evaluations under dynamic and complex conditions.

In conclusion, the research underscores the critical need for ongoing development in agentic AI security, emphasizing the importance of designing systems that are aligned with emerging threats and capable of autonomous safe operation in diverse scenarios. Addressing the outlined challenges will be key to leveraging the capabilities of agentic AI while ensuring robust security and governance.