Agent Security Bench (ASB)

Updated 21 August 2025

Agent Security Bench (ASB) is a formal, metric-driven framework for evaluating and benchmarking the security of autonomous agent systems.
It employs comprehensive threat modeling and scenario-driven design to assess vulnerabilities like prompt injection, memory poisoning, and impersonation.
The framework integrates classical mobile agent research with modern LLM-based architectures, offering open-source tools and reproducible evaluation pipelines.

Agent Security Bench (ASB) refers to a formalized, metric-driven framework for evaluating, benchmarking, and improving the security posture of autonomous agent systems—both traditional mobile agents and modern LLM-based agentic architectures. An ASB provides the methodology, scenario coverage, attack/defense formalism, and evaluation pipeline necessary to quantify and stress-test the resilience of agents against a diverse spectrum of security threats including prompt injection, memory poisoning, impersonation, and multi-agent cascading failures. The notion of ASB is realized in both classical mobile agent literature and in contemporary work on LLM-based agents, with each context contributing unique methodological, architectural, and evaluative components.

1. Historical Evolution and Motivation

The origins of ASB lie in the early mobile agent systems literature, where threats targeting agents, platforms, and their various interactions (agent–platform, agent–agent, etc.) necessitated systematic security evaluation (Amro, 2014). Security risks such as masquerading, unauthorized access, DoS, repudiation, code alternation, and eavesdropping were formally classified, and corresponding security objectives—authentication, authorization, confidentiality, accountability, availability—established the foundational criteria for benchmarking agent infrastructures.

With the advent of LLM-based agents, the attack surface expanded considerably. Modern ASBs, such as that described in “Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents” (Zhang et al., 3 Oct 2024), adopt comprehensive scenario-driven design, including a multi-stage LLM agent pipeline, tool usage, memory mechanisms, multi-step planning, and heterogeneous operational environments (e.g., IT, finance, medicine, legal, autonomous vehicles).

The motivation for ASB in both eras is to enable reproducible, quantitative evaluation of agent security—by identifying vulnerabilities, formalizing threat and defense models, and providing actionable, benchmarking results to inform the development and deployment of robust agent systems.

2. Threat Modeling and Attack Taxonomy

ASB frameworks systematically enumerate attack modes against agents, classified according to attack vectors, operational stages, and communication pathways.

Classical Threats in Mobile Agents (Amro, 2014):

Threat Class	Example Attack	Target Relationship
Masquerading	Impersonation of agent/platform	Agent↔Agent, Platform↔Agent
Unauthorized Access	Invocation/modification by other agents	Agent→Agent, Platform→Agent
Denial of Service	Blocking migration, agent deletion	Platform→Agent
Repudiation	Transaction/guilt denial	Any↔Any
Code Alternation	Undetectable code or state modification	Platform→Agent
Eavesdropping	Passive execution monitoring	Platform→Agent

Modern LLM-Agent Threats (Zhang et al., 3 Oct 2024):

Attack Type	Injection Stage	Mechanism
Direct Prompt Injection (DPI)	User prompt	Append malicious instructions to the user query
Observation Prompt Injection (OPI)	Tool I/O	Inject via tool API responses or outputs
Memory Poisoning	Memory (RAG/DB)	Insert adversarial entries into retrieval database
Plan-of-Thought (PoT) Backdoor	Planning	Poison CoT/planning steps to trigger malicious behavior
Mixed Attacks	Multi-stage	Combine multiple methods for compounded attack

ASB thus provides fine-grained adversarial modeling covering each agent’s operational stage, enabling assessment across all meaningful attack surfaces.

3. Evaluation Objectives and Benchmarking Metrics

To rigorously benchmark security, ASB frameworks define specific, measurable security objectives and associated metrics:

Security Objectives (Amro, 2014, Zhang et al., 3 Oct 2024):

Authentication and Authorization: Validation of agent and platform identity and permissions.
Confidentiality, Privacy, and Anonymity: Data and behavioral secrecy via encryption and privacy engineering.
Accountability and Non-repudiation: Provable logging and transaction traceability.
Availability: Resilience against resource starvation and DoS.

Example Metrics and Benchmarking Approaches (Zhang et al., 3 Oct 2024):

Metric	Formalization / Purpose
Attack Success Rate (ASR)	$\mathrm{E}_q[\mathbb{I}(Agent(\mathrm{LLM}(p_\mathrm{sys}, q \oplus x^e, \ldots)) = a_m )]$
Refuse Rate (RR)	Agent's propensity to reject malicious input
Benign Performance (BP)	% correct actions w/o backdoors
Performance under No Attack (PNA)	Baseline utility score absent adversary
Utility–Security delta	$\|BP - PNA\|$

These metrics are instantiated in large-scale tests, e.g., 90,000 agent episodes across 13 LLM backbones, to quantify both penetrability and the utility–security trade-off. Defense strategies are scored by their impact on ASR as well as on benign agent performance.

4. Attack and Defense Methodologies

ASB encompasses both the generation of attacks and the evaluation of corresponding defenses.

Adversarial Generation:
- DPI/OPI: Programmatic synthesis of adversarial queries and tool responses.
- Memory/Plan Poisoning: Manipulation of RAG entries and example chains-of-thought.
Defensive Strategies:
- Delimiter usage, instruction filtering.
- Prompt paraphrasing and validation.
- Instructional prevention (explicitly forbidding tool execution).
- Perplexity-based anomaly detection for memory retrieval.
- Shuffling planning steps to break backdoor triggers.

Defense efficacy is benchmarked by measuring post-attack success rates and tracking utility degradation.

5. Scenario and Agent Diversity

Unlike prior monolithic or single-domain benchmarks, ASB emphasizes broad scenario coverage, reflecting real-world heterogeneity in agent operations. The canonical ASB (Zhang et al., 3 Oct 2024) includes dedicated agents and tools for:

IT management
E-commerce
Legal, financial, and medical consultation
Autonomous vehicles and aerospace engineering
Research and academic advisory
Counseling and psychology

Each agent operates in a domain-specific operational context with relevant tool APIs and workflows. Attack and defense are formalized for these realistic settings, establishing the generality and extensibility of the ASB methodology.

6. Practical Implementation and Open Resources

The ASB framework is designed for reproducibility and extensibility:

Open source code and data: Complete implementation—including agent and tool templates, attack/defense scenario scripts, and evaluation harnesses—are released at [https://github.com/agiresearch/ASB].
Integration instructions: Guidelines are provided for incorporating novel LLM backbones, custom tool APIs, and new security scenarios.
Evaluation pipeline: Automated scripts enable batch execution of attacks, defense application, and quantitative performance extraction.

This infrastructure supports both research and operational red-teaming, lowering the barrier to adoption for academic and industrial security auditing of LLM-based agents.

7. Limitations and Future Directions

Empirical benchmarking via ASB highlights several open challenges:

Many contemporary defense techniques, while reducing ASR by modest margins, remain only partially effective; paraphrasing and delimiter defenses frequently leave attack success rates above 50% in DPI scenarios.
The most capable LLMs, absent high refusal rates, are often the most vulnerable—indicating that increased planning or reasoning sophistication does not inherently confer resistance to prompt-based compromise.
The ASB paper (Zhang et al., 3 Oct 2024) calls for further research on intrinsic architectural defenses, such as more robust prompt design, advanced retrieval and data sanitization protocols, and detection strategies that maintain low false positive rates.

The ASB codebase and architecture are designed to accommodate evolving attacker methodologies and the integration of increasingly realistic operational contexts. This extensibility is specifically intended to adapt to future advances in LLM technology and security research.

In summary, Agent Security Bench (ASB) systems formalize the end-to-end lifecycle of attacks and defenses in agentic systems. Through scenario diversity, fine-grained attack/defense modeling, large-scale empirical evaluation, and reproducible open-source tooling, ASB enables the rigorous and systematic benchmarking of agent robustness, providing actionable diagnostics to inform both theoretical understanding and the engineering of secure autonomous agents (Amro, 2014, Zhang et al., 3 Oct 2024).