Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents (2410.02644v1)

Published 3 Oct 2024 in cs.CR and cs.AI

Abstract: Although LLM-based agents, powered by LLMs, can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 23 different types of attack/defense methods, and 8 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, a mixed attack, and 10 corresponding defenses across 13 LLM backbones with nearly 90,000 testing cases in total. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30\%, but limited effectiveness shown in current defenses, unveiling important works to be done in terms of agent security for the community. Our code can be found at https://github.com/agiresearch/ASB.

PDF HTML Abstract

Overview of "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents"

The paper focuses on the increasingly critical area of security in LLM-based agents. As these agents demonstrate capabilities across diverse applications such as e-commerce, autonomous driving, and finance, understanding their vulnerabilities becomes essential. This paper presents the Agent Security Bench (ASB), a comprehensive framework designed to evaluate and benchmark both the attacks on and defenses of LLM-based agents. The framework is robust, integrating various scenarios and tools to provide a broad assessment platform.

Key Contributions

Agent Security Bench (ASB): The paper introduces ASB as the first extensive benchmarking framework specifically for LLM-based agents. The bench includes:
- 10 scenarios spanning several domains (e.g., finance, autonomous driving).
- 10 agents tailored to these scenarios.
- Over 400 tools.
- 23 methods to attack/defend.
- 8 evaluation metrics.
Attack and Defense Evaluation: The framework is used to evaluate 10 types of prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought (PoT) backdoor attack, and a mixed composite attack. Alongside, 10 defenses are benchmarked.
Critical Findings: Vulnerabilities were uncovered in agent operations across stages like prompt handling, tool usage, and memory retrieval, with an alarming highest average attack success rate of 84.30%. The paper highlights the insufficiency of current defenses.

Methodology

Scenarios and Tools: ASB was designed to emulate realistic environments by incorporating diverse scenarios and various tools pertinent to each scenario. These scenarios help in assessing the adaptability and security robustness of the agents under different conditions.
Attack Framework: The framework defines distinct attack vectors:
- Direct Prompt Injection (DPI): Alters user input directly.
- Observation Prompt Injection (OPI): Modifies the response or observations from tools.
- Memory Poisoning: Compromises long-term memory of agents.
- Plan-of-Thought Backdoor: Installs hidden instructions to trigger under specific conditions.
Defense Strategies: The benchmark includes preventive measures such as data sanitization and detection-based systems designed to identify malicious inputs and actions. The paper assesses these defenses against different types of attacks, specifying metrics like attack success rate under defense (ASR-d) and refusal rate (RR).

Implications and Future Directions

The findings indicate that while LLM-based agents possess powerful capabilities, their security needs substantial bolstering. Identifying and patching these vulnerabilities is crucial to prevent exploitation in real-world applications. The insights gained from ASB can guide the development of robust security measures tailored to the unique vulnerabilities of LLM-based agents. Future research could focus on evolving more sophisticated defense mechanisms and expanding the attack scenarios to cover emerging threats.

Conclusion

This paper offers a comprehensive exploration into the security challenges facing LLM-based agents. It sets a precedent for future security benchmarks and highlights the urgent need for enhanced protective strategies. The research underscores the delicate balance between expanding LLM capabilities and ensuring their secure deployment, contributing significant value to the AI security community.

PDF Markdown Bookmark Chat (Pro)

Authors (8)

Hanrong Zhang (7 papers)
Jingyuan Huang (9 papers)
Kai Mei (30 papers)
Yifei Yao (9 papers)
Zhenting Wang (41 papers)
Chenlu Zhan (9 papers)
Hongwei Wang (150 papers)
Yongfeng Zhang (163 papers)

Citations (1)

View on Semantic Scholar

Related Papers

Find Related Papers

GitHub

GitHub - agiresearch/ASB: Agent Security Bench (ASB) (4 stars)