
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning (2406.09187v1)

Published 13 Jun 2024 in cs.LG

Abstract: The rapid advancement of LLMs has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the first LLM agent to serve as a guardrail for other LLM agents. Specifically, GuardAgent oversees a target LLM agent by checking whether its inputs/outputs satisfy a set of given guard requests defined by the users. GuardAgent operates in two steps: 1) creating a task plan by analyzing the provided guard requests, and 2) generating guardrail code based on the task plan and executing the code by calling APIs or using external engines. In both steps, an LLM is utilized as the core reasoning component, supplemented by in-context demonstrations retrieved from a memory module. Such knowledge-enabled reasoning allows GuardAgent to understand various textual guard requests and accurately "translate" them into executable code that provides reliable guardrails. Furthermore, GuardAgent is equipped with an extendable toolbox containing functions and APIs and requires no additional LLM training, which underscores its generalization capabilities and low operational overhead. Additionally, we propose two novel benchmarks: an EICU-AC benchmark for assessing privacy-related access control for healthcare agents and a Mind2Web-SC benchmark for safety evaluation of web agents. We show the effectiveness of GuardAgent on these two benchmarks, with 98.7% and 90.0% accuracy in moderating invalid inputs and outputs for the two types of agents, respectively. We also show that GuardAgent is able to define novel functions to adapt to emergent LLM agents and guard requests, which underscores its strong generalization capabilities.
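The two-step flow described in the abstract (plan, then generate and execute guardrail code, with demonstrations retrieved from memory) can be illustrated with a toy sketch. This is not the authors' implementation: the names `Demo`, `retrieve`, `check_access`, `TOOLBOX`, and `guard`, the word-overlap retrieval, the prompt format, and the `permitted`/`requested` fields are all hypothetical, and the LLM is passed in as an ordinary callable so that the example runs without any model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Demo:
    """Hypothetical memory entry: a past guard request and its guardrail code."""
    request: str
    code: str


def retrieve(memory: list[Demo], request: str, k: int = 1) -> list[Demo]:
    """Return the k demos sharing the most words with the request.

    Word overlap is a stand-in for whatever similarity measure a real
    memory module would use (an assumption of this sketch).
    """
    words = set(request.lower().split())
    ranked = sorted(
        memory,
        key=lambda d: -len(words & set(d.request.lower().split())),
    )
    return ranked[:k]


def check_access(permitted: set[str], requested: set[str]) -> tuple[bool, set[str]]:
    """Hypothetical toolbox function: deny if anything requested is not permitted."""
    denied = requested - permitted
    return (not denied, denied)


# Extendable toolbox: functions the generated guardrail code may call.
TOOLBOX = {"check_access": check_access}


def guard(
    guard_request: str,
    agent_io: dict,
    memory: list[Demo],
    llm: Callable[[str], str],
) -> tuple[bool, set[str]]:
    """Run the two steps: 1) task plan, 2) guardrail code generation and execution."""
    demos = retrieve(memory, guard_request)
    demo_text = "\n".join(f"{d.request}\n{d.code}" for d in demos)

    # Step 1: ask the LLM for a task plan, with retrieved demos in context.
    plan = llm(f"PLAN\n{demo_text}\n{guard_request}")

    # Step 2: ask the LLM for guardrail code that follows the plan.
    code = llm(f"CODE\n{demo_text}\n{plan}")

    # Execute the generated code with access to the toolbox and the agent's
    # inputs/outputs. A real system would sandbox this; exec is used here
    # only to keep the sketch short.
    namespace = dict(TOOLBOX, agent_io=agent_io)
    exec(code, namespace)
    return namespace["allowed"], namespace["denied"]
```

To try it, pass any callable that returns a plan for prompts starting with `PLAN` and a code string for prompts starting with `CODE`; the code string is expected to assign `allowed` and `denied`, for example by calling `check_access` on the fields of `agent_io`.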

Authors (12)
  1. Zhen Xiang (42 papers)
  2. Linzhi Zheng (3 papers)
  3. Yanjie Li (45 papers)
  4. Junyuan Hong (31 papers)
  5. Qinbin Li (25 papers)
  6. Han Xie (21 papers)
  7. Jiawei Zhang (529 papers)
  8. Zidi Xiong (11 papers)
  9. Chulin Xie (27 papers)
  10. Carl Yang (130 papers)
  11. Dawn Song (229 papers)
  12. Bo Li (1107 papers)
Citations (5)