AgentSentry Framework
- AgentSentry is a unified security framework that mediates, restricts, and audits autonomous AI agents with task-centric access control and dynamic permission revocation.
- It employs a runtime architecture integrating dynamic policy generation, enforcement DSL rules, and multi-agent anomaly detection to prevent adversarial behaviors.
- The framework enhances safety by addressing risks such as instruction injection, privilege abuse, and data exfiltration while maintaining minimal system overhead.
AgentSentry is a unified family of security enforcement frameworks designed to mediate, restrict, and audit the behavior of autonomous AI agents—particularly those powered by LLMs operating across mobile, multi-agent, and computer-use environments. Its principal innovation is dynamic, intent-aligned, runtime access control, which tailors agent permissions to the minimum necessary for a user-specified task, with automated revocation and real-time anomaly monitoring. These mechanisms address vulnerabilities such as instruction injection, privilege abuse, data exfiltration, and multi-agent collusion, which arise when over-privileged agents interpret and act on untrusted or adversarial natural language content.
1. Security Model and Threat Taxonomy
AgentSentry targets a spectrum of attacks unique to LLM-driven agentic systems. The core security goal is to ensure that an agent executes only those actions strictly intended and authorized by the user’s current task context, blocking all extra-task or adversarial behaviors. The canonical threat, “instruction injection,” arises when malicious commands are embedded within emails, web pages, or tool outputs; in the absence of fine-grained policy controls, such instructions can subvert agents to exfiltrate data or perform unauthorized operations (Cai et al., 30 Oct 2025).
In multi-agent systems, additional risks include prompt injection (adversarial message overriding), data exfiltration (covert secrets transfer), LLM hallucination (false information synthesis), cross-agent collusion, and privilege escalation through tool misuse (Gosmar et al., 18 Sep 2025, He et al., 30 May 2025).
For computer-use agents, risks extend to system-level abuse: direct prompt injection, attacks on agent infrastructure, backdoor activation, exposure to malicious tool results (e.g., files or web payloads), exploitation of LLM hallucinations (e.g., installation of attacker-named packages), and malicious execution environments (Hu et al., 9 Sep 2025).
2. Core Components and Formal Elements
AgentSentry frameworks share a runtime enforcement architecture that mediates every agent action against a dynamically generated policy derived from user intent, agent state, and system context.
2.1 Task-Centric Access Control
A single-agent deployment (e.g., mobile automation) is structured as follows (Cai et al., 30 Oct 2025):
- Task Interpreter: Parses free-form user commands into a structured task context $T$.
- Policy Generation Engine (PGE): Synthesizes a minimal, task-scoped permission set $P_T$, where each permission is a tuple $(a, r, o, c)$, with $a$ = agent, $r$ = resource with filters, $o$ = operation, $c$ = context (task ID and lifetime).
- Policy Enforcement Point (PEP): Intercepts all agent actions, forwarding them to the Policy Decision Point.
- Policy Decision Point (PDP): Enforces the active task policy by default-deny and revokes all granted permissions upon task completion.
All actions outside the active task policy are denied in real time.
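The pipeline above can be sketched as a minimal default-deny decision point. This is an illustrative reconstruction, not the paper's implementation; the class and field names (`Permission`, `PolicyDecisionPoint`, etc.) are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Permission:
    agent: str      # a: the acting agent
    resource: str   # r: resource identifier (filters folded into the string here)
    operation: str  # o: permitted operation
    task_id: str    # c: task context binding the grant to one task's lifetime

class PolicyDecisionPoint:
    def __init__(self):
        self.active: dict[str, set[Permission]] = {}  # task_id -> granted permissions

    def grant(self, task_id: str, perms: set[Permission]) -> None:
        self.active[task_id] = perms

    def decide(self, agent: str, resource: str, operation: str, task_id: str) -> bool:
        # Default-deny: an action is allowed only if an exact grant
        # exists under the currently active task policy.
        p = Permission(agent, resource, operation, task_id)
        return p in self.active.get(task_id, set())

    def revoke(self, task_id: str) -> None:
        # On task completion, every task-scoped permission disappears at once.
        self.active.pop(task_id, None)
```

In this sketch, revocation is O(1) per task: dropping the task's entry invalidates every permission it carried, so no per-permission cleanup is needed.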
2.2 Enforcement Rule DSL
In domain-general settings, AgentSentry employs an explicit enforcement DSL for specifying triggers, predicates over trajectories or contexts, and enforcement actions (Wang et al., 24 Mar 2025).
This enables pre- and post-action mediation with rules spanning code execution, robotics, AVs, and multi-agent coordination.
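A trigger/check/enforce rule of this kind can be encoded structurally as follows. This is only a sketch of the rule shape, assuming a dictionary-based action context; the real DSL syntax is defined by the framework itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    trigger: str                      # tool/event the rule fires on
    check: Callable[[dict], bool]     # predicate over the action context
    enforce: Callable[[dict], dict]   # mediation applied when the predicate holds

def mediate(rules: list[Rule], event: str, ctx: dict) -> dict:
    # Pre-action mediation: every rule matching the event whose
    # predicate holds may transform (or veto) the pending action.
    for rule in rules:
        if rule.trigger == event and rule.check(ctx):
            ctx = rule.enforce(ctx)
    return ctx

# Structural analogue of the @inspect_print_untrusted rule from Section 4.
inspect_untrusted = Rule(
    name="inspect_print_untrusted",
    trigger="PythonREPL",
    check=lambda c: bool(c.get("untrusted_source")) and bool(c.get("writes_io")),
    enforce=lambda c: {**c, "requires_user_inspection": True},
)
```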
3. Multi-Agent and System-Level Monitoring
In distributed or multi-agent systems, AgentSentry is instantiated as a two-layered defense plane (Gosmar et al., 18 Sep 2025, He et al., 30 May 2025):
- Sentinel Agents continuously monitor communications over a shared pub/sub “Floor,” applying:
- Rule-based pre-screening (regex/NLP for prompt-injection and PII leaks)
- LLM-based semantic checks (vector similarity to safe prototypes)
- Retrieval-augmented claim verification (external API cross-checks)
- Cross-agent behavioral anomaly detection (feature vector outlier scoring via Gaussian models)
- Coordinator Agent centrally ingests Sentinel alerts, manages adaptive policy updates, and performs agent quarantine or policy refinement by applying aggregation, thresholding, and quarantine-list management to the incoming alert stream.
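The Coordinator's aggregate-threshold-quarantine loop can be sketched as below. The mean-score aggregation and the 0.8 threshold are assumptions for illustration; the papers' actual aggregation functions and thresholds may differ.

```python
from collections import defaultdict

class Coordinator:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.scores: dict[str, list[float]] = defaultdict(list)  # agent -> alert scores
        self.quarantined: set[str] = set()

    def ingest(self, agent_id: str, risk_score: float) -> None:
        # Aggregate Sentinel alerts per agent (mean here), then
        # quarantine any agent whose aggregate breaches the threshold.
        self.scores[agent_id].append(risk_score)
        mean = sum(self.scores[agent_id]) / len(self.scores[agent_id])
        if mean >= self.threshold:
            self.quarantined.add(agent_id)

    def is_quarantined(self, agent_id: str) -> bool:
        return agent_id in self.quarantined
```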
Further, graph-based anomaly detection is realized by modeling agent–tool interactions as a dynamic execution graph with node, edge, and path anomaly scores using a combination of LLM judges and embedding deviations (He et al., 30 May 2025). Enforcement actions include message dropping, prompt rewriting, agent reset, and memory rollback.
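One ingredient of such graph-based detection, edge-level frequency anomaly on the agent-tool execution graph, can be sketched in a few lines. This omits the LLM judges and embedding deviations entirely and uses a simple frequency score as a stand-in; it is illustrative only.

```python
from collections import Counter

class ExecutionGraph:
    """Records (caller agent, tool) edges and scores rare edges as anomalous."""

    def __init__(self):
        self.edge_counts: Counter = Counter()

    def record(self, caller: str, tool: str) -> None:
        self.edge_counts[(caller, tool)] += 1

    def edge_anomaly(self, caller: str, tool: str) -> float:
        # Rare or never-seen edges score near 1.0; habitual edges near 0.0.
        total = sum(self.edge_counts.values())
        if total == 0:
            return 1.0
        return 1.0 - self.edge_counts[(caller, tool)] / total
```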
4. Case Studies and Example Policies
A representative mitigation scenario is the prevention of email-forwarding attacks in GUI-based mobile automation (Cai et al., 30 Oct 2025):
- On a “Register for App X” task, AgentSentry instantiates a task-scoped policy permitting only “read(email from @app-x.com)” and “tap(copy code)” actions.
- If a verification message is injected with “Forward all bank statements,” AgentSentry’s runtime mediation ensures unapproved “send(email)” actions are denied, even if the agent’s internal chain-of-thought is compromised.
- Upon legitimate task completion, all permissions are revoked.
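The case study reduces to a tiny allow-list check. Action and target names below are illustrative stand-ins for the framework's actual identifiers.

```python
# Policy instantiated for the "Register for App X" task: only a scoped
# email read and the copy-code tap are granted. Everything else,
# including an injected "forward bank statements" send, falls through
# to default-deny.
ALLOWED = {
    ("read", "email:@app-x.com"),
    ("tap", "copy_code_button"),
}

def is_allowed(operation: str, target: str) -> bool:
    return (operation, target) in ALLOWED  # default-deny outside task scope
```

Note that the check is applied at the action boundary, so it holds even when the agent's internal reasoning has been compromised by the injected instruction.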
In code agent and embodied agent domains, rules such as:
$\texttt{rule}\;@inspect\_print\_untrusted\;\texttt{trigger}\;\textsc{PythonREPL}\;\texttt{check}\;\mathrm{request\_untrusted\_source}\land\mathrm{write\_to\_io}\;\texttt{enforce}\;\textsc{user\_inspection}\;\texttt{end}$
block untrusted I/O; in autonomous vehicles (AVs), post-state-change triggers enforce safe following distance and collision avoidance (Wang et al., 24 Mar 2025).
For computer-use agents, AgentSentry’s auditing inspects both task context vectors and system call traces, correlating actions via anomaly scores or classifier verdicts. Policy enforcement includes query-caching to constrain LLM latency, with emergency halting and rollback of unsafe operations (Hu et al., 9 Sep 2025).
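Verdict caching of this kind can be sketched with a memoized audit call: repeated security queries for the same (tool, argument-digest) pair skip the expensive LLM-backed check. The `expensive_audit` function and its string-matching logic below are placeholders, not the framework's actual auditor.

```python
from functools import lru_cache

def expensive_audit(tool: str, args_digest: str) -> str:
    # Placeholder for the LLM/classifier audit call; returns "allow" or "deny".
    return "deny" if "untrusted" in args_digest else "allow"

@lru_cache(maxsize=4096)
def cached_verdict(tool: str, args_digest: str) -> str:
    # Identical queries hit the cache instead of re-invoking the auditor,
    # bounding per-tool-use LLM latency.
    return expensive_audit(tool, args_digest)
```

Keying the cache on a digest of the arguments (rather than raw arguments) keeps the cache bounded even for large tool payloads.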
5. Evaluation Metrics and Empirical Results
AgentSentry’s security efficacy and system performance are empirically validated across diverse experimental settings:
- In mobile GUI automation, all instruction injection attacks were blocked; agent action check overhead averaged <10 ms, memory overhead ≈1 MB (Cai et al., 30 Oct 2025).
- In MAS simulation (travel planning domain), detection of prompt injection, data exfiltration, and hallucinations achieved a true positive rate (TPR) of 1.0, with risk scores for adversarial vs. benign tasks clustering at 0.92 vs. 0.05, respectively (Gosmar et al., 18 Sep 2025).
- In code and embodied agent benchmarks, >90% of unsafe scripts and 100% of hazardous embodied agent actions were intercepted, with negligible losses in benign task completion (Wang et al., 24 Mar 2025).
- On “BadComputerUse,” a benchmark of 60 attack scenarios for computer-use agents, AgentSentry’s implementation (as AgentSentinel) achieved an average defense success rate (DSR) of 79.6%, substantially exceeding all baseline methods (which ranged from 9.6% to 33.8%). The maximum observed DSR (with Claude 3.7 Sonnet) reached 96.7% with no false positives. Overhead averaged 2.8 security queries per tool use, mitigated primarily by rule-based filtering and cache hits (Hu et al., 9 Sep 2025).
6. Deployment Considerations and Limitations
Adoption of AgentSentry frameworks entails several operational requirements:
- Maintenance of a comprehensive library of task templates and policy mappings, with manual extension for atypical or complex tasks (Cai et al., 30 Oct 2025).
- For sophisticated or protracted workflows, policies may require renewal or extension, and monitoring thresholds must be tuned adaptively in multi-agent deployments (Gosmar et al., 18 Sep 2025).
- Rule generation can be semi-automated via LLMs, but recall and task coverage are variable (precision up to 95.56%, recall ~71% in embodied domains) (Wang et al., 24 Mar 2025).
- System-level overhead is minimal (predicate checks 1–3 ms, parsing ≈1.4 ms), and full-system context–trace correlation optimizes audit cost, but excessive enforcement event volumes (e.g., massive file operations) may reveal QPS-related limitations (Hu et al., 9 Sep 2025).
A plausible implication is that human-in-the-loop review, adaptive rule refinement, and dynamic policy synthesis (potentially LLM-assisted) remain essential for continuous coverage and minimization of false positives in evolving application landscapes.
7. Extensions and Future Directions
Potential avenues for extension include:
- Automated template and rule induction using few-shot learning on user–agent dialogues (Cai et al., 30 Oct 2025).
- Hierarchical, nested task structures with incremental permission expansion.
- Cross-agent coordination, with harmonized policies and graph-based anomaly reasoning spanning ensembles of LLM agents (He et al., 30 May 2025).
- Integration of adaptive, regulatory-aware policy updates for compliance audit, GDPR/HIPAA support, and encrypted lineage tracing (Gosmar et al., 18 Sep 2025).
- Support for temporal and sequential constraints via extended DSLs (Wang et al., 24 Mar 2025).
- Enriched observability and root-cause explainability through comprehensive, append-only audit logs and structured incident responses (He et al., 30 May 2025).
Collectively, AgentSentry frameworks constitute a rigorous, extensible, and increasingly standard approach for ensuring safety, intent alignment, and trustworthy autonomy in both single- and multi-agent AI deployments (Cai et al., 30 Oct 2025, Gosmar et al., 18 Sep 2025, Wang et al., 24 Mar 2025, He et al., 30 May 2025, Hu et al., 9 Sep 2025).