Agent Security is a Systems Problem

Published 18 May 2026 in cs.CR and cs.AI | (2605.18991v1)

Abstract: We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens, efforts to increase model robustness (the dominant viewpoint in the community) are insufficient on their own. Instead, we must complement existing efforts with techniques from the systems security domain. Based on our experience as cybersecurity researchers in operating systems, networks, formal methods, and adversarial machine learning, we articulate a set of core principles, grounded in decades of systems security research, that provide a foundation for designing agentic systems with predictable guarantees. As evidence, we analyze eleven representative real-world attacks on agents and discuss how systems principles, if realized, could have prevented these attacks. We also identify the research challenges that stand in the way of implementing these principles in agents.

Abstract PDF Upgrade to Chat

Authors (14)

Summary

The paper demonstrates that agent security is fundamentally a systems problem needing principles like least privilege and secure information flow.
It analyzes 11 real-world attacks to show that model-centric defenses easily fail against dynamically evolving agentic behaviors.
The study advocates for system-level mechanisms such as formal instruction/data separation, dynamic sandboxing, and deterministic policy enforcement.

Agent Security as a Systems Problem: An Expert Analysis

Systems-Level Perspective on Agent Security

The paper posits that the prevailing model-centric paradigm, which focuses on improving the robustness and alignment of the AI model itself, is fundamentally insufficient for securing agentic systems. The core argument is that the AI model should be conceptualized as an untrusted component, echoing the treatment of processes in traditional operating system security. As such, security invariants must be enforced at the broader system level, leveraging classic principles from systems security. This systems lens is justified by demonstrated limitations of adversarial robustness strategies—persistent attackers continually bypass model-based defenses with minimal additional effort, as seen both in classical adversarial ML and in contemporary LLM deployments.

Agentic systems, which integrate LLMs with tool interoperability and complex environmental interactions, further amplify the attack surface. The paper underscores that adversarial control over input channels (prompt injection, malicious tool outputs) can precipitate unintended agent actions and information exfiltration. The complexity of agentic architectures and their dynamic, open-ended task execution render them uniquely vulnerable; static privilege assignment and rigid policies are not viable. Thus, security policies must be context-sensitive, dynamically generated, and anchored outside the ML model.

Distilled Security Principles

The authors distill systems security insights into five foundational principles for agentic systems:

Least Privilege: Each agentic component must operate with minimal access rights, enforced by structured sandboxing.
TCB Tamper Resistance: The Trusted Computing Base must be protected from adversarial modification, achievable via hardware roots of trust and immutable system layers.
Complete Mediation: Every request crossing the security boundary must be scrutinized by a reference monitor against the prevailing security policy.
Secure Information Flow: Enforcement mechanisms must guarantee that sensitive data does not leak via direct or covert channels to unauthorized endpoints.
Human Weak Link: Operational security should account for human-induced vulnerabilities—misconfiguration, erroneous permission granting, and implementation errors.

Each principle is mapped onto real-world agentic attack case studies, demonstrating multi-principle violations as the root cause rather than singular defense breakdowns.

Analysis of Recent Agentic Attacks

The paper systematically analyzes eleven representative attacks (e.g., ChatGPT SpAIware, Claude Code Exfiltration, Devin AI Secret Leaks, Microsoft Copilot Exfiltration), revealing that failures invariably stem from gaps in system-level enforcement. For example, ChatGPT's SpAIware attack involved persistent prompt injection via the Memories feature, violating least privilege, secure information flow, and TCB tamper resistance. In the Claude Code attack, inadequate shell-command mediation allowed exfiltration through DNS queries. These failures highlight the inability of contemporary model-centric defenses to address dynamic privilege assignment and granular information flow tracking.

A key point is the semantic gap: agentic systems lack layered abstraction boundaries typical in classic OS and network architectures. Agents collapse high-level user prompts into low-level tool invocations without clear interfaces for enforcing policies, making deterministic and context-sensitive enforcement challenging.

Operational Security Mechanisms and Challenges

The paper advocates three core system-level mechanisms, drawing from mature systems security practice:

Provable Separation of Instructions and Data

Language-based models inherently intermix instructions and data in a token stream, making prompt injection attacks nearly trivial. While explicit tagging and separation heuristics have been attempted, adaptive attacks consistently defeat these measures. The authors conjecture that model-only defenses for instruction/data separation will always remain vulnerable; hence, system-level separation (analogous to hardware NX bits, SQL taint tracking) is mandatory, but the unique dynamics of agentic workflows (e.g., learning from data-originated instructions) pose unsolved challenges, especially in multi-modal settings.

Least-Privilege Sandboxing with Verifiable Policy Generation

Sandboxing restricts the agent's authority, but agentic task flexibility requires policies to be specified and reasoned about in natural language, evolving with the execution trajectory. This dynamicity renders deterministic enforcement difficult. Translating natural-language privilege specifications into formal, verifiable policies remains an open research problem; current best practices rely on probabilistic LLM-based policy predictors, which cannot guarantee correctness and may themselves be vulnerable to adversarial manipulation.

Information Flow Control (IFC)

Static or dynamic IFC label propagation, central to OS and programming language security, is extremely challenging in agentic contexts. LLMs process mixed tokens with no intrinsic structure, generating outputs labeled with the union of input labels—causing "label explosion." Mitigation strategies include quantitative information flow (distributional labeling), causal-interventional analysis, and leveraging mechanistic model interpretations, but none are mature for practical deployment. Downgrading sensitive information (declassification/endorsement) and granular label assignment (token-, tool-, or context-level) also remain unsolved.

Addressing Model-Centric Objections

The paper rebuts model-centric arguments by highlighting:

Intent Ambiguity: Model robustness cannot resolve ambiguous user intent and contextually-dependent harmful instructions.
Non-Deterministic Fragility: LLMs exhibit stochastic behavior even at zero temperature, making them unreliable as reference monitors.
ML Model Composition Is Insufficient: Stacking ML-based guard models aggregates statistical failure modes, failing to achieve independent defense-in-depth. Attacks (universal prompt injection, training-data poisoning) that bypass the agent frequently bypass these monitors as well.

Heterogeneous system-enforced barriers (sandboxing, instruction/data separation, deterministic IFC) force attackers to circumvent fundamentally distinct defense mechanisms, achieving true defense-in-depth.

Research Directions and Implications

The paper identifies open problems central to realizing secure agentic deployments:

Formal Instruction/Data Separation: Achieving provable separation (across modalities), without sacrificing agent adaptability.
Policy Generation and Verification: Developing automated pipelines that convert evolving natural-language policies into deterministic, formally verifiable privilege specifications, bridging the semantic gap.
Granular Information Flow Tracking: Designing practical label structures and propagation logics for LLMs, avoiding label explosion and enabling scalable IFC enforcement.
Downgrading and Human-in-the-Loop Enforcement: Mechanisms for controlled declassification and user-centric overrides that minimize approvals while preserving least privilege.

The practical implication is clear: system-level security mechanisms are mandatory for agentic systems. Theoretical progress will be necessary to construct abstraction layers, formal translation logics, and compositional security architectures that afford strong guarantees comparable with traditional systems. Deployment architectures must enable deterministic reference monitors, mutable and verifiable privilege boundaries, and robust, persistent IFC.

Conclusion

This paper frames the security of agentic AI systems as a systems-engineering domain, asserting that reliance on model-centric defenses is fundamentally inadequate. Agentic security requires the adaptation and integration of systems security principles: provable instruction/data separation, verifiable least-privilege policy enforcement, and robust information-flow control. The analyzed attacks and articulated mechanisms underscore that effective defense demands heterogeneous, system-level enforcement. The outlined research challenges necessitate foundational advances in formal policy translation, abstraction layering, IFC logic, and agent dynamism. Theoretical progress and practical adoption of these principles will be critical for trustworthy agentic system deployment (2605.18991).