- The paper introduces a secure edge deployment method that partitions LLM pipelines into isolated confidential virtual machines (cVMs) to protect sensitive assets such as model weights and prompts.
- It leverages Arm CCA's hardware-enforced isolation, using confidential shared memory and attestation mechanisms to secure multi-party interactions on untrusted devices.
- Empirical results show end-to-end latency overhead below 5.15% relative to a native multi-process deployment (below 2.53% relative to non-confidential VMs), supporting the practical feasibility of confidential edge LLM systems.
AgenTEE: Confidential LLM Agent Execution on Edge Devices
Motivation and Problem Statement
The proliferation of LLM-based agentic systems has sharpened security concerns, because these systems are deeply integrated with user data, rely on non-deterministic models, and call third-party service APIs. Although LLM agents are commonly operated in the cloud, deployment is increasingly shifting to edge devices, motivated by lower latency and privacy-centric architectures. The adversarial surface of edge-deployed agentic pipelines, however, is considerably more complex: user-owned devices are heterogeneous, may be untrusted or easily compromised, and must mediate interactions among mutually distrustful stakeholders such as model vendors, agent developers, and application providers.
Threats in this regime are multi-faceted. Proprietary assets, including system prompts, agent code, model weights, and sensitive runtime state (e.g., the KV cache), must be protected against exfiltration and unauthorized modification. Third-party app integrations further complicate isolation, especially for confidential credentials and business logic. Traditional software-based isolation on commodity OSes is inadequate: it cannot protect data in use from privileged adversaries such as a compromised kernel or hypervisor, and it forces every stakeholder to trust a large, untrusted platform TCB.
LLM Agents: Assets and Isolation
LLM agent pipelines combine the LLM, the agent runtime, and third-party applications or services in a single workflow, where user-initiated requests and untrusted inputs are mixed with sensitive initial conditions such as the system prompt, long-term context, and proprietary orchestration logic.
Figure 1: High-level architecture of an LLM agent workflow combining the LLM core, runtime, and tool integrations.
The attack surface spans all layers:
- Agent code and prompt templates encode sensitive orchestration logic which, if leaked or modified, exposes the pipeline to targeted prompt injection, indirect exfiltration, and control-flow manipulation.
- Inference engines hold proprietary model weights and dynamic runtime state (notably the transient KV cache used for efficient autoregressive decoding). Both are known targets of exfiltration and tampering, and runtime state can leak prompt fragments or user instructions through cache manipulation or side channels.
- Third-party applications handle credentials, tokens, and external APIs which, if unprotected, may lead to cascading security failures.
Arm CCA: Architectural Foundation
Existing TEEs, including Arm TrustZone and Intel TDX, are poorly suited to complex, multi-component deployments among mutually distrustful parties because of static partitioning, memory constraints, and a broad TCB. Arm's Confidential Compute Architecture (CCA), in contrast, introduces "realms", confidential virtual machines (cVMs) that execute in a hardware-isolated Realm world on Armv9-A platforms.
Figure 2: Arm CCA splits device resources between the normal world (conventional OS and apps) and multiple cVMs (realms) with memory and CPU isolation enforced by hardware and the Realm Management Monitor (RMM).
Distinctive features of CCA are:
- General-purpose cVMs that run unmodified software stacks, can be instantiated dynamically, and are sized according to the RAM currently available.
- Hardware isolation from the OS and hypervisor, which removes both from the TCB; the Realm Management Monitor (RMM) that manages realms is small and verifiable compared with commodity OS and hypervisor codebases.
- Remote attestation of identity and code integrity, enabling cryptographically secure provisioning of secrets and assets to authorized components only.
- Extensible design, with demonstrated support for confidential inter-realm communication channels and secure accelerator access, both important for practical deployments.
AgenTEE Architecture and Pipeline
AgenTEE leverages Arm CCA to enforce strong, hardware-level isolation of agentic system components on edge devices. The architecture decomposes the workflow into at least three independently attested and isolated cVMs: (1) the agent runtime (including prompt construction and orchestration rules), (2) the LLM inference engine (with model parameters and runtime state), and (3) third-party application workers. All interactions between these cVMs pass through mutually authenticated confidential shared memory (CSM) segments that the OS cannot access.
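The decomposition can be written down as a small data model: each cVM names its owning stakeholder and the assets confined to it, and cross-cVM traffic is permitted only over declared CSM segments. This is a hypothetical sketch, not the paper's code; the class names, asset names, and hub-and-spoke topology (agent runtime in the middle) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CVM:
    name: str
    owner: str          # stakeholder that launches and provisions this cVM
    assets: frozenset   # sensitive assets confined to this cVM


@dataclass
class Partition:
    cvms: dict = field(default_factory=dict)    # name -> CVM
    channels: set = field(default_factory=set)  # unordered pairs sharing a CSM segment

    def add(self, cvm: CVM) -> None:
        if cvm.name in self.cvms:
            raise ValueError(f"duplicate cVM {cvm.name!r}")
        self.cvms[cvm.name] = cvm

    def connect(self, a: str, b: str) -> None:
        """Declare a CSM segment shared by exactly two distinct cVMs."""
        if a == b or a not in self.cvms or b not in self.cvms:
            raise ValueError(f"invalid CSM channel {a!r}<->{b!r}")
        self.channels.add(frozenset((a, b)))

    def can_send(self, src: str, dst: str) -> bool:
        """Cross-cVM traffic is allowed only over a declared segment."""
        return frozenset((src, dst)) in self.channels

    def home_of(self, asset: str) -> str:
        """Return the single cVM holding an asset; fail if duplicated or missing."""
        holders = [c.name for c in self.cvms.values() if asset in c.assets]
        if len(holders) != 1:
            raise ValueError(f"{asset!r} must live in exactly one cVM, found {holders}")
        return holders[0]


plan = Partition()
plan.add(CVM("agent", "agent_developer", frozenset({"system_prompt", "agent_code"})))
plan.add(CVM("llm", "model_vendor", frozenset({"model_weights", "kv_cache"})))
plan.add(CVM("app", "app_provider", frozenset({"api_token"})))
plan.connect("agent", "llm")  # assumed topology: the runtime mediates all calls
plan.connect("agent", "app")
```

In this model the LLM worker cannot reach the third-party app directly, so any tool call has to pass through the agent runtime's cVM.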
Figure 3: AgenTEE arranges agent runtime, LLM worker, and third-party services in separate cVMs connected via confidential attested channels over CAEC-enabled CSM.
The pipeline proceeds as follows:
- Each stakeholder launches its own cVM, and CCA attestation is used to verify the authenticity of its measurements and runtime stack.
- Proprietary artifacts are provisioned over secure TLS channels that are established only after attestation succeeds.
- Confidential communication between cVMs over CSM ensures that neither uninvolved cVMs nor the hosting normal world can observe the exchanged data or subvert the protocol.
- The normal world (OS) merely provides hardware resource scheduling and mediates UI/event forwarding, without access to cVM internals.
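The attestation-gated provisioning in the first two steps can be illustrated with a simplified verifier: a stakeholder issues a fresh nonce, checks that the returned evidence is authentic, fresh, and carries the expected measurement, and only then releases its artifact. This is a sketch under stated assumptions, not the CCA protocol: real realm attestation tokens are signed CBOR/COSE structures, whereas here an HMAC over a shared key stands in for the signature, and all names (`Evidence`, `Provisioner`, `make_evidence`) are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Assumption: an HMAC key shared with the verifier stands in for the
# asymmetric attestation signature of a real CCA token.
ATTESTATION_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class Evidence:
    measurement: bytes  # SHA-256 of the cVM's initial image
    nonce: bytes        # verifier-issued challenge, proves freshness
    mac: bytes          # binds measurement and nonce together


def make_evidence(image: bytes, nonce: bytes) -> Evidence:
    """Platform side (simulated): measure the image and authenticate it."""
    measurement = hashlib.sha256(image).digest()
    mac = hmac.new(ATTESTATION_KEY, measurement + nonce, hashlib.sha256).digest()
    return Evidence(measurement, nonce, mac)


class Provisioner:
    """Stakeholder side: releases an artifact only to an approved cVM."""

    def __init__(self, reference_image: bytes, artifact: bytes) -> None:
        self.reference = hashlib.sha256(reference_image).digest()
        self.artifact = artifact
        self.pending = set()  # nonces issued but not yet consumed

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.pending.add(nonce)
        return nonce

    def provision(self, ev: Evidence) -> Optional[bytes]:
        expected = hmac.new(ATTESTATION_KEY, ev.measurement + ev.nonce,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, ev.mac):
            return None  # forged or corrupted evidence
        if ev.nonce not in self.pending:
            return None  # unknown or replayed challenge
        self.pending.discard(ev.nonce)  # each challenge is single-use
        if not hmac.compare_digest(ev.measurement, self.reference):
            return None  # unexpected software stack
        return self.artifact  # in practice, sent over the attested TLS channel


vendor = Provisioner(b"llm-worker-v1", b"model-weights")
good = vendor.provision(make_evidence(b"llm-worker-v1", vendor.challenge()))
bad = vendor.provision(make_evidence(b"tampered-worker", vendor.challenge()))
replay_ev = make_evidence(b"llm-worker-v1", vendor.challenge())
first = vendor.provision(replay_ev)
replayed = vendor.provision(replay_ev)  # same nonce presented a second time
```

Consuming each nonce on first use keeps old evidence from being replayed; the TLS channel that would carry the artifact in a real deployment is omitted.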
The paper's security analysis shows that this architecture blocks exfiltration and tampering at the hardware boundary: as long as CCA's guarantees hold, neither the OS, the hypervisor, nor other untrusted cVM tenants can compromise session state, model weights, prompts, or orchestration logic.
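The isolation claim can be made concrete with a toy access-control model of a single CSM segment. The sketch assumes, without claiming this is how the paper builds CSM, that a segment is realm memory mapped into exactly its two communicating cVMs, so any access from the normal world, or from a realm without a mapping, faults. On real CCA hardware the first check corresponds to granule protection checks and the second to RMM-managed stage-2 mappings; here both are plain conditionals, and all names are hypothetical.

```python
from enum import Enum
from typing import Optional


class World(Enum):
    NORMAL = "normal"  # host OS, hypervisor, untrusted apps
    REALM = "realm"    # cVMs managed by the RMM


class AccessFault(Exception):
    """Raised where the hardware would block the access."""


class ConfidentialSegment:
    """Toy CSM segment: realm memory mapped into exactly two realms."""

    def __init__(self, endpoint_a: str, endpoint_b: str, size: int = 4096) -> None:
        self._mapped = frozenset((endpoint_a, endpoint_b))
        self._buf = bytearray(size)

    def _check(self, world: World, realm_id: Optional[str], offset: int, length: int) -> None:
        if world is World.NORMAL:
            raise AccessFault("normal world cannot access realm memory")
        if realm_id not in self._mapped:
            raise AccessFault(f"{realm_id!r} has no mapping for this segment")
        if offset < 0 or length < 0 or offset + length > len(self._buf):
            raise ValueError("access outside the segment")

    def write(self, world: World, realm_id: Optional[str], offset: int, data: bytes) -> None:
        self._check(world, realm_id, offset, len(data))
        self._buf[offset:offset + len(data)] = data

    def read(self, world: World, realm_id: Optional[str], offset: int, length: int) -> bytes:
        self._check(world, realm_id, offset, length)
        return bytes(self._buf[offset:offset + length])


def is_blocked(seg: ConfidentialSegment, world: World, realm_id: Optional[str]) -> bool:
    """Report whether a one-byte read by the given principal would fault."""
    try:
        seg.read(world, realm_id, 0, 1)
    except AccessFault:
        return True
    return False


csm = ConfidentialSegment("agent", "llm")
csm.write(World.REALM, "agent", 0, b"prompt")
reply = csm.read(World.REALM, "llm", 0, 6)
```

The host OS and the third-party app cVM both fault on this segment, while the two endpoints exchange data freely.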
Implementation and Empirical Results
AgenTEE is implemented on OpenCCA, a prototype platform that emulates Arm CCA on Radxa Rock 5B hardware. The stack builds on Trusted Firmware-A and patches kvmtool-cca and linux-cca to provide full CSM functionality. To demonstrate real-world applicability, the authors instantiate two classes of agent workloads, a conversational chatbot and an itinerary planner, using LLMs such as GPT2-Medium and Llama-3.2-1B together with the LangChain framework.
The evaluation focuses on latency overhead, a dominant constraint for practical edge execution. Despite hardware isolation, the measured end-to-end overhead is below 5.15% relative to a conventional multi-process deployment on the native OS, and below 2.53% relative to non-confidential VMs. These figures support the claim that confidential, multi-component agentic pipelines with mutual distrust are feasible without significantly sacrificing inference performance or user-perceived responsiveness.
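The overhead figures are relative measures. A minimal sketch of the arithmetic, assuming overhead is defined as (measured - baseline) / baseline, the usual convention (the paper's exact definition is not restated here); the latencies below are invented for illustration and are not measurements from the paper.

```python
def overhead_pct(baseline_s: float, measured_s: float) -> float:
    """Relative latency overhead of a deployment against a baseline, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return (measured_s - baseline_s) / baseline_s * 100.0


def latency_bound(baseline_s: float, max_overhead_pct: float) -> float:
    """Largest latency consistent with a given relative overhead bound."""
    return baseline_s * (1.0 + max_overhead_pct / 100.0)


# Hypothetical latencies, not measurements from the paper.
native_s, confidential_s = 4.0, 4.2
print(f"overhead: {overhead_pct(native_s, confidential_s):.2f}%")
print(f"5.15% bound on a {native_s} s response: {latency_bound(native_s, 5.15):.3f} s")
```

Read this way, the 5.15% bound means a response that takes 4.0 s natively would take at most about 4.21 s under AgenTEE.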
Implications and Future Directions
AgenTEE addresses a critical gap in confidential, multi-stakeholder LLM agent deployment at the edge: (1) it enables proprietary model, prompt, and code protection, (2) confines credentials and sensitive runtime state to owner-attested trusted domains, and (3) allows secure, verifiable orchestration across dynamically composed agentic pipelines. The inclusion of CSM with per-realm attestation further enforces data-flow isolation and secure multi-party composition, which prior OS/hypervisor-based approaches could not guarantee.
The approach can be extended to confidential accelerators (e.g., secure GPUs via ongoing CCA extensions) and anticipates greater device heterogeneity in the mobile edge ecosystem. As increasing agent autonomy shifts the attack surface, hardware-enforced information flow controls, potentially combined with higher-level information flow control (IFC) and monitoring at the secure channel, will be critical for future on-device AI deployments.
This work demonstrates the practical viability of such an architecture with small performance penalties.
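As one illustration of the higher-level IFC mentioned above, a label check at the secure channel could stop a prompt that embeds the system prompt from being forwarded to a third-party tool. This is a generic lattice-style IFC sketch, not part of AgenTEE; the owners, cVM names, and release policy are assumptions.

```python
from dataclasses import dataclass

# Hypothetical release policy: the cVMs each owner allows to see its data.
POLICY = {
    "user": {"agent", "llm", "app"},
    "agent_developer": {"agent", "llm"},  # system prompt may reach the LLM only
    "app_provider": {"app"},              # credentials stay in the app cVM
}


@dataclass(frozen=True)
class Message:
    payload: str
    label: frozenset  # owners whose data contributed to the payload


def combine(payload: str, *parts: Message) -> Message:
    """Derived data carries the union (lattice join) of its inputs' labels."""
    return Message(payload, frozenset().union(*(p.label for p in parts)))


def may_flow(msg: Message, dst: str) -> bool:
    """A message may cross a channel only if every owner in its label permits dst."""
    return all(dst in POLICY.get(owner, set()) for owner in msg.label)


system_prompt = Message("You are an itinerary planner.", frozenset({"agent_developer"}))
query = Message("Plan two days in Lisbon.", frozenset({"user"}))
prompt = combine(system_prompt.payload + "\n" + query.payload, system_prompt, query)
```

Labels only grow as data is combined, so enforcement needs a point that sees every cross-cVM message; the secure channel between cVMs is a natural candidate.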
Conclusion
AgenTEE provides a robust, scalable operational model for confidential LLM agent deployment on edge devices using Arm CCA. By partitioning sensitive assets and workflows into independently attested and isolated cVMs, interacting via OS-inaccessible confidential channels, it achieves strong confidentiality and integrity guarantees for multi-stakeholder and multi-component agentic systems. The empirical analysis substantiates the practicality of this hardware-rooted approach, positioning it as a foundational method for confidential edge AI deployments.