Verifiable and Confidential DNN Inference on Low-End Edge Devices

Published 5 Jun 2026 in cs.CR | (2606.07470v1)

Abstract: Deploying deep neural network (DNN) inference on low-end edge devices raises two key challenges: protecting model confidentiality against a potentially compromised edge system and enabling verifiable inference without incurring prohibitive overhead. Existing approaches either house partial models and inference software within trusted execution environments (TEEs), resulting in high cost and an application-dependent trusted computing base (TCB), or execute in untrusted environments, providing little security. In this work, we present VECODI, a framework for verifiable and confidential DNN inference on constrained edge devices. At its core, VECODI introduces SHANGRI-LA, a new execution abstraction on TrustZone-M TEEs that establishes a third runtime environment with privileges strictly between the Secure and Non-Secure Worlds. VECODI leverages SHANGRI-LA to execute untrusted inference code in the Non-Secure World while using minimal application-agnostic Secure-World support to protect model confidentiality and enable verifiability (with respect to proper execution of inference code and model parameters) of inference results. We realize VECODI on a real-world NUCLEO-L552ZE-Q development board and open-source its prototype. Our results show VECODI's small TCB, memory footprint, and runtime overhead, making it a practical option for secure inference in low-end edge devices.


Authors (4)

Summary

The paper presents a novel, hardware-enforced abstraction (Shangri-La) that enables confidential and verifiable DNN inference on low-end edge devices.
It achieves a 95.64% reduction in Secure World TCB and adds only 0.83 ms of latency overhead, showing that the approach is efficient in resource-constrained environments.
The protocol uses cryptographic proofs bound to device identity and model parameters, ensuring both model privacy and result authenticity.

VeCoDI: Confidential and Verifiable DNN Inference on Resource-Constrained Edge Devices

Motivation and Problem Formulation

As DNN inference spreads to edge devices, particularly M-class microcontrollers (MCUs) that lack virtualization support and run minimal operating systems, the need for model confidentiality and verifiable execution grows. Model providers must protect their intellectual property and the integrity of inference results even when adversaries have full control of the device and physical access to it. Conventional deployments either place partial models and inference software inside a trusted execution environment (TEE), which bloats the trusted computing base (TCB), or leave inference outside the TEE, which provides little protection. Neither option meets the combined requirements of confidentiality, verifiability, and efficiency. Existing TEE-shielded DNN partitioning (TSDP) techniques and centralized attestation-based solutions do not robustly defend against model extraction or spoofed inference on constrained platforms.

Design: The Shangri-La Abstraction

VeCoDI introduces Shangri-La, a novel execution abstraction built on ARM TrustZone-M. Where standard TrustZone-M splits execution into the Secure and Non-Secure Worlds, Shangri-La adds a third runtime environment with privileges strictly between the two. Its design allows:

Execution of untrusted code (e.g., DNN inference, sensor acquisition) at Non-Secure World privilege, but shielded from the rest of the Non-Secure World with hardware-enforced memory remapping.
Minimal TCB in Secure World, containing only reusable, application-agnostic code, avoiding inference logic in the TEE.
Fine-grained access control and proof-of-execution support, with a protocol that prevents unauthorized inference invocations and generates cryptographically secure proofs for remote attestation.

The Shangri-La lifecycle separates instantiation states from execution states, ensuring protected handling of confidential model parameters and isolation from untrusted Non-Secure World code. State transitions (provisioning, creation, execution, destruction) are orchestrated through Secure World APIs.
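The lifecycle can be pictured as a small state machine whose transitions are only reachable through gated API calls. The following is a minimal sketch under stated assumptions: the state names, the call names (`provision`, `create`, `execute`, `finish`, `destroy`), and the transition table are hypothetical illustrations derived from the four transitions named above, not the actual Secure World API.

```python
from enum import Enum, auto


class State(Enum):
    UNPROVISIONED = auto()  # nothing installed yet
    PROVISIONED = auto()    # code and model partitions installed
    CREATED = auto()        # isolated instance set up
    EXECUTING = auto()      # inference running inside the instance


# Hypothetical transition table: (current state, API call) -> next state.
# Any pair that is not listed here is rejected.
ALLOWED = {
    (State.UNPROVISIONED, "provision"): State.PROVISIONED,
    (State.PROVISIONED, "create"): State.CREATED,
    (State.CREATED, "execute"): State.EXECUTING,
    (State.EXECUTING, "finish"): State.CREATED,
    (State.CREATED, "destroy"): State.PROVISIONED,
}


class Lifecycle:
    """Tracks the instance state and refuses out-of-order calls."""

    def __init__(self):
        self.state = State.UNPROVISIONED

    def call(self, api: str) -> State:
        key = (self.state, api)
        if key not in ALLOWED:
            raise PermissionError(
                f"{api!r} is not allowed in state {self.state.name}"
            )
        self.state = ALLOWED[key]
        return self.state
```

Keeping the table explicit means that an out-of-order call, such as `execute` before `create`, fails closed instead of reaching the protected memory.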

(Figure 1)

Figure 1: Shangri-La’s lifecycle states and TrustZone-M memory transitions, enabling isolated code/data instantiation and atomic, privileged execution without duplicating inference logic in Secure World.

Mechanism: Secure Protocol and Proof-of-Execution

VeCoDI's protocol involves four stakeholders—model provider (Pvd), edge device (Dev), model consumer (Csm), and verifier (Vrf)—supporting:

Provisioning, installing inference code and confidential/public model partitions to Dev.
Authorization, ensuring Csm obtains explicit, usage-bound permission for model access, limiting the exposure to model extraction via query attacks.
Inference execution within Shangri-La, where an input acquisition and inference sequence is performed atomically. The result is cryptographically bound to the input, code, hardware, and consumer identity.
Proof verification externally, confirming that the result is legitimate, timely, and was derived from the specified model and input, without exposing raw input data.

Shangri-La’s Secure World APIs enforce correct state transitions and protect against direct/indirect leakage and replay or forgery attacks.
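To make the binding and freshness properties concrete, the sketch below shows one way a result could be tied to its context and checked without revealing the raw input. It rests on several assumptions that the summary does not confirm: an HMAC-SHA256 tag under a key shared by the device and the verifier (a real design might use a device signature instead), SHA-256 hashes standing in for the code, model, and input, and a verifier-issued nonce for freshness. All function and class names are hypothetical.

```python
import hashlib
import hmac
import secrets


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def encode(fields) -> bytes:
    # Length-prefix every field so that field boundaries are unambiguous.
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)


def prove(device_key, device_id, code_hash, model_hash,
          raw_input, output, consumer_id, nonce):
    """Device side (hypothetical): bind the result to its full context."""
    input_hash = digest(raw_input)  # only the hash leaves the device
    message = encode([device_id, code_hash, model_hash,
                      input_hash, output, consumer_id, nonce])
    tag = hmac.new(device_key, message, hashlib.sha256).digest()
    return input_hash, tag


class Verifier:
    """Verifier side (hypothetical): checks freshness and the binding."""

    def __init__(self, device_key, device_id, code_hash, model_hash):
        self.device_key = device_key
        self.device_id = device_id
        self.code_hash = code_hash
        self.model_hash = model_hash
        self.pending = set()  # nonces issued but not yet consumed

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.pending.add(nonce)
        return nonce

    def verify(self, input_hash, output, consumer_id, nonce, tag) -> bool:
        if nonce not in self.pending:
            return False  # unknown or already used nonce, treated as replay
        self.pending.discard(nonce)  # each nonce is accepted at most once
        message = encode([self.device_id, self.code_hash, self.model_hash,
                          input_hash, output, consumer_id, nonce])
        expected = hmac.new(self.device_key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)
```

In this sketch the verifier only ever sees `input_hash`, so it can confirm which input produced the output without learning the input itself. A tag computed over a different output, code hash, or consumer identity does not verify, and a second submission of the same nonce is rejected.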

Security Analysis

The security analysis supports three properties:

Model Confidentiality: Hardware isolation and encrypted-at-rest private model parameters prevent adversary access outside authorized execution. Direct leakage is blocked by runtime and at-rest isolation; indirect leakage (model extraction) is mitigated via explicit usage limits enforced in Secure World, bound to cryptographically authenticated user requests.
Verifiable Inference: Cryptographic proofs are bound to device identity, code hash, input, and output. The proof-of-execution protocol ensures that results are authentic, model-specific, and untampered, covering both authenticity and privacy constraints.
Minimal TCB: The Secure World TCB is decoupled from use-case-specific inference logic. In empirical analysis, VeCoDI reduces the Secure World TCB to only 4.36% of the baseline (1,523 lines of code (LOC) vs. 34,865 LOC).
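The usage limits mentioned under Model Confidentiality can be illustrated with a small gate that checks an authenticated grant and counts queries against it. This is a sketch under stated assumptions, not the paper's protocol: the grant format, the `issue_grant` and `QueryGate` names, and the use of HMAC-SHA256 under a key shared by the provider and the device's Secure World are all hypothetical, and identifiers are assumed not to contain the `|` separator.

```python
import hashlib
import hmac


def issue_grant(provider_key: bytes, consumer_id: str,
                model_id: str, max_queries: int):
    """Provider side (hypothetical): authenticate a usage-bound grant."""
    body = f"{consumer_id}|{model_id}|{max_queries}".encode()
    tag = hmac.new(provider_key, body, hashlib.sha256).digest()
    return body, tag


class QueryGate:
    """Device side (hypothetical): admit a query only within the grant."""

    def __init__(self, provider_key: bytes):
        self.provider_key = provider_key
        self.used = {}  # grant body -> number of queries already admitted

    def authorize(self, body: bytes, tag: bytes) -> bool:
        expected = hmac.new(self.provider_key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # grant was not issued by the provider
        _consumer_id, _model_id, max_queries = body.decode().split("|")
        count = self.used.get(body, 0)
        if count >= int(max_queries):
            return False  # query budget exhausted
        self.used[body] = count + 1
        return True
```

Because the counter lives next to the check, a consumer cannot exceed the budget by replaying the same grant, which is what bounds the number of input/output pairs available for a model extraction attempt.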

Empirical Evaluation

The prototype runs on the NUCLEO-L552ZE-Q development board (ARM Cortex-M33) and performs ResNet inference using the CMSIS-NN library:

TCB Reduction: VeCoDI achieves a 95.64% reduction in Secure World TCB compared to the prior secure-inference baseline, since inference code/libraries reside outside the TEE.
Memory Efficiency: Only the private portion of the model is ever duplicated (encrypted in flash, decrypted into protected RAM), and the memory used is Non-Secure World RAM, leaving scarce Secure World RAM largely untouched.
Low Runtime Overhead: The end-to-end inference latency overhead introduced by VeCoDI (with proof generation/authentication) is 0.83 ms, or 0.07% over the baseline, demonstrating practical deployability.
Robust Security: Directed negative tests confirm prevention of unauthorized memory access (hardware faults on direct attempts), replay/freshness enforcement, and model-identity binding.
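The reported figures can be related to each other with simple arithmetic. The sketch below recomputes the TCB reduction from the two line counts and derives the baseline latency implied by the overhead figures. The baseline latency is not stated in this summary, and because the 0.07% figure is rounded, the derived value is only an approximation; the function names are illustrative.

```python
def reduction_percent(ours: float, baseline: float) -> float:
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - ours / baseline)


def implied_baseline_ms(overhead_ms: float, overhead_percent: float) -> float:
    """Baseline latency implied by an absolute and a relative overhead."""
    return overhead_ms / (overhead_percent / 100.0)


# Line counts and overhead figures as reported above.
tcb_reduction = reduction_percent(1523, 34865)
baseline_ms = implied_baseline_ms(0.83, 0.07)

print(f"TCB reduction: about {tcb_reduction:.1f}%")
print(f"Implied baseline latency: roughly {baseline_ms / 1000:.1f} s")
```

Under these figures, the baseline inference takes on the order of a second, which is why an added 0.83 ms is negligible in relative terms.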

Comparison and Relation to Prior Work

In contrast to prior TrustZone-M and TSDP-based solutions, VeCoDI is presented as the only design that satisfies all of the following goals together:

Direct model leakage protection (G1-1)
Inference authorization/query limiting (G1-2)
Proof of execution and result authenticity (G2-1)
Input privacy-preserving external verification (G2-2)
Minimal and decoupled Secure World TCB (G3-1)
Optimized memory footprint (G3-2)
Negligible runtime latency (G3-3)

By realizing a flexible third-world abstraction, VeCoDI generalizes beyond DNN inference and is applicable to other trusted, privileged computations on constrained TrustZone-M devices, which lack features (e.g., virtual memory, process boundaries) of higher-end ARM TrustZone-A, Intel SGX, or ARM CCA.

Implications and Future Directions

VeCoDI's architecture demonstrates a scalable blueprint for deploying confidential and verifiable analytics on IoT and embedded devices with extremely limited resources, without requiring trust in complex inference libraries or drivers. The composability of protocol elements and independence from model architecture ensure applicability to a broad array of TinyML and security-sensitive edge applications. Practical avenues for refining the architecture include:

Extending Shangri-La to efficiently support batch or interrupt-driven inference patterns.
Exploring integration with emerging microNPUs and accelerator cores, leveraging the privilege model to securely harness hardware acceleration.
Systematic formal verification of the generic Secure World APIs, given their independence from application logic.
Extending proof-of-execution to broader settings: e.g., federated learning updates, confidential actuation, or attested sensor pipelines.

Conclusion

VeCoDI establishes an approach for deploying confidential and verifiable DNN inference on low-end TrustZone-M platforms. By leveraging the Shangri-La runtime abstraction, it moves inference logic out of the Secure World, sharply reduces the TCB, and provides both cryptographic result attestation and model confidentiality. This makes secure, privacy-preserving edge intelligence feasible on commodity MCUs and supports more trustworthy IoT and embedded AI deployments.
