PIM-Enclave: Secure In-Memory Processing

Updated 30 January 2026
  • PIM-Enclave is a secure computation framework that leverages Processing-in-Memory to provide confidential, integrity-guaranteed execution directly within memory subsystems.
  • It spans two instantiations: a hardware–software co-design embedding enclave execution in 3D-stacked memory with AES-GCM-protected DMA, and an MPC-based cryptographic protocol for secure offload to untrusted commercial PIM modules, protecting against malicious hosts and bus-level attacks.
  • The framework achieves robust security with minimal overhead, making it well-suited for data-intensive applications like machine learning inference and large-scale analytics.

PIM-Enclave is a secure computation framework designed to enable confidential and integrity-guaranteed processing directly inside memory subsystems using Processing-in-Memory (PIM) architectures. It targets modern cloud and data-center settings, where data-intensive workloads demand both high throughput and robust protection against adversaries capable of snooping buses, DRAM, PIM modules, and host software. Two prominent instantiations define the landscape: a holistic hardware–software co-design embedding an enclave execution environment at the memory-bank level (Duy et al., 2021), and a cryptographically anchored MPC-based protocol for secure offload to untrusted PIM modules (Ghinani et al., 28 Jan 2025).

1. Architectural Foundations

PIM-Enclave as a secure, in-memory execution model spans two principal architectural domains. The first, introduced in (Duy et al., 2021), utilizes 3D-stacked memory modules where the “logic layer” beneath DRAM banks hosts PIM cores, dedicated local memory, secure key storage, AES-GCM–capable DMA engines, and access-control logic. Each bank becomes a semi-autonomous execution enclave: kernel code and data remain inside the memory module, absent from CPU caches and external buses. All communication—commands and DMA parameters—travels encrypted via MMIO and session keys established by remote attestation.

The second instantiation (Ghinani et al., 28 Jan 2025) overlays cryptographic protocols atop commercial PIM hardware (e.g., UPMEM DPUs). Here, a Trusted Execution Environment (TEE) on the CPU splits input data into shares, encrypts them, and lets one share propagate to the untrusted PIM, while the TEE maintains the other and performs lightweight local computation. Integration with MPC primitives (arithmetic secret sharing, Yao’s garbled circuits) enables secure computation even when the offloaded kernel enters untrusted memory or execution logic.
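The share-splitting step described above can be sketched with additive secret sharing over a fixed modulus. This is a minimal illustration of the principle, not the paper's implementation; the modulus and word size are assumptions:

```python
import secrets

MOD = 2**32  # word size of the shared arithmetic; an assumption for illustration


def split(x: int) -> tuple[int, int]:
    """Split a secret into two additive shares: x = (x1 + x2) mod MOD."""
    x1 = secrets.randbelow(MOD)   # share sent to the untrusted PIM
    x2 = (x - x1) % MOD           # share retained inside the TEE
    return x1, x2


def merge(y1: int, y2: int) -> int:
    """Recombine partial results inside the TEE."""
    return (y1 + y2) % MOD


# Linear operations commute with sharing: scaling each share by a public
# constant c and merging yields c*x, without the PIM ever seeing x.
c = 7
x1, x2 = split(41)
assert merge((c * x1) % MOD, (c * x2) % MOD) == (c * 41) % MOD
```

Because addition and scaling by public constants distribute over the shares, linear kernels run on the PIM share without revealing anything about the plaintext.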

| Subsystem | In-memory enclave (Duy et al., 2021) | Cryptographic offload (Ghinani et al., 28 Jan 2025) |
|---|---|---|
| Memory bank | PIM core + AES-DMA + local RAM | UPMEM DPU executes on encrypted shares |
| Host role | Initiates attestation, offloads encrypted kernels and data | TEE splits, encrypts, and recombines shares |
| Security boundary | Logic layer and DRAM trusted | Only TEE, CPU core, and on-chip caches trusted |

2. Security Model and Threat Mitigation

PIM-Enclave’s threat models are defined rigorously in both works. The hardware-level approach (Duy et al., 2021) trusts only the PIM module (logic layer, DRAM array), while the host (CPU, OS/hypervisor) is considered malicious. Adversaries can manipulate MMIO packets, probe DIMM pins, and observe or tamper with bus transactions. The cryptographic approach (Ghinani et al., 28 Jan 2025) restricts the Trusted Computing Base (TCB) to the TEE, CPU core, and on-chip caches. All DRAM, buses, and PIM logic are adversarial and may eavesdrop, replay, or inject faults.

Key security strategies include (i) remote attestation and secure-channel setup; (ii) complete elimination of address and data side channels on the bus (I(S; O) = 0 over bus traces); (iii) defense against content tampering via locked DRAM regions and AES-GCM tag authentication; and (iv) computation performed exclusively inside isolated, cacheless PIM logic or over masked data representations. In the cryptographic protocol, confidentiality is assured by counter-mode encryption and secret sharing, while arithmetic MACs and circuit checks guard correctness and integrity. No access location or update pattern escapes into the host-system view once data resides in the protected bank or has been split and encrypted before PIM offload.
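The encrypt-then-authenticate pattern behind strategies (i) and (iii) can be illustrated with standard-library primitives. This sketch substitutes a SHA-256-based counter-mode keystream and an HMAC tag for the hardware AES-GCM engine; it shows the structure only, not the actual cipher:

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Counter-mode keystream from a hash (illustrative stand-in for AES-CTR)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]


def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes):
    """Encrypt, then authenticate nonce + ciphertext (AEAD-style)."""
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    """Verify the tag before decrypting; tampering or replay raises an error."""
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("integrity check failed")
    return bytes(a ^ b for a, b in zip(ct, keystream(enc_key, nonce, len(ct))))
```

Binding the nonce into the tag is what lets the receiver reject replayed or modified command packets, mirroring the role of the AES-GCM authentication tag in the hardware design.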

3. Programming Models and Workflow

Developers interact with the PIM-Enclave in two phases. In the hardware co-design (Duy et al., 2021), the programmer supplies both a host-enclave stub and a bare-metal PIM kernel. The host:

  • Initializes the PIM enclave,
  • Performs remote attestation and session key establishment,
  • Loads encrypted kernel binaries,
  • Allocates encrypted in-bank buffers,
  • Activates address-range locks,
  • Kicks off execution, and
  • Retrieves results after computation completes.

Sample code segments demonstrate allocation, encrypted load, region protection, and invocation. Within the PIM bank, kernel execution reads encrypted data batches, operates over local memory, and writes back output with on-the-fly AES encryption.
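The host-side call sequence might be sketched as follows. Every identifier below (the `pim_*` functions, their signatures, and the placeholder return values) is hypothetical; the paper's actual API is not reproduced here:

```python
# Hypothetical host driver mirroring the enclave workflow; all names are
# illustrative stand-ins, not the paper's real interface.

def pim_attest(bank: int) -> bytes:
    # Remote attestation would verify the PIM module and derive a session
    # key; modeled here as a fixed placeholder value.
    return b"\x00" * 16

def pim_load_kernel(bank: int, blob: bytes) -> None:
    pass  # load the encrypted kernel binary into the bank

def pim_alloc(bank: int, nbytes: int) -> int:
    return 0x1000  # allocate an encrypted in-bank buffer, return its address

def pim_lock_range(bank: int, addr: int, nbytes: int) -> None:
    pass  # activate the address-range lock over the buffer

def pim_run(bank: int) -> None:
    pass  # kick off in-bank kernel execution

def pim_read_result(bank: int, addr: int, nbytes: int) -> bytes:
    return b"\x00" * nbytes  # DMA the (encrypted) result back to the host

bank = 0
key = pim_attest(bank)
pim_load_kernel(bank, b"<encrypted kernel>")
buf = pim_alloc(bank, 4096)
pim_lock_range(bank, buf, 4096)
pim_run(bank)
out = pim_read_result(bank, buf, 4096)
```

The point of the sketch is the ordering: attestation precedes any load, and the range lock is armed before execution so the host never sees plaintext in-bank state.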

For the cryptographic protocol (Ghinani et al., 28 Jan 2025), the workflow includes precomputation phases where the TEE generates one-time pads and precomputes matrix-multiplication shares offline, storing them for rapid online stage recombination. Runtime operations mask private data, offload encrypted shares to PIM, and merge returned results in the TEE, switching protocols for nonlinear computations. All data transferred to PIM remains masked or garbled; code is structured so only the TEE ever handles plaintext or interprets intermediate outputs.
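The offline/online split for linear layers rests on the identity W·x = W·(x − r) + W·r: the pad r and the product W·r are precomputed offline, so the PIM only ever sees the masked vector x − r. A minimal sketch, assuming a public weight matrix and arithmetic over a small illustrative modulus:

```python
import secrets

MOD = 2**16  # illustrative modulus for the shared arithmetic


def precompute(W: list[list[int]], n_pads: int):
    """Offline phase: draw random pads r and cache W @ r for each pad."""
    pads = []
    for _ in range(n_pads):
        r = [secrets.randbelow(MOD) for _ in range(len(W[0]))]
        Wr = [sum(w * v for w, v in zip(row, r)) % MOD for row in W]
        pads.append((r, Wr))
    return pads


def online_gemv(W_on_pim: list[list[int]], x: list[int], pad) -> list[int]:
    """Online phase: PIM multiplies the masked input; TEE adds back W @ r."""
    r, Wr = pad
    masked = [(xi - ri) % MOD for xi, ri in zip(x, r)]  # what the PIM sees
    W_masked = [sum(w * m for w, m in zip(row, masked)) % MOD for row in W_on_pim]
    return [(a + b) % MOD for a, b in zip(W_masked, Wr)]  # TEE recombination
```

Each pad is used once, so the masked vectors leak nothing, and the online TEE work shrinks to a single vector addition per layer.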

4. Cryptographic Techniques and Optimizations

The protocol-centric PIM-Enclave employs three principal cryptographic constructs:

  • Counter-mode encryption: each block P_i is masked with a keystream value R_i derived from a unique nonce and key, yielding C_i = P_i ⊕ R_i, with only the TEE holding R_i.
  • Arithmetic secret sharing: any secret x is replaced by x_1 (PIM share) and x_2 (TEE share); computation proceeds independently on each share, and the results are securely merged.
  • Yao’s garbled circuits: Boolean computations for nonlinear functions (e.g., ReLU, sigmoid) are converted to garbled gates, evaluated obliviously by the PIM as directed by the TEE.
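A single garbled AND gate illustrates the third construct. This is a textbook Yao-style sketch (try-all-rows variant with a zero tag for row detection), not the paper's optimized circuits:

```python
import hashlib
import secrets


def H(*parts: bytes) -> bytes:
    """Hash-based row encryption pad (stand-in for a proper KDF)."""
    return hashlib.sha256(b"".join(parts)).digest()


def garble_and():
    """Garbler: two random 16-byte labels per wire (a, b inputs; c output)."""
    labels = {w: (secrets.token_bytes(16), secrets.token_bytes(16)) for w in "abc"}
    table = []
    for x in (0, 1):
        for y in (0, 1):
            out = labels["c"][x & y]
            pad = H(labels["a"][x], labels["b"][y])
            # Encrypt output label plus a 16-byte zero tag under the input labels.
            table.append(bytes(p ^ q for p, q in zip(out + b"\x00" * 16, pad)))
    secrets.SystemRandom().shuffle(table)  # hide which row is which
    return labels, table


def eval_and(ka: bytes, kb: bytes, table: list[bytes]) -> bytes:
    """Evaluator: holds one label per input wire, learns only the output label."""
    pad = H(ka, kb)
    for row in table:
        pt = bytes(p ^ q for p, q in zip(row, pad))
        if pt[16:] == b"\x00" * 16:  # zero tag marks the one decryptable row
            return pt[:16]
    raise ValueError("no row decrypted")
```

The evaluator learns the output label but not its semantic value, which is why nonlinear functions can be evaluated on the untrusted PIM under TEE direction.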

Precomputation optimization shifts heavyweight CPU-side linear algebra entirely offline, amortizing the cost across multiple online invocations. For deep learning models, this approach eliminates up to 92% of CPU-side GEMV overhead and compresses online latency to just the recombination stage (<8% of runtime).

5. Performance Evaluation

Both instantiations provide rigorous quantitative analyses (Duy et al., 2021, Ghinani et al., 28 Jan 2025). Hardware-level PIM-Enclave demonstrates that AES-DMA incurs approximately one cycle per 16 bytes, translating to a 22.3% average DMA latency penalty over unencrypted PIM, and a peak bandwidth drop from 3.53 GB/s to 2.90 GB/s. Critically, secure k-means clustering over 640 MB data adds only 3.7% total overhead versus plain PIM, and offers up to 2× speedup relative to host-only execution (≥6 banks).
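As a quick sanity check, the reported peak-bandwidth figures imply a transfer-time inflation close to the stated average latency penalty (the small gap reflects averaging over workloads); illustrative arithmetic only:

```python
# Reported peak bandwidth with and without the AES-DMA path (GB/s).
peak_plain, peak_secure = 3.53, 2.90

bw_drop = 1 - peak_secure / peak_plain      # fractional bandwidth loss ≈ 17.8%
latency_factor = peak_plain / peak_secure   # transfer-time inflation ≈ 1.22×
print(f"bandwidth drop ≈ {bw_drop:.1%}, transfers ≈ {latency_factor:.2f}x slower")
```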

On UPMEM hardware, the cryptographic PIM-Enclave achieves up to 14.66× speedup over fully secure CPU baselines, incurring only ~4% overhead for security compared to insecure PIM frameworks. Deep learning inference (MLP, DLRM), classical ML tasks, and memory-centric regression all show strong performance improvement, especially when offline precomputation is feasible and data movement bottlenecks are eliminated.

6. Workload Suitability and Limitations

PIM-Enclave is particularly beneficial for:

  • Large-scale analytics (map-reduce, clustering, sorting),
  • Machine learning primitives (distance calculation, matrix/vector ops, deep recommendation model inference),
  • Graph algorithms with irregular access patterns (pointer chasing).

It leverages internal TSV-based bandwidth and parallel bank execution to outperform host DRAM throughput by factors exceeding 10× for supported tasks. The programming model favors SPMD-like applications where banks operate independently, coordinated through the host.
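The SPMD coordination pattern can be sketched as a host loop over independent per-bank kernels; this is an illustrative shape only, with a trivial reduction standing in for a real in-bank kernel:

```python
def partition(data: list[int], n_banks: int) -> list[list[int]]:
    """Split the input into one contiguous slice per bank."""
    step = (len(data) + n_banks - 1) // n_banks  # ceiling division
    return [data[i * step:(i + 1) * step] for i in range(n_banks)]


def bank_kernel(chunk: list[int]) -> int:
    # Stand-in for an in-bank kernel, e.g. a partial sum for clustering.
    return sum(chunk)


# Each bank runs the same kernel on its own slice; the host merges results.
partial = [bank_kernel(c) for c in partition(list(range(100)), 8)]
total = sum(partial)  # host-side merge of per-bank partial results
```

Because banks never communicate directly, workloads that reduce to independent slices plus a cheap host-side merge map naturally onto this model.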

Limitations include lack of protection against side-channel leaks within the CPU and PIM device (timing, power, EM), amortization dependence on static public data for precomputation, reduced host DRAM capacity during bank protection, and diminishing returns if model sizes or data patterns preclude efficient secret sharing or offline computation. Verification for nonlinear computations must be staged, with explicit MAC and garbled circuit checks; adversary scope excludes collusion and implicit side-channel inference.

7. Comparative Impact and Directions

PIM-Enclave as documented in (Duy et al., 2021) and (Ghinani et al., 28 Jan 2025) extends the confidential-computing paradigm beyond processor-centric enclaves such as Intel SGX to the memory subsystem and PIM-enabled DRAM. It eliminates memory-bus side-channel leakage, mathematically bounds adversary observability (I(S; O) = 0), and enables highly efficient, scalable secure offload for data analytics and ML inference. The approach builds on advances in secure near-data processing, MPC, and hardware-software co-design, presenting a practical, low-cost path for cloud environments prioritizing both throughput and confidentiality.

While future work remains to bridge gaps in side-channel defenses within PIM logic and to optimize cross-bank communication and coordination, the framework trades modest encryption overheads (≈20%) for uncompromised privacy and integrity of large-scale workloads. This suggests a promising trajectory for secure, high-bandwidth cloud computation that natively leverages memory-centric architectures.
