Disaggregated Hazel Merkle Tree (HMT) for NVMe-oF

Updated 28 October 2025
  • Disaggregated HMT is an advanced integrity structure that offloads cryptographic operations from the host to storage nodes using layered branching and asynchronous updates.
  • It integrates with NVMe-oF by leveraging existing metadata slots to manage cryptographic and tree state without modifying the protocol.
  • The design ensures PB-scale data integrity, freshness, and crash protection with minimal performance overhead, making it well suited to secure, confidential computing environments.

The Disaggregated Hazel Merkle Tree (HMT) is an advanced integrity-and-freshness structure specifically adapted for large-scale, disaggregated storage in NVMe-over-Fabrics (NVMe-oF) environments. HMT diverges from traditional Merkle tree schemes by offloading tree management from the host to storage nodes, employing layered branching and asynchronous updates, and leveraging NVMe metadata for cryptographic operations. This design enables PB-scale data integrity, cryptographic freshness, and crash protection at a minimal performance cost, thus advancing secure storage under confidential computing (CC) paradigms (Chrapek et al., 21 Oct 2025).

1. Conceptual Foundation

HMT implements a "disaggregated" Merkle tree architecture, in which most integrity and freshness responsibilities are shifted away from host memory (which traditionally requires terabytes for PB-scale storage) to storage nodes (JBOD/JBOF) and control-path accelerators. Unlike conventional trees with a single uniform branching factor, HMT uses two branching factors:

  • At the leaf level, leaves are constructed by batching sector IVs, exploiting the grouping inherent in NVMe metadata sectors.
  • Internal nodes at higher levels use a separate branching factor, which keeps the structure compact even for massive datasets.
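
To see why two branching factors keep the tree small, the following Python sketch counts the nodes on each level of a tree whose leaves each batch $D_d$ sector IVs and whose internal nodes have fan-out $D_t$. It is an illustration under assumptions, not the authors' implementation: the 32-byte digest size, the helper names `hmt_shape` and `hash_bytes`, and the value $D_t = 16$ used below are all hypothetical.

```python
DIGEST_BYTES = 32  # assumed digest size (BLAKE3's default output length)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding for huge sector counts."""
    return -(-a // b)


def hmt_shape(num_sectors: int, d_d: int, d_t: int) -> list[int]:
    """Node counts per level, from the leaf level up to the single root."""
    if num_sectors < 1 or d_d < 1 or d_t < 2:
        raise ValueError("need num_sectors >= 1, d_d >= 1 and d_t >= 2")
    levels = [ceil_div(num_sectors, d_d)]  # each leaf batches d_d sector IVs
    while levels[-1] > 1:
        levels.append(ceil_div(levels[-1], d_t))  # each internal node has <= d_t children
    return levels


def hash_bytes(levels: list[int]) -> int:
    """Bytes needed to hold one digest for every node in the tree."""
    return sum(levels) * DIGEST_BYTES
```

For example, 1 PiB of 4096 B sectors is $2^{38}$ sectors. With $D_d = 340$ and an assumed $D_t = 16$, `hash_bytes(hmt_shape(2**38, 340, 16))` comes to about 27.6 GB of digests across nine levels, whereas a binary tree over individual sectors, `hmt_shape(2**38, 1, 2)`, needs about 17.6 TB. The exact figures depend on parameters the sketch only assumes, but the gap illustrates the terabytes-to-gigabytes reduction.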

Eventual consistency is a central paradigm: tree updates are performed asynchronously and outside the critical data path, reducing latency for writes. Crash protection is provided by maintaining both "old" and "new" IVs in persistent and trusted memory, ensuring the tree state remains recoverable after system failures.
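
The old/new IV scheme can be illustrated with a small Python sketch, assuming (hypothetically) that the persistent, trusted buffer is a map from sector number to its pending `(old_iv, new_iv)` pair and that IVs are plain integers. While an update is still propagating, a sector may legitimately carry either IV; any other value is rejected as stale or forged. `PendingLog` and `iv_is_acceptable` are illustrative names, not interfaces from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PendingLog:
    """Hypothetical stand-in for the persistent, trusted old/new IV buffer."""
    entries: dict[int, tuple[int, int]] = field(default_factory=dict)

    def record(self, sector: int, old_iv: int, new_iv: int) -> None:
        # Persisted before the write is acknowledged.
        self.entries[sector] = (old_iv, new_iv)

    def commit(self, sector: int) -> None:
        # Called once the tree update for this sector has reached the root.
        self.entries.pop(sector, None)


def iv_is_acceptable(sector: int, stored_iv: int, committed_iv: int, log: PendingLog) -> bool:
    """Decide whether the IV read from a sector's metadata is fresh.

    committed_iv is the IV the (possibly lagging) tree currently vouches for.
    """
    if stored_iv == committed_iv:
        return True
    pending = log.entries.get(sector)
    if pending is None:
        return False  # no update in flight, so a mismatch means stale or forged data
    old_iv, new_iv = pending
    # Inside the eventual-consistency window the tree may still hold old_iv
    # while the sector already carries new_iv.
    return committed_iv == old_iv and stored_iv == new_iv
```

For example, after `log.record(7, 3, 4)`, sector 7 passes with either IV 3 (crash before the data write landed) or IV 4 (crash after it landed), but an older IV such as 2 is rejected as a replay. In this sketch the pending entries also tell recovery code exactly which sectors were mid-update after a crash.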

2. Integration with NVMe-over-Fabrics

HMT is integrated with NVMe-oF without modifying the underlying protocol. Instead, sNVMe-oF (the secure NVMe-oF design in which HMT is deployed) carries cryptographic and tree-related metadata in existing NVMe metadata slots:

  • Each SSD sector comprises a payload region (ciphertext) and a metadata region (containing IVs, integrity hashes, cryptographic key information, and freshness-related state).
  • HMT links groups of sectors through metadata sectors and uses the SSD metadata cache to speed up freshness checks, minimizing network round-trips.
  • The management and update of the HMT reside on the storage node (JBOF) and leverage control path communication to CC-capable SmartNICs, offloading verification and update logic and reducing host burden.
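
A minimal sketch of how a metadata sector could batch IVs into one leaf, assuming 12-byte (96-bit) IVs, the standard AES-GCM nonce length: 340 such IVs occupy 4080 bytes and therefore fit in one 4096 B metadata sector, with the remainder zero-padded. S = 340 is the batch size quoted in Section 5; the padding, little-endian encoding, and function names are hypothetical, and BLAKE2b from Python's standard library stands in for BLAKE3, which the standard library does not include.

```python
import hashlib

IV_BYTES = 12        # assumption: 96-bit IVs
SECTOR_BYTES = 4096
S = 340              # IVs per metadata sector


def pack_iv_batch(ivs: list[int]) -> bytes:
    """Serialize up to S IVs into one zero-padded metadata sector."""
    if len(ivs) > S:
        raise ValueError(f"at most {S} IVs fit in one metadata sector")
    body = b"".join(iv.to_bytes(IV_BYTES, "little") for iv in ivs)
    return body.ljust(SECTOR_BYTES, b"\x00")


def unpack_iv_batch(raw: bytes, count: int) -> list[int]:
    """Recover the first `count` IVs from a metadata sector."""
    return [int.from_bytes(raw[i * IV_BYTES:(i + 1) * IV_BYTES], "little") for i in range(count)]


def leaf_hash(raw: bytes) -> bytes:
    """One tree leaf per metadata sector (BLAKE2b-256 standing in for BLAKE3)."""
    return hashlib.blake2b(raw, digest_size=32).digest()
```

In this model, a single read of one metadata sector yields everything needed to recompute the leaf hash covering S data sectors, which is what makes the metadata cache useful for freshness checks.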

3. Security Guarantees

HMT delivers three primary security properties, reflected in its operational scheme:

  • Confidentiality: Every write is paired with a unique IV, allocated via counter-leasing and stored in the metadata, enabling secure AEAD encryption with semantic security for each sector.
  • Integrity: Each IV (or batched sector group) is linked to its parent node in the tree, and verification reconstructs the hash path up to the root. Any unauthorized modification causes the integrity check to fail.
  • Freshness: Sector updates are bound to the HMT, preventing replay and stale-data attacks. Asynchronous updates and crash-protected state in persistent storage ensure that post-crash verification remains reliable.

All cryptographic instructions and metadata operations are dispatched within trusted execution environments and rely on NVMe metadata layouts, eliminating the need for protocol changes or redundant transport-level protections such as IPsec.
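
Integrity and freshness both reduce to recomputing hashes from a leaf up to a trusted root. The Python sketch below shows that check for an internal fan-out $D_t$, assuming each leaf hash is the digest of one batched metadata sector; a modified or replayed batch yields a different leaf hash, so the recomputed root no longer matches. The helper names are hypothetical, and BLAKE2b again stands in for BLAKE3.

```python
import hashlib


def h(data: bytes) -> bytes:
    # BLAKE3 is not in Python's standard library; BLAKE2b-256 stands in for it.
    return hashlib.blake2b(data, digest_size=32).digest()


def build_levels(leaves: list[bytes], d_t: int) -> list[list[bytes]]:
    """Hash bottom-up: each parent is the hash of up to d_t consecutive children."""
    if d_t < 2:
        raise ValueError("d_t must be at least 2")
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(b"".join(cur[i:i + d_t])) for i in range(0, len(cur), d_t)])
    return levels


def auth_path(levels: list[list[bytes]], index: int, d_t: int) -> list[tuple[int, list[bytes]]]:
    """For each level below the root, record the sibling group and the node's slot in it."""
    path = []
    for level in levels[:-1]:
        start = (index // d_t) * d_t
        path.append((index - start, level[start:start + d_t]))
        index //= d_t  # move to the parent's index on the next level
    return path


def verify(leaf_hash: bytes, path: list[tuple[int, list[bytes]]], root: bytes) -> bool:
    """Recompute parent hashes from a leaf to the root and compare with the trusted root."""
    node = leaf_hash
    for slot, group in path:
        children = list(group)
        children[slot] = node  # substitute the freshly recomputed child
        node = h(b"".join(children))
    return node == root
```

Only the root needs to live in trusted memory: once a sector group is rewritten and the root updated, the old leaf no longer verifies, which is what rules out replaying stale data.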

4. Data Path Performance Optimization

Performance considerations are integral to HMT:

  • NVMe metadata aggregation enables efficient storage of IVs and hashes, reducing overhead compared to repeated transport encryption.
  • Dual branching factors at leaf and internal levels facilitate compact in-memory tree representation, shrinking the memory overhead from terabytes to gigabytes for PB-scale arrays.
  • Asynchronous (eventual consistency) tree updates are scheduled after writes, buffering status/location/IV tuples and processing integrity propagation in the background.
  • Tree update and verification tasks are offloaded to SmartNIC accelerators (e.g., NVIDIA BlueField-3 DPU), allowing cryptographic operations to execute at line rate in parallel, substantially reducing CPU usage and network latencies.
  • Benchmarks indicate a performance penalty of only about 1–2% relative to native NVMe-oF, with IOPS and latency remaining competitive.

5. Technical Implementation Details

HMT's technical realization comprises several innovations:

  • NVMe sector division couples encrypted data blocks with metadata sectors; with 4096 B sectors, for example, each metadata region stores S = 340 IVs, and each such batch forms a leaf node.
  • The tree employs branching factor $D_d = S$ at the leaf level, so that the leaf fan-out matches the batch size, and $D_t$ for internal nodes, ensuring scalable compaction of the memory-resident portion for billions of sectors.
  • Each write uses an AEAD cipher (e.g., AES-GCM), deriving keys through HMAC and computing integrity tags:

$$C, H_i = E(k, \text{sector number}_{AD}, \text{IV}, \text{plaintext})$$

After each write, the IV is incremented, so IV values track the number of blocks encrypted.

  • Write operations log [status, location, old IV, new IV] tuples and pass them to hashers running on SmartNICs, which propagate integrity changes up to the HMT root asynchronously, decoupling verification from user I/O.
  • Crash protection leverages small persistent buffers (e.g., non-volatile SRAM) plus NVMe metadata, ensuring rapid root recovery and durable update logs.
  • The NVIDIA DOCA framework is used for tightly coupled DPU buffer registration, request batching, and doorbell mechanisms that minimize cryptographic latency.
  • Cryptographic primitives include BLAKE3 for hashing and AES-GCM for encryption, selected for compatibility and hardware acceleration suitability.
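
The write-path bookkeeping above (HMAC key derivation, unique IV allocation by counter leasing, the sector number as associated data, and the logged tuple) can be sketched in Python as follows. It is a hedged illustration, not the paper's code: the KDF label, the lease size, the little-endian encodings, and all names are hypothetical. The AES-GCM call itself is omitted so the sketch runs on the standard library alone; with the third-party `cryptography` package it would be `AESGCM(key).encrypt(nonce, plaintext, associated_data)`, which returns the ciphertext with the 16-byte tag appended.

```python
import hashlib
import hmac
from dataclasses import dataclass


def derive_volume_key(master_key: bytes, volume_id: int) -> bytes:
    """HMAC-SHA256 as a simple KDF; the label format is a hypothetical choice."""
    label = b"hmt-volume:" + volume_id.to_bytes(8, "little")
    return hmac.new(master_key, label, hashlib.sha256).digest()


class CounterLease:
    """Hands out unique IVs from ranges leased off a trusted, persistent counter."""

    def __init__(self, lease_size: int, persisted_counter: int = 0) -> None:
        if lease_size < 1:
            raise ValueError("lease_size must be positive")
        self.lease_size = lease_size
        self.trusted_counter = persisted_counter  # stand-in for trusted persistent memory
        self._next = 0
        self._limit = 0

    def allocate(self) -> int:
        if self._next >= self._limit:
            # Lease a fresh range; only this boundary needs to be persisted.
            self._next = self.trusted_counter
            self.trusted_counter += self.lease_size
            self._limit = self.trusted_counter
        iv = self._next
        self._next += 1
        return iv


@dataclass
class LogEntry:
    """The [status, location, old IV, new IV] tuple handed to the hashers."""
    status: str
    location: int
    old_iv: int
    new_iv: int


def prepare_write(lease: CounterLease, sector: int, old_iv: int) -> tuple[bytes, bytes, LogEntry]:
    """Return the AEAD nonce, associated data, and log entry for one sector write."""
    new_iv = lease.allocate()
    nonce = new_iv.to_bytes(12, "little")            # 96-bit AES-GCM nonce
    associated_data = sector.to_bytes(8, "little")   # binds the ciphertext to its location
    return nonce, associated_data, LogEntry("pending", sector, old_iv, new_iv)
```

Leasing keeps the trusted counter off the per-write path: after a restart, a new `CounterLease` seeded with the persisted `trusted_counter` skips any unused IVs from the previous lease instead of reusing them.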

6. Experimental Evaluation

Empirical validation was conducted using SSD arrays with NVIDIA BlueField-3 DPUs:

  • Benchmark outcomes establish that the HMT imposes 1–2% overhead for random/sequential workloads compared to baseline, while CPU usage is substantially curtailed via SmartNIC offloading.
  • Resource utilization is minimized: integrity metadata occupies only 1.76% of total storage, whereas conventional trees require terabytes of host memory at PB scale.
  • Operational throughput sustains millions of IOPS with only marginal latency increases, attributed to deferred consistency and metadata-driven verification.
  • Comparative assessment against dm-x and classical integrity trees demonstrates superior scalability, lower CPU demand, and a smaller performance impact (2–3%) under PB-scale deployment scenarios. This suggests that HMT's dual optimization strategy (disaggregation and event-driven updates) achieves secure, verifiable, and crash-tolerant storage at scale without significantly degrading performance or resource efficiency.

7. Significance and Application Scope

The Hazel Merkle Tree is a significant advance for confidential computing in disaggregated storage infrastructure:

  • It enables NVMe-oF data centers to uphold end-to-end guarantees of integrity, confidentiality, and freshness for PB-scale volumes.
  • By encapsulating cryptographic operations and tree state within existing NVMe metadata layouts, it avoids disruptive protocol modification and redundant controls.
  • Offloading verification tasks to hardware accelerators ensures the scheme is sustainable at line rate, making it practical for real-time AI training, cloud data services, and regulated enterprise storage.
  • Its crash-tolerant design and memory compaction techniques make HMT a strong candidate mechanism for next-generation secure storage that must support asynchronous, distributed operation.

A plausible implication is that further evolutions of HMT may integrate more tightly with emerging storage standards and cryptographic acceleration frameworks, extending its utility to broader classes of confidential computing workloads.
