Monolithic 3D DRAM (Mono3D) Overview

Updated 1 June 2026

Mono3D DRAM is a vertically integrated memory architecture that stacks cells, sense amplifiers, and peripheral circuitry to achieve unprecedented areal density.
The architecture leverages hybrid bonding and a selector + strap topology to shorten wiring and lower latency, cutting energy per access by roughly 60%.
Mono3D enables system-level features such as near-memory processing and tier-aware data placement, while posing thermal-management and reliability challenges that future scaling must address.

Monolithic 3D DRAM (Mono3D) is a DRAM architecture in which memory cells, sense amplifiers, and peripheral circuitry are vertically integrated into dense, wafer-scale stacks built using monolithic fabrication techniques and fine-pitch hybrid bonding. By eschewing the traditional 2D DRAM scaling roadmap, Mono3D directly addresses the limitations of photolithographic shrink, routing congestion, and the tradeoff between latency, energy, and die area. The architecture enables unprecedented areal density, greater internal bandwidth, and new physical-design–technology co-optimization opportunities, driving advances in high-performance computing, near-memory processing, and reliability engineering. The following sections detail Mono3D DRAM’s device fundamentals, integration strategies, array architectures, performance characteristics, scaling trends, and reliability considerations.

1. Architectural Principles and Technology Motivation

As DRAM cell sizes approach the 4F² physical scaling limit, planar lithography and etch complexity inhibit further density improvements in 2D layouts. Monolithic 3D DRAM replaces lateral scaling with vertical stacking of memory tiers, each containing cells, bitlines (BLs), and wordlines (WLs), combined with direct hybrid bonding of logic and array layers (Lee et al., 12 Mar 2026).

Mono3D fundamentally reorganizes the bitcell and periphery topology: peripheral logic is placed beneath the memory array (or, in some flows, above it) and connected through BEOL metals or hybrid Cu–Cu bonds at ≤1 μm pitch. This enables bit densities above 2 Gb/mm², far exceeding the ~0.4 Gb/mm² limit of advanced planar (D1b) DRAM, and shortens critical wiring, which reduces RC parasitics and thus latency and energy per access.

Key drivers include:

Density scaling: Up to 2.6 Gb/mm² for 87–137-layer stacks, a ~6× increase over state-of-the-art 2D DRAM (Lee et al., 12 Mar 2026), with research devices reaching 32 GB capacity in 1024-layer configurations (Lu et al., 27 Feb 2026).
Lower RC delays: Shorter BLs and WLs in stacked arrays reduce the parasitic capacitance and resistance, directly lowering $t_{RC}$ and $t_{RCD}$ (Huang et al., 2020).
Improved energy per access: Reduced parasitics, together with the selector + strap architecture described below, cut charge/discharge energy by roughly 60% compared with planar DRAM (Lee et al., 12 Mar 2026).

2. Device Modeling, Array Structures, and Integration

Mono3D DRAM cell design builds on detailed device and parasitic modeling at the TCAD and SPICE levels, incorporating access transistor characteristics, storage capacitance, bitline capacitance and resistance, and complex vertical/lateral routing parasitics (Lee et al., 12 Mar 2026). Two principal integration flows dominate:

Coarse-grained M3D (CG-M3D):
- Top tier: DRAM 1T–1C cell arrays and their local BL/WL interconnects
- Bottom tier: Sense amplifiers, decoders, drivers, I/O, formed using tungsten BEOL for thermal compatibility (Huang et al., 2020)
- Inter-tier connectivity: Monolithic inter-tier vias (MIVs) of ~50 nm × 100 nm, each adding < 0.2 fF and < 20 Ω of parasitics
Fine-grained monolithic stacks:
- Up to 137 Si or 87 AOS tiers at 70 nm/40 nm channel thickness, respectively, enabled by BEOL-compatible amorphous oxide (AOS) access transistors and capacitors (Lee et al., 12 Mar 2026, Waqar et al., 29 Jun 2025)
- Direct Cu–Cu hybrid bonding permits 1 μm or finer pitch, compared to through-silicon vias (TSVs) at ~10 μm in HBM3 (Pan et al., 6 Oct 2025, Lu et al., 27 Feb 2026)
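
As a rough sanity check on the interconnect figures above, the Python sketch below computes (a) the worst-case RC time constant of a single MIV from the quoted resistance and capacitance bounds and (b) the number of vertical connections per mm² at a given pitch. The square-grid layout and the function names are illustrative assumptions, not part of the cited integration flows.

```python
def miv_rc_fs(r_ohm: float = 20.0, c_ff: float = 0.2) -> float:
    """Upper-bound RC time constant of one MIV in femtoseconds (ohm x fF = fs)."""
    return r_ohm * c_ff


def connections_per_mm2(pitch_um: float) -> float:
    """Vertical connections per mm^2, assuming (hypothetically) a square grid."""
    per_mm = 1000.0 / pitch_um  # 1 mm = 1000 um
    return per_mm ** 2


if __name__ == "__main__":
    print(f"MIV RC upper bound: {miv_rc_fs():.1f} fs")
    hcb = connections_per_mm2(1.0)   # Cu-Cu hybrid bonding, ~1 um pitch
    tsv = connections_per_mm2(10.0)  # TSVs in HBM3, ~10 um pitch
    print(f"HCB: {hcb:,.0f}/mm^2, TSV: {tsv:,.0f}/mm^2, ratio {hcb / tsv:.0f}x")
```

Under these assumptions a 10× finer pitch yields 100× more connections per unit area, while each MIV's RC contribution (about 4 fs at the quoted bounds) is negligible next to nanosecond-scale DRAM timings.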

A key architectural innovation is the selector + strap topology, in which each metal-2 strap bundles 8 BLs and 16 WLs and is accessed only through select transistors on each tier. A BEOL-compatible AOS selector offers a high on/off ratio ($>10^{11}$) and sub-pA leakage, sharply reducing standby losses and making this grouping practical (Lee et al., 12 Mar 2026, Waqar et al., 29 Jun 2025). The effective BL capacitance drops to ~6.6 fF, and the hybrid Cu–Cu bonding (HCB) pitch requirement relaxes to a manufacturable 0.62–0.75 μm, removing a previous scaling bottleneck.

3. Performance, Density, Energy, and Scaling Metrics

Mono3D enables substantial improvements in canonical DRAM metrics:

Metric	2D DRAM (D1b)	Mono3D (Si, 137L)	Mono3D (AOS, 87L)	M3D-128 (Huang et al., 2020)
Areal density (Gb/mm²)	0.436	2.6	2.6	—
$t_{RC}$ (ns)	21.3	10.9	10.5	9.56% lower than DDR4-512
Write energy (fJ)	15.50	6.26	5.38	—
Read energy (fJ)	3.88	1.57	1.35	—
Die area reduction	—	—	—	~14% (M3D-128 vs DDR4-512)
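
The relative improvements implied by the table can be recomputed directly. The short sketch below uses only the table's values for the 2D, Si, and AOS columns; the dictionary keys and helper name are illustrative.

```python
# Values copied from the comparison table (2D D1b vs. Mono3D Si 137L / AOS 87L).
DENSITY_GB_PER_MM2 = {"2D": 0.436, "Si137": 2.6, "AOS87": 2.6}
T_RC_NS = {"2D": 21.3, "Si137": 10.9, "AOS87": 10.5}
WRITE_FJ = {"2D": 15.50, "Si137": 6.26, "AOS87": 5.38}
READ_FJ = {"2D": 3.88, "Si137": 1.57, "AOS87": 1.35}


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage reduction of `new` relative to `baseline`."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    for variant in ("Si137", "AOS87"):
        gain = DENSITY_GB_PER_MM2[variant] / DENSITY_GB_PER_MM2["2D"]
        print(f"{variant}: density x{gain:.2f}, "
              f"tRC -{reduction_pct(T_RC_NS['2D'], T_RC_NS[variant]):.1f}%, "
              f"write -{reduction_pct(WRITE_FJ['2D'], WRITE_FJ[variant]):.1f}%, "
              f"read -{reduction_pct(READ_FJ['2D'], READ_FJ[variant]):.1f}%")
```

The density ratio is about 5.96×, matching the ~6× figure; the write and read energy reductions come out near 60% for the Si 137-layer column and near 65% for the AOS 87-layer column.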

Performance gains stem from reduced BL capacitance (6.6 fF vs. 20 fF), shorter WLs, dual-end sensing, lower supply overdrives (1.6–1.8 V vs. 2.5 V), and the elimination of long metal runs through direct vertical BL-to-logic connections (Lee et al., 12 Mar 2026).

Energy per access is dominated by the bitline charge/discharge term plus the sense-amplifier energy $E_{SA}$:

$E \approx \frac{1}{2} C_{BL} V_{DD}^2 + E_{SA}$

With $C_{BL}=6.6$ fF, $V_{DD}=1.6$ V, and additional energy-saving logic, Mono3D achieves roughly a 60% reduction in cell-access energy relative to planar DRAM.
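
To isolate how much of the saving comes from capacitance alone, the sketch below evaluates only the bitline term $\frac{1}{2} C_{BL} V_{DD}^2$ for the two BL capacitances quoted above (6.6 fF vs. 20 fF) at a common supply voltage. Dropping $E_{SA}$ and holding the voltage equal for both designs are our simplifying assumptions, so this is a first-order illustration rather than the cited simulation.

```python
def bitline_energy_fj(c_bl_ff: float, v_dd: float) -> float:
    """Bitline term of E ~ 0.5 * C_BL * V_DD^2, in fJ when C is in fF and V in volts."""
    return 0.5 * c_bl_ff * v_dd ** 2


if __name__ == "__main__":
    v = 1.6  # common supply assumed for both designs (illustrative only)
    mono3d = bitline_energy_fj(6.6, v)
    planar = bitline_energy_fj(20.0, v)
    print(f"bitline-term ratio: {mono3d / planar:.2f} "
          f"({100 * (1 - mono3d / planar):.0f}% lower)")
```

At equal voltage the capacitance reduction alone cuts the bitline term by about two-thirds; the overall ~60% figure additionally reflects the supply-voltage change and the $E_{SA}$ term, which this sketch omits.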

M3D integration also mitigates the classic latency–area tradeoff: moving sense amps and peripheral logic beneath the array allows simultaneous shortening of both local and global BLs. The M3D-128 organization yields 9.56% lower latency, 4.96% lower power, 21.21% lower energy-delay product (EDP), and up to 14% smaller die area than 2D DDR4-512 (Huang et al., 2020).

4. Routing, Hybrid Bonding, and Tiered Organization

Hybrid Cu–Cu bonding (HCB) is fundamental to Mono3D. HCB requires strict pad pitch matching between logic and cell wafers. The bitline-strap plus selector architecture enables practical HCB pad pitches (0.62 μm for AOS, 0.75 μm for Si) and removes the otherwise prohibitive need for 0.22–0.26 μm pitches in direct connections between BLs and bitline sense amplifiers (BLSAs) (Lee et al., 12 Mar 2026).
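
A back-of-envelope area argument helps explain why strapping relaxes the bond pitch. If each bond pad serves a strap of $k$ bitlines instead of a single bitline, each pad may occupy $k$ times the area, so on a square pad grid the allowable pitch grows by $\sqrt{k}$. The sketch below applies this with $k = 8$ (the BLs per strap in the selector + strap topology) to the 0.22–0.26 μm direct-connection pitches; the square-grid model and function name are our assumptions, not the cited derivation.

```python
import math


def strapped_pitch_um(direct_pitch_um: float, bls_per_strap: int = 8) -> float:
    """Pad pitch when one pad serves `bls_per_strap` bitlines on a square grid.

    Hypothetical model: each pad may use bls_per_strap times the area of a
    per-bitline pad, so the linear pitch scales by sqrt(bls_per_strap).
    """
    return direct_pitch_um * math.sqrt(bls_per_strap)


if __name__ == "__main__":
    for p in (0.22, 0.26):
        print(f"{p:.2f} um direct -> {strapped_pitch_um(p):.2f} um strapped")
```

The estimates (about 0.62 and 0.74 μm) fall close to the reported 0.62–0.75 μm HCB pitches, which is consistent with, though not proof of, an area-sharing explanation.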

Mono3D reaches internal bandwidths of 19–34 TB/s per stack (compared with 0.8 TB/s external bandwidth; HBM: ≈4 TB/s), enabled by the 1 μm bond pitch, roughly 10× finer than TSV-based HBM3 (Pan et al., 6 Oct 2025, Lu et al., 27 Feb 2026). This extreme internal parallelism (e.g., 16 channels × 64 banks/channel × 1024 bits/bank at 1 GHz) enables near-memory and processing-in-memory (NMP/PIM) architectures.
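
The parallelism example can be turned into a theoretical ceiling by multiplying the number of banks by the bits each delivers per cycle and by the clock rate. The sketch below does this under the idealized assumption that every bank transfers its full 1024 bits on every 1 GHz cycle; it bounds raw array parallelism and is not a model of the reported sustained bandwidth.

```python
def peak_internal_bw_tb_s(channels: int, banks_per_channel: int,
                          bits_per_bank: int, clock_hz: float) -> float:
    """Idealized aggregate bandwidth in TB/s (1 TB = 1e12 bytes).

    Assumes every bank transfers bits_per_bank on every clock cycle.
    """
    bits_per_cycle = channels * banks_per_channel * bits_per_bank
    return bits_per_cycle * clock_hz / 8 / 1e12


if __name__ == "__main__":
    bw = peak_internal_bw_tb_s(16, 64, 1024, 1e9)
    print(f"idealized peak: {bw:.1f} TB/s")
```

The reported 19–34 TB/s lies well below this idealized ceiling of about 131 TB/s; the sketch only bounds what the stated organization could deliver if every bank were active on every cycle.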

Tiered vertical structures induce an access-latency gradient: in a 1024-tier stack, $t_{RCD}$ ranges from 2.29 ns (nearest tier) to 22.88 ns (farthest tier), so tier-aware mapping is needed to hide this latency variance (Lu et al., 27 Feb 2026, Pan et al., 6 Oct 2025). Data placement strategies map latency-critical or bandwidth-demanding data to the lowest-latency tiers.
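
A minimal sketch of tier-aware placement follows, assuming (our simplification) that $t_{RCD}$ grows linearly from the nearest to the farthest of 1024 tiers and that every tier holds the same number of data blocks. Blocks are sorted by access count and assigned hottest-first to the lowest-latency tiers. The function names, the linear latency model, and the fixed per-tier capacity are hypothetical and are not taken from the cited systems.

```python
from typing import Dict

T_RCD_NEAR_NS = 2.29   # nearest tier (from the text)
T_RCD_FAR_NS = 22.88   # farthest tier (from the text)
NUM_TIERS = 1024


def trcd_ns(tier: int) -> float:
    """Hypothetical linear tRCD model between the nearest and farthest tier."""
    frac = tier / (NUM_TIERS - 1)
    return T_RCD_NEAR_NS + frac * (T_RCD_FAR_NS - T_RCD_NEAR_NS)


def place_hot_first(access_counts: Dict[str, int], blocks_per_tier: int) -> Dict[str, int]:
    """Assign blocks to tiers, hottest blocks to the lowest-latency tiers."""
    if len(access_counts) > blocks_per_tier * NUM_TIERS:
        raise ValueError("stack capacity exceeded")
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {block: i // blocks_per_tier for i, block in enumerate(ranked)}


def mean_trcd_ns(access_counts: Dict[str, int], placement: Dict[str, int]) -> float:
    """Access-weighted average tRCD for a given placement."""
    total = sum(access_counts.values())
    return sum(n * trcd_ns(placement[b]) for b, n in access_counts.items()) / total


if __name__ == "__main__":
    # Synthetic access counts for 2048 blocks filling all tiers (2 blocks per tier).
    counts = {f"blk{i}": (i * 37) % 1000 for i in range(2048)}
    identity = {f"blk{i}": i // 2 for i in range(2048)}  # placement by block index
    hot_first = place_hot_first(counts, blocks_per_tier=2)
    print(f"index order: {mean_trcd_ns(counts, identity):.2f} ns, "
          f"hot-first: {mean_trcd_ns(counts, hot_first):.2f} ns")
```

Because $t_{RCD}$ increases monotonically with tier index in this model, sorting by access count can only lower the access-weighted mean latency relative to any other assignment of the same blocks.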

5. Reliability: Row Hammer, Floating Body, and Device Co-Optimization

Mono3D cell reliability differs from that of planar DRAM because of distinct physical and electrical effects:

Row hammer mitigation: Vertical separation and oxide isolation remove the shared substrate path for lateral charge migration. Simulated row hammer thresholds are 2.6×–21× higher than in 2D DRAM; e.g., storage-node perturbation per aggressor-WL activation is reduced from ~0.5 mV/cycle (2D) to 0.01–0.05 mV/cycle (3D), yielding error thresholds exceeding 14 k toggles (Cho et al., 1 Nov 2025).
Floating body and impact ionization: Floating-body effects, exacerbated by impact ionization in the access transistor, increase vertical capacitive coupling and leakage. A rising body potential shifts the threshold voltage through the classical body effect, amplifying cross-tier disturbance, especially when the vertical oxide is thin. Grounding the body or reducing body thickness markedly mitigates these effects.
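
The body-effect mechanism can be illustrated with the textbook long-channel expression $V_T = V_{T0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right)$. A floating-body rise with the source held fixed makes $V_{SB}$ negative, which lowers $V_T$ and raises leakage. The sketch below evaluates the shift for a few small body-potential rises; the values of $\gamma$ and $2\phi_F$ are hypothetical placeholders, not parameters of any cited Mono3D device, and the expression is only approximate under forward body bias.

```python
import math


def vt_shift_v(body_rise_v: float, gamma: float = 0.4, two_phi_f: float = 0.8) -> float:
    """Threshold shift (V) from the long-channel body-effect formula.

    A body-potential rise of body_rise_v gives V_SB = -body_rise_v
    (source held fixed), which lowers V_T.
    gamma [V^0.5] and two_phi_f [V] are illustrative placeholders.
    """
    v_sb = -body_rise_v
    if two_phi_f + v_sb <= 0:
        raise ValueError("formula not valid for this forward bias")
    return gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


if __name__ == "__main__":
    for rise in (0.05, 0.1, 0.2):
        print(f"body rise {rise:.2f} V -> dVT = {1000 * vt_shift_v(rise):.1f} mV")
```

The shift is negative and grows with the body rise, the direction that increases subthreshold leakage; its magnitude depends entirely on the assumed parameters.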

Systematic TCAD sweeps show

Reducing body thickness from 30 nm to 10 nm suppresses body-voltage-induced node drops by >30%.
Lowering channel doping and raising the gate work function further reduce leakage and threshold-voltage shifts (Cho et al., 1 Nov 2025).

Optimal robustness requires a three-way co-optimization of body thickness (10–20 nm), channel doping (~… cm⁻³), and gate work function (~5.0 eV), balancing retention, yield, and row hammer immunity.

6. System-Level Implications and Advanced Applications

The integration of Mono3D DRAM fundamentally alters system architecture and enables new computation paradigms:

Near-memory/in-memory processing: Direct logic–DRAM hybrid bonding and >20 TB/s internal bandwidth allow NMP clusters to be integrated in the stack's base die. Examples include Stratum (tiered MoE inference, with 8.29× higher throughput and 7.66× higher energy efficiency than GPU+HBM) (Pan et al., 6 Oct 2025) and GenDRAM (integrated bioinformatics/graph DP, with 22×–68× speedup over GPUs) (Lu et al., 27 Feb 2026).
Fine-grained partitioning for heterogeneous access: Topic- and tier-aware data placement strategies schedule “hot” (high-access) data into fast-access tiers and cold data into slow tiers, hiding vertical latency variance and maximizing throughput (Pan et al., 6 Oct 2025).
Processing near registers and caches: Monolithic BEOL M3D eDRAM and gain-cell topologies enable multi-ported, high-density register files and caches for GPGPU applications, with headroom in area and standby power (Waqar et al., 29 Jun 2025).

System-level integration still faces challenges in:

Managing thermal hotspots (logic die beneath dense DRAM films)
Maintaining power delivery and IR drop within allowed margins
Addressing cumulative failure sources at large stack sizes (ECC, redundancy mapping, and refresh policies and addressing that account for floating-body and row-hammer effects)

7. Future Scaling Directions and Challenges

Mono3D DRAM’s co-optimized architecture, device, and interconnect stack permits scaling beyond 200 memory tiers (>4 Gb/mm²), subject to several fundamental constraints (Lee et al., 12 Mar 2026):

Selector and material innovation: Sub-40 nm AOS selectors with lower leakage and sharp subthreshold slopes are needed for future strap topologies.
Hybrid bonding advances: Tightening HCB pitch below 0.6 μm through advanced wafer-to-wafer alignment and fine-pitch lithography is essential for >200-layer stacks.
Thermal management: Stack height and local hotspot limits require further advances in material engineering and potentially chiplet- or partition-based stacking.
Reliability engineering: Device–array–system co-optimization integrating error correction, floating body/row hammer-aware refresh, and dynamic mapping is essential at extreme stack heights.

Mono3D DRAM architectures unify advances in vertical monolithic integration, material systems (AOS/Si), array topology (selector + strap), and hybrid interconnect, and form the enabling substrate for high-bandwidth, low-latency, and energy-efficient memory for accelerated computing and intelligent systems (Lee et al., 12 Mar 2026, Pan et al., 6 Oct 2025, Lu et al., 27 Feb 2026, Cho et al., 1 Nov 2025, Huang et al., 2020, Waqar et al., 29 Jun 2025).