3D Integrated SRAM-eDRAM
- Monolithic 3D SRAM-eDRAM is a vertically integrated memory architecture that combines SRAM’s low latency with eDRAM’s high density and energy efficiency.
- The design leverages BEOL-compatible low-temperature processes and fine-pitch monolithic inter-tier vias to achieve reduced area and improved thermal management.
- This integration supports compute-in-memory operations by reducing interconnect parasitics and enhancing performance-per-watt, addressing scaling challenges in advanced nodes.
Monolithic 3D SRAM-eDRAM refers to a class of embedded memory architectures that vertically integrate static random-access memory (SRAM) and embedded dynamic random-access memory (eDRAM) tiers on a single substrate using monolithic three-dimensional (3D) integration. This approach combines the density, low leakage, and high bandwidth potential of eDRAM with the low latency and robust performance of SRAM. Fine-pitch monolithic inter-tier vias (MIVs) connect the tiers, yielding dense, energy-efficient, and thermally manageable memory-on-memory structures suitable for compute-in-memory (CIM) operations. The technology addresses scaling limitations of planar SRAM, especially in advanced process nodes, and extends the capabilities of conventional memories for data-intensive and AI-accelerated workloads (Chakraborty et al., 15 Apr 2026, Waqar et al., 29 Jun 2025).
1. Monolithic 3D Stack Architecture
A monolithic 3D SRAM-eDRAM system comprises at least two vertically stacked tiers fabricated at the wafer scale with BEOL-compatible temperature constraints (typically ≤ 400 °C). In a representative implementation using GlobalFoundries 22 nm FDSOI, the lower tier consists of 3T/9T eDRAM arrays with ~200 nm thickness, while the upper tier comprises 6T or 8T SRAM arrays with ~250 nm thickness. Vertical connectivity between corresponding bit-cells in the upper (SRAM) and lower (eDRAM) tiers is provided by dense MIVs (e.g., 50 nm diameter, 200 nm length, 100 nm pitch, and ~1×10⁸ cm⁻² via density), which support full cross-bar coupling for both read→write (R→W) and write→read (W→R) paths (Chakraborty et al., 15 Apr 2026).
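The MIV geometry quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the density figure assumes one via on every point of a square grid at the stated pitch, which is an upper bound rather than a layout described in the source.

```python
import math

# Geometry quoted in the text for the monolithic inter-tier vias.
MIV_DIAMETER_NM = 50.0
MIV_LENGTH_NM = 200.0
MIV_PITCH_NM = 100.0

NM_PER_CM = 1e7  # 1 cm = 1e7 nm


def aspect_ratio(length_nm: float, diameter_nm: float) -> float:
    """Depth-to-width ratio of a cylindrical via."""
    return length_nm / diameter_nm


def pitch_limited_density_per_cm2(pitch_nm: float) -> float:
    """Vias per cm^2 for a fully populated square grid (assumed layout)."""
    pitch_cm = pitch_nm / NM_PER_CM
    return 1.0 / (pitch_cm ** 2)


if __name__ == "__main__":
    ar = aspect_ratio(MIV_LENGTH_NM, MIV_DIAMETER_NM)
    bound = pitch_limited_density_per_cm2(MIV_PITCH_NM)
    print(f"aspect ratio: {ar:.1f}")
    print(f"pitch-limited density bound: {bound:.2e} cm^-2")
```

Under this assumption the pitch-limited bound sits well above the ~1×10⁸ cm⁻² quoted in the text. One possible reading is that vias are placed only where bit-cells need them rather than at minimum pitch; that interpretation is an assumption and is not stated in the source.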
In alternative paradigms, such as BEOL integration of amorphous oxide semiconductor (AOS) eDRAM upon a FinFET logic substrate, 1T1C, 2T0C, and 3T0C gain-cell topologies are deployed in successively stacked oxide-transistor device tiers to provide persistent, high-density alternatives to SRAM. Vertical MIVs (60–100 nm pitch, <20 fF of parasitic capacitance per connection) link these BEOL devices to the active silicon base (Waqar et al., 29 Jun 2025).
2. Fabrication Flow and Materials Compatibility
The fabrication sequence is thermally engineered to preserve the retention and reliability of memory devices in both tiers:
- The bottom (eDRAM) tier is realized by completing the FDSOI front-end processing and additional compute transistors, followed by inter-tier dielectric (SiO₂/SiN) deposition.
- High-aspect-ratio MIVs are formed by etching through the dielectric and refilling the openings with tungsten at ≤ 400 °C.
- Top-tier (SRAM, or AOS-based memory) patterning utilizes low-temperature FEOL or BEOL processes (all steps ≤ 400 °C) to avoid excessive thermal exposure.
- In AOS-based eDRAM, W-doped In₂O₃ (IWO), IGZO, or similar materials are deposited and patterned at 250–350 °C (Waqar et al., 29 Jun 2025, Chakraborty et al., 15 Apr 2026).
Thermal budgets are tightly constrained since temperatures above 400 °C can degrade both FEOL devices and the retention of the underlying memory. The one-dimensional vertical thermal resistance network, with $R_{\mathrm{th},i} = t_i / (k_i A)$ for layer $i$ (thickness $t_i$, thermal conductivity $k_i$, cross-sectional area $A$), governs stack-level temperature rise and places limits on stack height, necessitating efficient heat-spreading solutions (top heat-spreader or microfluidic coldplate) in dense 3D assemblies (Chakraborty et al., 15 Apr 2026).
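A series thermal network of this kind is easy to evaluate numerically. The following is a minimal sketch, assuming the standard slab-conduction form R_i = t_i / (k_i · A) and that all dissipated power flows through every layer. The `Layer` type, the function names, and the conductivity, area, and power values in the usage section are hypothetical placeholders; only the two tier thicknesses come from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Layer:
    name: str
    thickness_m: float        # t_i, layer thickness in metres
    conductivity_w_mk: float  # k_i, thermal conductivity in W/(m*K)


def layer_resistance(layer: Layer, area_m2: float) -> float:
    """R_i = t_i / (k_i * A), in K/W, for 1D vertical conduction."""
    return layer.thickness_m / (layer.conductivity_w_mk * area_m2)


def stack_temperature_rise(layers: Iterable[Layer], area_m2: float,
                           power_w: float) -> float:
    """Series network: dT = P * sum(R_i), assuming all heat crosses every layer."""
    total_r = sum(layer_resistance(layer, area_m2) for layer in layers)
    return power_w * total_r


if __name__ == "__main__":
    # Thicknesses follow the text; conductivities, area, and power are
    # hypothetical values chosen only to exercise the functions.
    stack = [
        Layer("eDRAM tier", 200e-9, 10.0),
        Layer("SRAM tier", 250e-9, 5.0),
    ]
    dt = stack_temperature_rise(stack, area_m2=1e-6, power_w=0.1)
    print(f"temperature rise across stack: {dt:.4f} K")
```

Because the resistances add in series, each extra tier raises the total, which is the mechanism behind the stack-height limit described above.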
3. Electrical Characteristics and Memory Performance
SRAM and eDRAM tiers exhibit distinct signal, access, and retention properties that dictate their roles in hybrid CIM arrays:
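- Bitline capacitance ($C_{BL}$) includes both per-cell and wire parasitics: $C_{BL} = N \cdot C_{\mathrm{cell}} + C_{\mathrm{wire}} \cdot L$, where $N$ is the number of cells on the bitline, $C_{\mathrm{cell}}$ is the per-cell contribution in fF (SRAM), $C_{\mathrm{wire}}$ is the wire capacitance in fF/μm, and $L$ is the bitline length.
- Access time is set by RC delay, $t_{\mathrm{access}} \approx R_{\mathrm{eq}} \cdot C_{BL}$, where $R_{\mathrm{eq}}$ accounts for combined pull-down and sense amplifier impedances.
- Energy and delay: each tier is characterized by its read energy per bit (fJ/bit), its access time (ps), and the resulting energy-delay product (fJ·ns), reported separately for SRAM and eDRAM. 3D stacking reduces interconnect parasitics by ~15%, which lowers both energy and latency (Chakraborty et al., 15 Apr 2026).
- On-chip memory bandwidth scales with array width and clock: $BW = N_{\mathrm{bits}} \cdot f_{\mathrm{clk}}$, supporting e.g., 128 bits/cycle at 500 MHz, or $BW \approx 64$ Gb/s for a 4-bit cross-array.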
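The relationships in the list above can be put into a small first-order model. This is a sketch under stated assumptions: a linear bitline capacitance (cells plus wire), a single-pole RC access estimate, and bandwidth as bits per cycle times clock rate. The function and parameter names are hypothetical, the cell, wire, and resistance values in the usage section are placeholders rather than figures from the source, and applying the ~15% parasitic reduction to the wire term alone is an assumption.

```python
def bitline_capacitance_ff(n_cells: int, c_cell_ff: float,
                           c_wire_ff_per_um: float, length_um: float,
                           wire_scale: float = 1.0) -> float:
    """C_BL = N * C_cell + wire_scale * C_wire * L, in fF.

    wire_scale < 1 models reduced interconnect parasitics (assumed to act
    on the wire term only).
    """
    return n_cells * c_cell_ff + wire_scale * c_wire_ff_per_um * length_um


def rc_delay_ps(r_eq_ohm: float, c_bl_ff: float) -> float:
    """t_access ~ R_eq * C_BL; ohm * fF = 1e-15 s = 1e-3 ps."""
    return r_eq_ohm * c_bl_ff * 1e-3


def bandwidth_gbps(bits_per_cycle: int, clock_hz: float) -> float:
    """BW = bits per cycle * clock frequency, returned in Gb/s."""
    return bits_per_cycle * clock_hz / 1e9


if __name__ == "__main__":
    # Placeholder device values, not taken from the source.
    c_2d = bitline_capacitance_ff(128, 0.5, 0.2, 60.0)
    c_3d = bitline_capacitance_ff(128, 0.5, 0.2, 60.0, wire_scale=0.85)
    print(f"C_BL 2D: {c_2d:.1f} fF, 3D: {c_3d:.1f} fF")
    print(f"delay 2D: {rc_delay_ps(2000.0, c_2d):.1f} ps, "
          f"3D: {rc_delay_ps(2000.0, c_3d):.1f} ps")
    # The bandwidth example from the text: 128 bits/cycle at 500 MHz.
    print(f"bandwidth: {bandwidth_gbps(128, 500e6):.0f} Gb/s")
```

Only the bandwidth case uses numbers from the text; 128 bits per cycle at 500 MHz works out to 64 Gb/s.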
In monolithic 3D oxide-based banks, AOS gain-cells (1R1W/3R1W) occupy as little as 0.76× the area of SRAM, support multi-port operation, and maintain picosecond-scale access times at a 1 GHz clock, matching or exceeding the aggregate bandwidth of equivalent SRAM banks (Waqar et al., 29 Jun 2025).
4. Density, Area, and Peripheral Sharing
A key advantage of monolithic 3D SRAM-eDRAM is the reduction in effective memory cell area per bit, attributed to vertical stacking and peripheral circuit sharing in the BEOL region:
- The area of a hybrid memory macro declines by ~30% compared to 2D implementations, as both tiers share common decoders, sense amplifiers, and drivers. For AOS BEOL memories, 2T/3T gain-cells deliver 24–25% lower area than SRAM at the same node, based on bit-cell area comparisons in an advanced node (ASAP7 at 7 nm) (Waqar et al., 29 Jun 2025, Chakraborty et al., 15 Apr 2026).
Multi-tier stacking allows for memory expansion (e.g., 2T0C-IBC achieving 6.1× memory density relative to baseline SRAM L2 at equal capacity), while partitioning into banked arrays keeps critical wordline and bitline parasitics under control. A plausible implication is that aggressive stacking, combined with increased porting, enables architectural innovations in register-file and cache design that were previously infeasible due to planar density constraints.
5. Energy Efficiency, Latency, and CIM Enabling Capabilities
Vertical monolithic integration leads to key system-level gains:
- Energy efficiency: Shortened interconnects decrease dynamic inter-tier energy by ~20%, supporting kernel-level efficiencies up to 436 GOPS/W for multiplication and 432 GOPS/W for addition in 32×32 CIM arrays. Standby power is further reduced by >70% in AOS-based eDRAMs compared to SRAM (Chakraborty et al., 15 Apr 2026, Waqar et al., 29 Jun 2025).
- Latency: One-to-one top-bottom coupling through MIVs halves R→W path lengths and reduces access times by ≥15%. The hybridized approach allows for high-speed DAC/ADC operations directly at the memory interface, facilitating general matrix computations (beyond dot-products).
- Bandwidth enhancement: Multi-ported BEOL gain-cell arrays in GPGPU register files triple the number of simultaneous accesses, supporting 512 Gb/s per bank at nanosecond-scale access latency and enabling scaling of warp sizes and SM counts without incurring timing overheads (Waqar et al., 29 Jun 2025).
- Compute-in-memory versatility: The 3D memory-on-memory framework enables in-memory transpose, element-wise addition, matrix multiplication, and custom operations at 4-bit precision, breaking traditional dot-product CIM constraints (Chakraborty et al., 15 Apr 2026).
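The efficiency figures quoted above can be restated as energy per operation, which is often easier to compare against per-bit access energies. The sketch below is plain unit conversion; the function name is hypothetical, and what counts as one "operation" follows whatever definition the cited work uses, which is not restated here.

```python
def energy_per_op_pj(gops_per_watt: float) -> float:
    """Convert GOPS/W to pJ per operation.

    1 GOPS/W = 1e9 operations per joule, so one operation costs
    1 / (gops_per_watt * 1e9) J = 1e3 / gops_per_watt pJ.
    """
    if gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1e3 / gops_per_watt


if __name__ == "__main__":
    # Efficiencies quoted in the text for 32x32 CIM arrays.
    for label, eff in (("multiplication", 436.0), ("addition", 432.0)):
        print(f"{label}: {energy_per_op_pj(eff):.2f} pJ/op")
```

Both kernels land at roughly 2.3 pJ per operation, so in this reading multiplication and addition cost nearly the same at the array level.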
6. Macro-Level Integration and System Impact
In high-performance systems, monolithic 3D SRAM-eDRAM structures affect the overall architecture in several ways:
- Densely stacked register files with reduced leakage and increased porting, permitting either a reduction in memory banks (freeing die area) or an increase in register count per SM (supporting larger warps and higher GPU parallelism).
- Expansion of L2 or LLC caches (e.g., doubling or quadrupling on-chip capacity in the same footprint), leading to average 8% uplift in geometric mean IPC and up to 5.2× improvement in performance-per-watt in synthetic and benchmarked workloads (Rodinia, PolyBench, DeepBench) (Waqar et al., 29 Jun 2025).
- Refresh overhead on eDRAM in multi-tier configurations remains below 1% of overall cache stalls, demonstrating practical viability with minimal impact on miss rates (Waqar et al., 29 Jun 2025).
- Fine-grained bank partitioning allows concurrent access without IR-drop limitations prevalent in large, monolithic arrays.
A comparison summary shows AOS BEOL eDRAM achieves higher density (up to 6.1× versus ~4.5× for standalone FEOL eDRAM), similar access speeds via parallelism, and considerably lower static power than conventional SRAM (Waqar et al., 29 Jun 2025).
7. Challenges, Constraints, and Outlook
Thermal, process, and reliability considerations govern the scalability of monolithic 3D SRAM-eDRAM:
- Total stack height is limited by cumulative vertical thermal resistance; thermal conductivity (in W/m·K) differs substantially between the SRAM and eDRAM tiers.
- Per-tier yield remains high (98–99%), with multi-tier yield $Y_{\mathrm{stack}} = Y_{\mathrm{tier}}^{N}$ for $N$ tiers indicating only a minor decrement for stacks of up to 4 tiers (Waqar et al., 29 Jun 2025).
- Process variations (e.g., Vth spread in AOS) require robust margining in peripheral design but do not pose a dominant yield limiter.
- IR-drop and sneak-paths necessitate the partitioning of arrays into smaller mats (≤128 rows), capping macro size to avoid voltage droop ≥ 200 mV.
- Integration is limited to materials and processing steps compatible with sub-400 °C BEOL fabrication, excluding high-temperature anneal steps that might yield higher mobility but would compromise underlying tier reliability.
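Two of the constraints above reduce to simple arithmetic: compound yield across tiers and the number of mats needed under a row cap. The sketch below assumes tiers fail independently, so stack yield is the per-tier yield raised to the tier count, and it treats the ≤128-row limit as a hard cap. Function names and the 1024-row example are hypothetical.

```python
import math


def stack_yield(per_tier_yield: float, tiers: int) -> float:
    """Compound yield for independent tiers: Y_stack = Y_tier ** N (assumption)."""
    if not 0.0 <= per_tier_yield <= 1.0:
        raise ValueError("per-tier yield must lie in [0, 1]")
    return per_tier_yield ** tiers


def mats_needed(total_rows: int, max_rows_per_mat: int = 128) -> int:
    """Minimum number of mats so that no mat exceeds the row cap."""
    return math.ceil(total_rows / max_rows_per_mat)


if __name__ == "__main__":
    # Per-tier yields and the 4-tier limit are the figures quoted in the text.
    for y in (0.98, 0.99):
        print(f"per-tier {y:.2f} -> 4-tier stack {stack_yield(y, 4):.3f}")
    # Hypothetical 1024-row array split under the 128-row cap.
    print(f"mats for 1024 rows: {mats_needed(1024)}")
```

With the quoted 98–99% per-tier yield, a 4-tier stack stays above 92% under the independence assumption.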
Current research demonstrates that combining fine-pitch MIV technology, low-temperature memory fabrication, and careful architectural partitioning enables a manufacturable scaling path for memory-dense, power-efficient embedded CIM, overcoming the classical limitations of planar SRAM scaling (Chakraborty et al., 15 Apr 2026, Waqar et al., 29 Jun 2025).