Logic-in-Memory Architectures
- Logic-in-memory architectures are integrated circuit designs that tightly couple logic and memory using advanced 3D stacking to minimize data movement and its energy cost.
- They leverage monolithic 3D integration and ultra-dense vertical interconnects (MIVs) to achieve measurable reductions in latency, power consumption, and area across DRAM, SoCs, and AI accelerators.
- Emerging innovations like MIV-transistors and extended-gate MIV-FETs offer enhanced performance and efficiency, though challenges in thermal management, process variation, and routability remain.
Logic-in-memory (LiM) architectures represent a class of integrated circuit designs in which logic and memory functionalities are physically and electrically co-located or tightly coupled, minimizing the need for extensive data movement between distinct logic and memory blocks. This paradigm is fundamentally enabled by advances in three-dimensional integration technologies—most notably monolithic 3D integration—leveraging ultra-fine vertical interconnects to overcome the prohibitive wiring and energy bottlenecks endemic to traditional planar (2D) and through-silicon-via (TSV)-based systems.
1. Foundations of Monolithic 3D Integration and Vertical Interconnects
Monolithic 3D integration, often achieved using sequential layer-by-layer fabrication at low temperatures, allows device (logic and/or memory) layers to be stacked directly atop one another on the same wafer. Vertical electrical connectivity is realized by monolithic inter-layer vias (MIVs) with diameters in the tens of nanometers and densities exceeding 10⁸/cm², orders of magnitude above the capabilities of TSVs (5–10 µm diameter, ∼10⁴/cm²).
MIVs support tier-level partitioning with exceedingly low parasitics (<0.23 fF capacitance, ∼20 Ω resistance) and sub-micron pitch, enabling a physical proximity of logic and memory blocks inaccessible to other integration schemes (Arka et al., 2020). This vertical stacking can be exploited to organize DRAM and its peripheral logic in distinct stacked tiers to reduce bitline lengths, to build hybrid CPU/GPU–LLC stacks with fine-grained logic-memory proximity, or to tightly interleave systolic dataflows in AI accelerators (Huang et al., 2020; Arka et al., 2020; Sedaghatgoo et al., 2024).
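The parasitic figures above imply an almost negligible vertical hop. A back-of-envelope comparison can make this concrete; the MIV capacitance and via densities are taken from the text, while the TSV capacitance and driver resistance below are assumed typical values, not numbers from the cited sources:

```python
# Back-of-envelope comparison of a vertical hop through an MIV vs. a TSV.
# MIV figures come from the text (<0.23 fF, ~1e8 vias/cm^2); the TSV
# capacitance and the driver resistance are assumed typical values.

r_drv = 1e3                           # ohms, assumed driver output resistance
miv_c, tsv_c = 0.23e-15, 30e-15       # farads; the TSV value is an assumption

miv_delay = 0.69 * r_drv * miv_c      # RC step-response (Elmore) delay
tsv_delay = 0.69 * r_drv * tsv_c

miv_density, tsv_density = 1e8, 1e4   # vias per cm^2 (from the text)

print(f"TSV load slows the hop by {tsv_delay / miv_delay:.0f}x")  # ~130x
print(f"via density ratio: {miv_density / tsv_density:.0f}x")     # 10000x
```

To first order the hop delay is set by the via capacitance seen by the driver, which is why the two-orders-of-magnitude capacitance gap, not via resistance, dominates the comparison.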
2. Logical and Memory Co-location: Approaches and Exemplars
Several architectural strategies have been proposed to unify logic and memory in monolithic 3D structures. A canonical example is the monolithic 3D DRAM of Huang et al., which partitions DRAM arrays and peripheral logic into separate tiers. The DRAM cells, local bitlines, and wordlines are fabricated on the top tier, while sense amplifiers and decoders are placed immediately beneath, accessed through dense vertical MIVs. This arrangement shortens both the local and global bitlines (L_LBL, L_GBL), lowering parasitics and breaking the latency-area tradeoff prevalent in 2D planar DRAM (Huang et al., 2020).
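The payoff from shortening bitlines is quadratic to first order, since a bitline behaves as a distributed RC line whose resistance and capacitance both grow with length. The per-unit-length values in this sketch are illustrative assumptions, not figures from the cited work:

```python
# First-order distributed-RC model of a DRAM bitline: R and C both scale
# linearly with length L, so wire delay scales as L^2. The per-unit-length
# parasitics and lengths below are illustrative assumptions only.

def bitline_delay(length_um, r_per_um=10.0, c_per_um=0.2e-15):
    """Elmore delay of a distributed RC bitline, in seconds."""
    r_total = r_per_um * length_um        # ohms
    c_total = c_per_um * length_um        # farads
    return 0.38 * r_total * c_total       # distributed-RC Elmore factor

long_bl = bitline_delay(100.0)   # conventional planar bitline (assumed 100 um)
short_bl = bitline_delay(50.0)   # tier-partitioned bitline at half the length

print(f"halving the bitline cuts wire delay {long_bl / short_bl:.0f}x")
```

The reported 9.56% access-latency gain is much smaller than this 4× wire-delay factor because cell access time and peripheral logic dominate the total latency; only the wire component shrinks quadratically.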
In heterogeneous manycore SoCs, HeM3D adopts gate-level partitioning: logic gates or functional clusters from a logic block (CPU, GPU, cache bank, network-on-chip router) are assigned across multiple tiers, with MIV-facilitated inter-tier connectivity. This supports an architectural fabric in which computation and embedded memory blocks—such as LLC tiles—are vertically and topologically proximate, dramatically reducing interconnect delays and hop counts. The small-world NoC topology in HeM3D further enhances this effect by embedding vertical shortcut links for multi-core cache access patterns (Arka et al., 2020).
AI accelerator designs, such as ARMAN, employ monolithic stacking of logic (processing elements) and on-chip SRAM scratchpads so that each PE is one vertical hop away from its dedicated local memory, supporting flexible scale-up or scale-out systolic operations and dataflow reconfiguration (Sedaghatgoo et al., 2024).
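The one-hop PE-to-scratchpad arrangement maps naturally onto a weight-stationary systolic dataflow. The sketch below is a generic software model of that dataflow under stated assumptions; it does not reproduce ARMAN's actual microarchitecture, and the latency formula is a first-order pipeline estimate:

```python
# Generic weight-stationary systolic matmul sketch: each PE (k, n) holds one
# weight in a local scratchpad (one vertical hop away in the monolithic-3D
# picture) and multiply-accumulates streaming activations. This models the
# dataflow only, not ARMAN's actual microarchitecture.

def systolic_matmul(A, W):
    """Compute A @ W on a K x N grid of PEs; return (result, est. cycles)."""
    M, K = len(A), len(A[0])
    N = len(W[0])
    scratch = [[W[k][n] for n in range(N)] for k in range(K)]  # stationary weights
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):                  # activation rows stream in as wavefronts
        for n in range(N):
            acc = 0.0
            for k in range(K):          # partial sum flows down the PE column
                acc += A[m][k] * scratch[k][n]
            out[m][n] = acc
    # First-order pipeline latency: K - 1 + N - 1 skew/fill cycles, then one
    # completed wavefront per input row once the array is full.
    cycles = (K - 1) + (N - 1) + M
    return out, cycles

res, cyc = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(res, cyc)   # [[19.0, 22.0], [43.0, 50.0]] in 4 estimated cycles
```

The same grid can be re-tiled for scale-up (larger arrays) or scale-out (more arrays), which is the reconfiguration flexibility the text attributes to such designs.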
3. Device-Level Integration: MIV-Transistors, Extended-Gate Pillar FETs, and Back-End-of-Line Approaches
To further unify logic and memory at the device level, several works have demonstrated direct MIV-based logic and memory element realization within monolithic stacks.
- MIV-Transistors: In FDSOI-based monolithic 3D ICs, MIVs are not only passive interconnects but are repurposed as the core of vertical MIV-transistors (metal-insulator-semiconductor structures) (Vemuri et al., 2023). The area typically reserved for MIV keep-out zones (KOZ) is used for building vertical channel FETs with one, two, or four conduction channels around the MIV, yielding up to 18% area reduction, 3% delay reduction, and 1% power decrease for standard cell libraries compared with conventional 2D FDSOI stacking.
- Extended-Gate MIV-FETs: An enhanced MIV-transistor design uses lateral gate extensions over the substrate region abutting the MIV, restoring channel control and sharply suppressing leakage (14,000× improvement over previous MIV-FETs). This permits the KOZ to be fully utilized without exacerbating static power or variability, and enables cell-level area savings of ∼24%, ON-current improvement of 58%, and inverter delay/energy benefits exceeding 11% (Vemuri et al., 2023).
- BEOL AOS SRAM and Pass-Gate Stacks: In FPGA architectures, logic configuration memories and pass-gates are relocated to stackable back-end-of-line (BEOL) layers using amorphous oxide semiconductor (AOS) transistors. W-doped In₂O₃ (n-type) and SnO (p-type) BEOL FETs implement low-leakage SRAM cells and pass-gate multiplexers, reducing the Si logic area devoted to reconfiguration, cutting critical-path delay by 27% and static power by 26%, and shrinking the AT² metric by 3.4× relative to 2D CMOS FPGAs (Waqar et al., 2025).
4. Key Performance Metrics and System-Level Outcomes
LiM architectures, when instantiated using monolithic 3D integration, exhibit marked improvements across canonical power, performance, area (PPA), and system-level metrics:
- In DRAM, monolithic stacking with vertical SAs and shortened bitlines provides up to 9.56% lower access latency, 4.96% power reduction, 21.21% energy-delay product (EDP) reduction, and 14% die area savings over 2D DDR4 DRAM, as shown by full-system PARSEC workloads (Huang et al., 2020).
- Manycore SoCs with integrated logic-memory stacks (HeM3D) achieve up to 18.3% execution time reduction and operate up to 19 °C cooler than TSV-based 3D designs due to diminished lateral interconnect lengths and improved vertical heat spreading from thinner interlayer dielectrics (Arka et al., 2020).
- FPGA fabrics employing BEOL AOS memories and pass-gates layered directly above CLBs similarly attain 59% tile footprint reduction, 26% routing static-power saving, and >3× improvement in the area-time² figure of merit (Waqar et al., 2025).
- At the circuit level, extended-gate MIV-FETs used in logic and memory yield up to 11.6% delay reduction, 17.9% slew improvement, and 4.5% dynamic power reduction over earlier MIV-FETs for inverter testbenches (Vemuri et al., 2023).
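As a rough cross-check of the DRAM figures in the first bullet: EDP is power × delay², so composing the reported power and latency reductions to first order lands close to the reported 21.21% EDP figure. The exact value comes from full-system simulation, so workload effects account for the small residual gap:

```python
# First-order cross-check of the monolithic-3D DRAM bullet: with
# EDP = P * t^2, the reported power and latency reductions should
# compose to roughly the reported 21.21% EDP reduction.

power_red, latency_red = 0.0496, 0.0956            # reported reductions
edp_ratio = (1 - power_red) * (1 - latency_red) ** 2
edp_reduction = 1 - edp_ratio

print(f"first-order EDP reduction: {edp_reduction:.1%}")   # ~22.3%
```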
5. Integration Challenges: Thermal Budget, Process Variation, and Routability
LiM architecture realization must confront the physical and process constraints intrinsic to monolithic stacking:
- Thermal Budget and Device Degradation: Sequential tier stacking forces low-temperature FEOL/BEOL processing for the top logic/memory layers to avoid damaging the underlying copper interconnects. This degrades carrier mobility and drive current (top-tier transistor on-current drops by 16–28%, and FO4 delay rises by up to 36%) and forces the use of higher-resistance tungsten in the bottom-tier BEOL (Musavvir et al., 2019; Vemuri et al., 2023).
- Inter-Tier Process Variation: Systematic differences in electrical properties—and consequent EDP penalties—arise between stacked tiers. Models and multi-objective optimization are needed to partition router stages, interconnects, or SRAM blocks to mitigate the degraded performance (process-oblivious M3D NoCs underestimate EDP by 50.8%; process-aware routing restores 27.4% of lost efficiency) (Musavvir et al., 2019).
- Routability and Congestion: A naive 2D routing mindset in monolithic ICs can undermine vertical stacking gains due to pin-access congestion at the lower metal layers. True 3D routing fabrics—such as the Skybridge nanowire array—resolve this by providing multi-layer pin access, reducing per-area routing demand by up to 1.6× versus baseline transistor-level M3D (T-MI) and reporting zero routing congestion across the evaluated design scales (Shi et al., 2016).
- Electrostatic Keep-Out Zones: MIV-induced fringe fields can dramatically amplify neighboring device leakage (up to 68,668× for closely spaced MIVs under low channel doping and tall substrates) (Vemuri et al., 2023). Process-specific KOZs (50–500 nm) are required to preserve device reliability and static power budgets.
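The drive-current and FO4 numbers in the first bullet above are mutually consistent to first order: gate delay scales roughly as C·V/I_on, so a 16–28% on-current loss at fixed load and supply implies a 19–39% delay increase, bracketing the reported "up to 36%". A minimal check of that arithmetic:

```python
# First-order consistency check: gate delay ~ C*V/I_on, so at fixed C and V
# a fractional on-current loss x inflates delay by 1/(1 - x) - 1.

def delay_increase(ion_loss):
    return 1.0 / (1.0 - ion_loss) - 1.0

lo = delay_increase(0.16)   # ~19% slower
hi = delay_increase(0.28)   # ~39% slower
print(f"implied FO4 delay increase: {lo:.0%} to {hi:.0%}")
```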
6. Frontiers: Growth-Based Integration, Dual-Sided Architectures, and Scaling Pathways
Recent research has extended LiM and monolithic 3D architectures along new fronts:
- Seamless Growth of Single-Crystal Devices: Growth-based monolithic 3D integration, exemplified by direct, low-temperature (≤385 °C) CVD of single-crystalline WSe₂/MoS₂ transistor stacks on amorphous BEOL interlayers, enables vertical CMOS logic arrays with high mobility, I_on/I_off >10⁶, and sub-100 ps inverter delays—critical for integrating logic-memory layers at deep sub-micron nodes (Kim et al., 2023).
- Dual-Sided Monolithic Integration: Flip 3D (F3D) technology incorporates dual-sided M3D with DSI 2.0 interconnects, 3D transistor stacking, and multi-flip process flows to perform M3D on both wafer faces. This improves routability, yields up to 6.8% area and 5.9% EDP reduction for logic blocks, and supports more flexible die stacking and I/O strategies than conventional single-sided M3D (Wu et al., 2024).
- Non-Von Neumann Architectures and In-Memory Computing: While not captured directly in the reviewed sources, these monolithic LiM strategies pave the way for true in-memory computing paradigms by enabling physical and topological proximity, low-latency vertical data paths, and fine-grained logic-memory co-design across the stack.
7. Trade-Offs, Limitations, and Future Directions
Logic-in-memory architectures via monolithic 3D integration face several bottlenecks and open issues:
- Process Maturity and Tooling: Most monolithic 3D integration flows remain at the research or pre-production stage. Yield, defectivity, alignment, and cost remain open production challenges, particularly as multi-flip and backside patterning complexity increases (Wu et al., 2024).
- Thermal and Reliability Constraints: As logic-memory tiers stack vertically, thermal extraction and inter-tier stress become first-order concerns, necessitating novel thermal management and reliability analysis frameworks (Arka et al., 2020; Musavvir et al., 2019).
- Scalability: Extension to >2 tiers, mixed-technology stacks (e.g., non-volatile layers within logic arrays), and aggressive scaling below 20 nm all require innovations in low-temperature processing, alignment, and inter-layer dielectric engineering (Huang et al., 2020; Wu et al., 2024).
- Design Automation and Placement: Device- and standard-cell-aware placement/routing, KOZ-aware floorplanning, and 3D-aware EDA toolchains are required to fully realize the PPA and routability benefits (and to avoid unanticipated congestion or performance degradation) (Shi et al., 2016; Vemuri et al., 2023).
A plausible implication is that, as integration, reliability, and tooling constraints are better controlled, logic-in-memory using monolithic stacking will underpin dense, low-latency, and energy-efficient IC platforms spanning DRAM, heterogeneous SoCs, reconfigurable logic, and AI accelerators.
References
- (Arka et al., 2020)
- (Huang et al., 2020)
- (Kim et al., 2023)
- (Musavvir et al., 2019)
- (Sedaghatgoo et al., 2024)
- (Shi et al., 2016)
- (Vemuri et al., 2023)
- (Waqar et al., 2025)
- (Wu et al., 2024)