Monolithic 3D (M3D) Integration

Updated 5 April 2026

Monolithic 3D (M3D) integration is an advanced IC technology that sequentially fabricates multiple device tiers on a single substrate using nanoscale monolithic inter-tier vias.
It employs BEOL-compatible low-temperature processes to achieve fine-pitch vertical connectivity, significantly improving power, density, and routing efficiency over TSV-based stacking.
M3D enables innovative circuit partitioning and heterogeneous system integration, reducing net lengths and congestion while enhancing digital, memory, and accelerator designs.

Monolithic 3D (M3D) integration refers to an advanced integrated circuit (IC) technology in which multiple tiers of active devices—often down to the transistor or even standard cell level—are sequentially fabricated and interconnected on a single semiconductor substrate, without the need for wafer-to-wafer (W2W) bonding. This is achieved using nanoscale monolithic inter-tier vias (MIVs), which enable high-density, fine-pitch vertical connectivity and facilitate gate- or block-level circuit partitioning. M3D is distinguished from conventional 3D integration methods such as Through-Silicon-Via (TSV) stacking by dramatically smaller via dimensions, higher vertical interconnect density, and compatibility with tier-by-tier CMOS process flows. M3D architectures are being adopted across a range of domains, from digital logic and memory to heterogeneous manycore systems and reconfigurable fabrics, enabling enhanced density, reduced interconnect length, improved power-performance-area (PPA) scaling, and new routability and process integration paradigms.

1. Fundamental Principles and Process of Monolithic 3D Integration

Monolithic 3D (M3D) integration is predicated on the sequential fabrication of multiple transistor/circuit layers, where each active tier is constructed on top of the previous one using BEOL-compatible, low-temperature (< 400–500 °C) processes. The core vertical interconnection mechanism is the monolithic inter-layer via (MIV), a sub-100 nm diameter, nanometer-pitch metal via extending through thin interlayer dielectrics (ILDs) to connect devices or wires across tiers. MIVs are integrated during sequential process steps involving planarization, selective etch, metallization (commonly copper or tungsten), and liner deposition to ensure electrical and thermal integrity (Shi et al., 2016, Vemuri et al., 2023, Vemuri et al., 2023).

The process flow for an M3D-integrated IC generally includes:

FEOL formation of the bottom-tier active devices (transistors or standard cells); this tier may use FDSOI, FinFET, or emerging channel materials.
ILD deposition and planarization to isolate and flatten the initial layer.
Via etch and metallization through the ILD to create MIVs.
Sequential deposition or epitaxial regrowth of a new silicon device layer atop the ILD (thickness on the order of 20–150 nm).
Fabrication of second-tier active devices and routing, referencing the established MIVs for connectivity.
Repetition of ILD/MIV formation and device layer stacking for additional tiers as needed.

Thermal budgets must be carefully managed, with upper-tier FEOL typically constrained to ~400–500 °C, to prevent degradation of underlying devices and BEOL metals (especially Cu) (Kim et al., 2023, Huang et al., 2020).
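The thermal-budget constraint lends itself to a simple automated check. The following is a minimal Python sketch, not a description of any real process-design tooling: `ProcessStep`, `budget_violations`, the step names, and every temperature are hypothetical illustrations, and the 400 °C default simply takes the stricter end of the range quoted above.

```python
from dataclasses import dataclass

# Assumed limit: the stricter end of the ~400-500 C range quoted in the text.
UPPER_TIER_BUDGET_C = 400.0


@dataclass(frozen=True)
class ProcessStep:
    name: str
    tier: int           # 0 = bottom tier; >= 1 = built above a finished tier
    peak_temp_c: float  # illustrative peak temperature of the step


def budget_violations(steps, budget_c=UPPER_TIER_BUDGET_C):
    """Return the steps run above a finished tier that exceed the budget."""
    return [s for s in steps if s.tier >= 1 and s.peak_temp_c > budget_c]


# Hypothetical two-tier flow; every temperature is a placeholder value.
flow = [
    ProcessStep("bottom-tier dopant activation", 0, 1000.0),
    ProcessStep("ILD deposition and planarization", 1, 350.0),
    ProcessStep("MIV etch and metallization", 1, 380.0),
    ProcessStep("top-tier channel layer formation", 1, 400.0),
    ProcessStep("top-tier dopant activation", 1, 600.0),
]

for step in budget_violations(flow):
    print(f"{step.name}: {step.peak_temp_c:.0f} C exceeds {UPPER_TIER_BUDGET_C:.0f} C")
```

Bottom-tier steps are exempt because nothing lies beneath them; only steps performed once a finished tier exists are compared against the budget.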

TSV-based stacking, by contrast, uses die- or wafer-scale bonding with via diameters of 2–10 μm, which limits vertical interconnect density, constrains TSV placement granularity, and reduces die area utilization. Tier-to-tier alignment is at the micrometer level in TSV-based stacks but at the nanometer level in M3D.
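To give a feel for what the difference in via size means for vertical interconnect density, the sketch below compares vias per unit area on a square grid. The via diameters come from the text (sub-100 nm for MIVs, 2–10 μm for TSVs); the pitch-equals-twice-the-diameter rule and the name `vias_per_mm2` are assumptions made purely for illustration, and real pitches are set by design rules and keep-out zones.

```python
def vias_per_mm2(pitch_nm: float) -> float:
    """Vias per square millimetre on a square grid with the given pitch."""
    vias_per_mm = 1_000_000 / pitch_nm  # 1 mm = 1,000,000 nm
    return vias_per_mm ** 2


# Assumption (not from the source): pitch is twice the via diameter.
PITCH_PER_DIAMETER = 2

miv_density = vias_per_mm2(100 * PITCH_PER_DIAMETER)           # 100 nm MIV
tsv_small_density = vias_per_mm2(2_000 * PITCH_PER_DIAMETER)   # 2 um TSV
tsv_large_density = vias_per_mm2(10_000 * PITCH_PER_DIAMETER)  # 10 um TSV

print(f"MIV:       {miv_density:.3g} vias/mm^2")
print(f"TSV 2 um:  {tsv_small_density:.3g} vias/mm^2")
print(f"TSV 10 um: {tsv_large_density:.3g} vias/mm^2")
print(f"MIV advantage: {miv_density / tsv_small_density:.0f}x "
      f"to {miv_density / tsv_large_density:.0f}x")
```

Under this fixed pitch-to-diameter assumption, the density advantage is just the square of the diameter ratio: 400× against a 2 μm TSV and 10,000× against a 10 μm TSV.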

2. Circuit Partitioning, Routability, and Cell Design Considerations

M3D enables both block-level and transistor/cell-level partitioning. At the transistor level (“T-MI”), CMOS standard cells are split across tiers, connected by dense MIVs, and further routed in a largely 2D-style horizontal metal stack per tier (Shi et al., 2016). While this reduces some critical net lengths and increases cell density (2× over 2D CMOS; up to ~22% power reduction), the restricted pin access (typically to a single routing plane per cell per tier) produces high congestion at lower metal layers—especially as logic cell density increases. Routability is characterized by the local demand-to-resource ratio $\rho_{ij}$ :

$\rho_{ij} = \frac{D_{ij}}{R_{ij}}$

where $D_{ij}$ is the routing demand (estimated track usage in region $(i,j)$) and $R_{ij}$ is the available routing resource in that region. Severe congestion ($\rho_{ij} \geq 1$) is frequently observed at the lower metal layers in large-scale T-MI designs.
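The demand-to-resource ratio can be evaluated per region once demand and capacity estimates are available. The sketch below is a minimal illustration in plain Python; the function names, the 2×2 grid, and the track counts are hypothetical, and treating a region with zero capacity but nonzero demand as infinitely congested is a convention chosen here, not one taken from the cited work.

```python
def congestion_map(demand, resource):
    """Compute rho[i][j] = D[i][j] / R[i][j] for each routing region."""
    rho = []
    for d_row, r_row in zip(demand, resource):
        row = []
        for d, r in zip(d_row, r_row):
            if r > 0:
                row.append(d / r)
            else:
                # No tracks available: any demand is unroutable.
                row.append(float("inf") if d > 0 else 0.0)
        rho.append(row)
    return rho


def congested_regions(rho, threshold=1.0):
    """Return (i, j) indices of regions with rho >= threshold."""
    return [(i, j)
            for i, row in enumerate(rho)
            for j, value in enumerate(row)
            if value >= threshold]


# Hypothetical 2x2 grid of routing regions on one metal layer (track counts).
demand = [[12, 30], [18, 24]]
resource = [[24, 24], [24, 24]]

rho = congestion_map(demand, resource)
print(rho)                     # [[0.5, 1.25], [0.75, 1.0]]
print(congested_regions(rho))  # [(0, 1), (1, 1)]
```

Region (1, 1) is flagged even though demand only equals capacity, matching the $\rho_{ij} \geq 1$ criterion used in the text.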

In contrast, fine-grained 3D fabrics such as Skybridge distribute pin access across multiple vertical and horizontal planes and embed 3D routing primitives directly into the cell library and P&R flow. This achieves up to 1.6× lower routing demand and removes congestion ($\rho_{ij} < 1$ at all layers), with up to 3× lower power and 11× higher density versus 2D CMOS (Shi et al., 2016).

Comparative metrics from the discussion above, relative to a 2D CMOS baseline:

Architecture	Power reduction vs. 2D CMOS	Density gain vs. 2D CMOS	Routing congestion
2D CMOS	baseline	baseline	High (at scale)
T-MI (M3D)	up to 22%	up to 2×	Severe (M1–M3)
Skybridge 3D	up to 3×	up to 11×	None ($\rho_{ij} < 1$)

3. Device Innovations: MIV-Transistor Architectures and BEOL Integration

Monolithic 3D places stringent area and KOZ (Keep-Out-Zone) requirements on via and active device co-design (Vemuri et al., 2023, Vemuri et al., 2023, Vemuri et al., 2023). The MIV can serve not only as a via but also as an active device terminal (gate contact), yielding MIV-transistor architectures where one or more transistor channels are patterned around the via. For example, embedding 1-, 2-, or 4-channel MOSFETs around an MIV enables area overhead reduction up to 18% (standard cell level) with 1% power and 3% speedup over planar FDSOI, provided process complexity (gate patterning, ultra-thin oxide) is managed (Vemuri et al., 2023).

The extended-gate MIV device realizes an even greater reduction in leakage, by 1.4×10⁴ over naive MIV-FETs, while increasing the maximum drain current $I_{D,\max}$ by 58%; inverter delay, slew, and power are simultaneously reduced by 11.6%, 17.9%, and 4.5%, respectively. This exploits the full KOZ area (e.g., 46 nm radius at 1 fF-level capacitances) without incurring MIV-induced leakage penalties (Vemuri et al., 2023).

In advanced M3D FPGAs, BEOL-integrated amorphous oxide semiconductors (AOS), notably W:In$_2$O$_3$ NMOS and SnO PMOS, allow for stackable, ultra-low-leakage SRAM and pass-gate cells vertically distributed above 7 nm FEOL FinFET logic. Such architectures attain 3.4× improved area–time², 27% lower critical path delay, and 26% routing power reduction versus LUT-based 2D FPGAs (Waqar et al., 2025).

4. System-Level and Architectural Implications

When applied at subsystem or SoC scale, M3D enables partitioning of CPUs, GPUs, caches, NoC routers, and accelerator fabrics across tiers, with critical interconnects traversing dense MIVs. This architectural verticality yields:

Reduced net length (to first order scaling as $1/\sqrt{N}$ for $N$ tiers), higher logic density, and lower power by minimizing buffer/repeater count.
In manycore systems (HeM3D), up to 18.3% lower application execution time and 19 °C reduction in peak temperature compared to TSV-3D architectures, as nearly every logic stage and router can be partitioned at gate level across tiers with optimized placement (Arka et al., 2020).
Coarse-grained M3D in memory (e.g., DRAM) separates sense amplifiers and periphery from array blocks on different tiers, connected by MIVs. This shortens both local and global bitlines, breaks the area–latency tradeoff, and yields 9.56% reduction in latency, 4.96% reduction in power, 21.2% lower energy-delay-product, and 14% area savings on representative PARSEC workloads (e.g., M3D-128 versus 2D DDR4-512) (Huang et al., 2020).
M3D enables reconfigurable CNN accelerators (ARMAN) to partition PEs and SRAMs across four tiers, with MIVs providing macro-array-to-array connections. Flexible scale-up/scale-out datagraphs enable up to 2× cycle reduction, 2.24× power savings, and 4.55× EDP improvement over single-mode architectures (Sedaghatgoo et al., 2024).
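The net-length item in the list above can be motivated by a first-order geometric estimate, which is an assumption of this sketch rather than a result taken from the cited works: if a design of fixed total area is folded evenly onto $N$ tiers, the footprint shrinks by a factor of $N$ and the lateral span of a square footprint shrinks by $\sqrt{N}$. The function names below are hypothetical, and the estimate ignores MIV and KOZ area overhead, placement quality, and the vertical segments themselves.

```python
import math


def footprint_side_um(total_area_um2: float, tiers: int) -> float:
    """Side of a square footprint when the area is split evenly over tiers."""
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return math.sqrt(total_area_um2 / tiers)


def relative_lateral_span(tiers: int) -> float:
    """Lateral span relative to the single-tier (2D) layout: 1 / sqrt(N)."""
    return footprint_side_um(1.0, tiers) / footprint_side_um(1.0, 1)


for n in (1, 2, 4):
    print(f"{n} tier(s): relative span {relative_lateral_span(n):.3f}")
```

For two tiers the ideal span is about 0.71 of the 2D value, and for four tiers it is 0.5; measured wirelength savings in real designs are generally smaller than this idealized bound.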

System methodologies for effective M3D design increasingly require process variation–aware synthesis and placement tools. Inter-tier process variation (e.g., top-tier device slowdown due to low-temperature regrowth, bottom-tier interconnect resistance from W vs. Cu) can offset naive EDP gains by 50–84% if not co-optimized. Dynamic partitioning of logic, routing, and memory blocks among tiers to minimize such penalties is essential (Musavvir et al., 2019).

5. Process Limitations, KOZ Engineering, and Scalability

Critical process-aware design constraints in M3D arise from the electrostatic coupling between MIV sidewall metals, ILD thickness, and adjacent substrate/active device regions. Off-state leakage rises sharply, approximately exponentially, for transistors placed too close (roughly 50 nm or less at low doping) to a biased MIV, because the MIV, its dielectric, and the adjacent silicon form a metal-insulator-semiconductor (MIS) structure that can invert the local silicon; the size of the increase depends on the doping, geometry, and bias conditions considered (Vemuri et al., 2023). The required KOZ is therefore a strong function of substrate and S/D doping and of ILD and MIV dimensions, and can vary from 50 to 500 nm. Placement, routing, and standard cell design must account for these minimum spacings to avoid intolerable leakage and yield loss.
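A keep-out-zone rule of this kind can be checked during placement with a simple distance test. The sketch below is a hedged illustration only: `koz_violations`, the instance names, the coordinates, and the 100 nm radius are hypothetical (the text only bounds the KOZ between 50 and 500 nm), and it measures center-to-center distance, whereas a real design rule would typically be defined edge to edge.

```python
import math


def koz_violations(mivs, transistors, koz_nm):
    """Return (transistor, miv, distance_nm) for each pair closer than koz_nm.

    mivs and transistors map instance names to (x_nm, y_nm) center coordinates.
    """
    violations = []
    for t_name, (tx, ty) in transistors.items():
        for m_name, (mx, my) in mivs.items():
            distance = math.hypot(tx - mx, ty - my)
            if distance < koz_nm:
                violations.append((t_name, m_name, distance))
    return violations


# Hypothetical placement: one MIV at the origin and two transistors.
mivs = {"miv0": (0.0, 0.0)}
transistors = {"n1": (30.0, 40.0), "n2": (300.0, 400.0)}

for t_name, m_name, distance in koz_violations(mivs, transistors, koz_nm=100.0):
    print(f"{t_name} is {distance:.0f} nm from {m_name}, inside the 100 nm KOZ")
```

Here `n1` sits 50 nm from the MIV and is flagged, while `n2` at 500 nm is not. The all-pairs loop is adequate for illustration; a production checker would use a spatial index to avoid quadratic cost.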

Future scalability will depend on further reducing process temperature budgets (enabling more than two tiers, or integration of BEOL-compatible materials), automating MIV-aware KOZ co-optimization in P&R tools, and introducing advanced vertical logic libraries (e.g., dual-sided M3D, DSI 2.0) to fully exploit both wafer surfaces and die-to-die hybrid bonding (Wu et al., 2024).

M3D also offers new single-crystal vertical device integration paths by direct low-temperature growth of TMDs (e.g., WSe$_2$, MoS$_2$) on amorphous/polycrystalline surfaces, achieving vertical n–p logic within sub-400 °C budgets that preserve BEOL integrity (Kim et al., 2023).

6. Advanced M3D Architectures: Dual-Sided, Heterogeneous, and System-in-Package Integration

Cutting-edge research demonstrates the feasibility and benefits of dual-sided M3D architectures, where both the frontside (FS) and backside (BS) of a wafer support active device construction, routing, and hybrid die-to-die bonding. Flip 3D integration (F3D) and DSI 2.0 methodologies extend pin access and signal routing to both wafer faces, halving high-fanout net lengths and reducing area and EDP by up to 6.8% and 5.9%, respectively (evaluated on 32-bit RISC-V FFET cores) (Wu et al., 2024). Multi-flip sequences (e.g., triple flips) permit all FEOL steps across both wafer surfaces to be completed before any BEOL deposition, enabling use of low-$\kappa$ dielectrics and lower-resistivity metals (e.g., Ru) without thermal risk. This enhances circuit frequency (up to 2.3%) and lowers EDP compared to double-flip flows.

The realization of back-side power delivery networks (BSPDN) in conjunction with dual-sided signal routing introduces further scaling potential, and hybrid bonding of active stack modules is envisioned for heterogeneous integration (logic + DRAM + analog + photonics), as in wafer-scale system-in-package concepts.

7. Outlook, Open Challenges, and Future Directions

Monolithic 3D (M3D) integration continues to progress from laboratory prototype to advanced EDA flows, logic and memory products, and heterogeneous architectures. Key open problems and directions include:

Development of robust thermal management for multi-tier, high-power logic (microfluidics, advanced spreaders).
Further reduction of process temperature budgets and development of materials/processes to permit 3+ tier stacking above copper BEOL.
Industry adoption of MIV-aware KOZ, standard cell logic partitioning, and 3D/XYZ P&R algorithms.
Extension of direct low-temperature single-crystal growth for vertical integration of high-mobility logic/library devices and optoelectronics.
Integration of advanced reconfigurable accelerators (FPGA, CNN/ML) exploiting vertical density advantages, with co-optimization of dynamic routing and configuration plane architectures.

Monolithic 3D ICs provide a decisive departure from the constraints of planar CMOS and TSV stacking, achieving true vertical silicon scaling, substantial wirelength and congestion reduction, and integrated logic-memory-analog fabrics on a single substrate footprint (Shi et al., 2016, Huang et al., 2020, Wu et al., 2024, Vemuri et al., 2023, Vemuri et al., 2023, Waqar et al., 2025, Sedaghatgoo et al., 2024, Arka et al., 2020, Kim et al., 2023, Musavvir et al., 2019, Vemuri et al., 2023).