RUDC Technique for Enhanced CIM Read Accuracy

Updated 13 December 2025

RUDC is a circuit innovation that enhances read-bitline dynamic range and current linearity in SRAM-based in-memory computing using an under-driven cascode configuration.
It employs dual-9T bitcells with decoupled read/write operations, achieving up to a 700 mV swing and ±1% current variation for high-accuracy differential sensing.
Integration into analog neural accelerators demonstrates improved inference accuracy, reduced error, and minimal latency/area overhead compared to conventional designs.

The Read-Word-Line Underdrive Cascode (RUDC) technique is a circuit-level innovation designed to maximize read-bitline dynamic range and current linearity in analog computing-in-memory (CIM) macros, specifically as implemented in dual 9T bitcells for signed input and ternary weight operations. By leveraging an under-driven read word line in the cascode configuration, RUDC achieves a substantial increase in signal swing and read accuracy without incurring significant area, latency, or energy penalties. The technique is positioned for integration in high-efficiency mixed-signal neural accelerators utilizing SRAM-based CIM architectures (Yang et al., 6 Dec 2025).

1. RUDC Structure and Circuit Schematic

RUDC is implemented within a dual-9T bitcell read path, supporting fully decoupled read and write operations. Each read path comprises two series-connected transistors, MH (cascode) stacked above ML (driver), and discharges a read bitline (RBL) that is precharged to $V_{DD}=1.0$ V. MH’s gate is driven by the read word line (RWL) at an under-driven voltage ($V_{RWL}\approx0.8$ V), while ML’s gate receives a steady bias ($V_{BIAS}\equiv V_{DD,\text{core}}\approx0.45$ V). Beneath ML, two additional latch transistors form the cell’s memory element. For each column, two parallel RUDC chains support differential sensing ($\Delta V_{RBL} = V_{RBL,+,\text{read}} - V_{RBL,-,\text{read}}$).

2. Operating Principle and Biasing

During operation, RBL is initially precharged to $V_{DD}$ . When a read is initiated, RWL rises to $V_{RWL}=0.8$ V—lower than the typical $1.0$ V—activating MH partially and allowing RBL to discharge incrementally through MH and ML. As $V_{RBL}$ drops, $V_{GS,\text{MH}}=V_{RWL}-V_{RBL}$ shrinks until $V_{RBL}$ reaches $V_{RWL}-V_{T,\text{MH}}$ , at which point MH cuts off and clamps the minimum RBL voltage at approximately $V_{RWL}-V_{T,\text{MH}}$ . ML remains in saturation as long as $V_{RBL}\geq V_{BIAS}-V_{T,\text{ML}}$ . This configuration yields a large, linear discharge swing—without forcing ML into triode or invoking body diode conduction in MH.
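As a quick numerical check of the bias conditions described above, the following sketch evaluates the clamp level and the ML saturation floor using the article's own first-order expressions. The default values (0.80 V, 0.45 V, and 0.30 V thresholds) are the nominal figures quoted elsewhere in this article; the function names are hypothetical and introduced only for illustration.

```python
def clamp_voltage(v_rwl: float, v_t_mh: float) -> float:
    """Minimum RBL voltage, V_RWL - V_T,MH (first-order expression from the text)."""
    return v_rwl - v_t_mh


def ml_saturation_floor(v_bias: float, v_t_ml: float) -> float:
    """RBL level above which the text states ML stays saturated: V_BIAS - V_T,ML."""
    return v_bias - v_t_ml


def ml_saturated_over_swing(v_rwl: float = 0.80, v_bias: float = 0.45,
                            v_t_mh: float = 0.30, v_t_ml: float = 0.30) -> bool:
    """True if the clamp level sits at or above the ML saturation floor."""
    return clamp_voltage(v_rwl, v_t_mh) >= ml_saturation_floor(v_bias, v_t_ml)


if __name__ == "__main__":
    # Nominal values quoted in the article; not a substitute for circuit simulation.
    print(f"clamp level      = {clamp_voltage(0.80, 0.30):.2f} V")
    print(f"saturation floor = {ml_saturation_floor(0.45, 0.30):.2f} V")
    print("ML saturated over the full swing:", ml_saturated_over_swing())
```

With the nominal values, the clamp level (about 0.50 V) sits well above the saturation floor (about 0.15 V), so by the stated condition ML stays in saturation for the whole discharge.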

3. Key Equations and Analytical Metrics

The operation is summarized by the following expressions:

Clamp Voltage: $V_{RBL,\min}=V_{RWL}-V_{T,\text{MH}}$
Dynamic Range (DR): $DR_{RUDC}=V_{DD}-V_{RBL,\min}=V_{DD}-(V_{RWL}-V_{T,\text{MH}})$
Gate-Source Voltages: $V_{GS,\text{MH}}(V_{RBL})=V_{RWL}-V_{RBL}$ ; $V_{GS,\text{ML}}\approx V_{BIAS}-V_{RBL}$
Output Resistance (cascode): $r_{\text{out,cas}}\approx r_{o,\text{ML}}+r_{o,\text{MH}}+g_{m,\text{MH}}\,r_{o,\text{ML}}\,r_{o,\text{MH}}$
Unit Discharge Current: $I_u\approx\frac{1}{2}\mu C_{ox}(W/L)_{ML}(V_{GS,\text{ML}}-V_{T,\text{ML}})^2(1+V_{RBL}/V_A)$
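The expressions above can be evaluated numerically. The sketch below is illustrative only: the small-signal values passed to `cascode_rout` ($g_m = 20\,\mu$S, $r_o = 1$ MΩ) and the Early voltage are hypothetical placeholders rather than figures from the source, and the helper names are invented for this example. The overdrive of ML is taken as a direct input instead of being derived from a bias point.

```python
def dynamic_range(v_dd: float, v_rwl: float, v_t_mh: float) -> float:
    """DR_RUDC = V_DD - (V_RWL - V_T,MH)."""
    return v_dd - (v_rwl - v_t_mh)


def cascode_rout(ro_ml: float, ro_mh: float, gm_mh: float) -> float:
    """r_out,cas ~ r_o,ML + r_o,MH + g_m,MH * r_o,ML * r_o,MH."""
    return ro_ml + ro_mh + gm_mh * ro_ml * ro_mh


def unit_current(k_ml: float, v_ov_ml: float, v_rbl: float, v_a: float) -> float:
    """I_u ~ 0.5 * k * Vov^2 * (1 + V_RBL / V_A), with k = mu * C_ox * (W/L)_ML."""
    return 0.5 * k_ml * v_ov_ml ** 2 * (1.0 + v_rbl / v_a)


if __name__ == "__main__":
    # Nominal biases quoted in the article.
    print(f"DR = {dynamic_range(1.0, 0.80, 0.30):.2f} V")

    # Hypothetical small-signal values, chosen only to show the cascode boost.
    ro, gm = 1e6, 20e-6
    boost = cascode_rout(ro, ro, gm) / ro
    print(f"r_out boost over a single device: {boost:.0f}x")

    # Hypothetical Early voltage: relative current change across 0.5 V to 1.0 V.
    hi = unit_current(2e-4, 0.1, 1.0, v_a=10.0)
    lo = unit_current(2e-4, 0.1, 0.5, v_a=10.0)
    print(f"single-device current change: {100 * (hi / lo - 1):.1f} %")
```

With the nominal biases quoted in this article, the first-order DR expression evaluates to 0.5 V. The 0.70 V range for ±1% current variation is a reported result and is not derived by this sketch.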

4. Performance Comparison and Measurement Summary

RUDC offers a substantially higher bitline dynamic range and current linearity than both single-transistor and conventional cascode schemes. It holds the unit current within $\pm1\%$ over a $700$ mV swing, which is $2.8\times$ the $250$ mV dynamic range of single-FET designs and about $1.4\times$ that of conventional cascode configurations (with MH at a fixed $V_{BIAS}=0.45$ V). Monte Carlo simulations show that RUDC’s current-vs-voltage slope is $7\times$ flatter than that of the single-FET case.

| Read Path | DR for ±1% $\Delta I_u$ (V) | Relative DR | Bitline Margin (mV/col) |
|---|---|---|---|
| Single-FET | 0.25 | 1.0× | 1.95 |
| Conventional | 0.51 | 2.0× | 4.00 |
| RUDC | 0.70 | 2.8× | 3.68 |
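The relative dynamic-range figures can be cross-checked directly from the tabulated values. The short sketch below recomputes the ratios; the dictionary and function names are hypothetical, and only the three DR values are taken from the table.

```python
# Dynamic range (V) for +/-1% unit-current variation, as tabulated above.
dr_volts = {"Single-FET": 0.25, "Conventional": 0.51, "RUDC": 0.70}


def relative_dr(dr: dict, baseline: str = "Single-FET") -> dict:
    """Return each scheme's dynamic range normalized to the baseline scheme."""
    base = dr[baseline]
    return {name: value / base for name, value in dr.items()}


if __name__ == "__main__":
    for name, ratio in relative_dr(dr_volts).items():
        print(f"{name:12s} {ratio:.1f}x")
    rudc_vs_conv = dr_volts["RUDC"] / dr_volts["Conventional"]
    print(f"RUDC vs conventional: {rudc_vs_conv:.2f}x")
```

The ratios come out as 1.0×, 2.0×, and 2.8× against the single-FET baseline, and about 1.37× for RUDC against the conventional cascode, consistent with the rounded 1.4× quoted in the text.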

RUDC incurs no additional read latency and only marginal area/energy overhead—adding a single FET (MH) per read path and a buffer for $V_{RWL}$ , whose area is $\leq2\%$ of the local driver chain. The net read energy remains $\sim1\,\mathrm{pJ/cell}$ , within $5\%$ of the conventional cascode.

5. Device Sizing, Fabrication, and Bias Strategy

The technique is implemented in $65$ nm CMOS using regular- $V_T$ devices. ML and MH are sized at $1.2\,\mu\mathrm{m}/60\,\mathrm{nm}$ and $0.6\,\mu\mathrm{m}/60\,\mathrm{nm}$ respectively, establishing $I_u\approx1\,\mu\mathrm{A}$ and minimizing capacitive loading. Both threshold voltages are $0.30$ V at room temperature. All transistor bodies are tied to ground, and body effect is negligible since $V_{RBL,\min}\geq0.5$ V. Bias voltages are set at $V_{DD}=1.0$ V, $V_{RWL}=0.80$ V (from a single-buffered core rail), and $V_{BIAS}=0.45$ V (regulated by a PTAT-compensated LDO). No replica-bias is required at the cascode gate; temperature and RF robustness result inherently from the use of the same bias domain as the MAC array.

6. Advantages, Limitations, and Integration Guidelines

RUDC supports the largest achievable signal swing ( $\geq700$ mV) for optimal SNR into per-column sense amplifiers, with exceptionally linear discharge current over this range. This yields tighter bit-to-bit matching and $7\times$ lower column-to-column gain error. No additional read latency and only minimal buffer overhead are incurred. Limitations include a $\sim20\%$ reduction in peak $I_u$ versus full-VDD drive (necessitating $\sim1.25\times$ longer read if equal swing is required) and the need for a second supply for $V_{RWL}$ and a PTAT-LDO for temperature stability. Design guidelines include setting $V_{RWL}\approx V_{T,\text{MH}}+\text{(desired clamp voltage)}$ , keeping ML in saturation, minimizing MH width to reduce RBL capacitance, and reusing existing VBIAS/LDO infrastructure for multi-bit arrays.
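The design guidelines and the read-time trade-off can be expressed as a few helper functions. This is a minimal sketch that assumes the bitline discharges at a constant current, so that discharge time is $t = C\,\Delta V / I$. The bitline capacitance used below (50 fF) is a hypothetical placeholder, and the function names are not from the source.

```python
def rwl_for_clamp(v_clamp: float, v_t_mh: float) -> float:
    """Guideline from the text: V_RWL ~ V_T,MH + desired clamp voltage."""
    return v_t_mh + v_clamp


def discharge_time(c_rbl: float, delta_v: float, current: float) -> float:
    """Constant-current discharge time t = C * dV / I (simplifying assumption)."""
    return c_rbl * delta_v / current


def read_time_scale(current_reduction: float) -> float:
    """Factor by which read time grows for equal swing when current drops."""
    if not 0.0 <= current_reduction < 1.0:
        raise ValueError("current_reduction must be in [0, 1)")
    return 1.0 / (1.0 - current_reduction)


if __name__ == "__main__":
    # A 0.50 V clamp with a 0.30 V threshold calls for roughly 0.80 V on RWL.
    print(f"V_RWL = {rwl_for_clamp(0.50, 0.30):.2f} V")

    # A 20% lower unit current lengthens an equal-swing read by 1 / 0.8.
    print(f"read-time scale = {read_time_scale(0.20):.2f}x")

    # Hypothetical 50 fF bitline, 0.5 V swing, full vs. reduced unit current.
    t_full = discharge_time(50e-15, 0.5, 1.0e-6)
    t_rudc = discharge_time(50e-15, 0.5, 0.8e-6)
    print(f"time ratio = {t_rudc / t_full:.2f}x")
```

Under the constant-current assumption, the 20% current reduction maps directly onto the 1.25× read-time figure quoted above, since discharge time is inversely proportional to current for a fixed swing.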

7. Applicability and Broader Significance

RUDC can be ported to any SRAM-based in-memory compute cell supporting decoupled readout. Its adoption enables the computation of nonlinear activations with higher accuracy and energy efficiency in analog LSTM accelerators. As demonstrated, the RUDC-based CIM macro achieves $92.0\%$ on-chip inference accuracy for a $12$-class keyword-spotting task and contributes $2.2\times$ higher system-level normalized energy efficiency and $1.6\times$ improvement in area efficiency relative to prior works, while executing $99\%$ of LSTM linear and $80\%$ of nonlinear operations in the analog domain (Yang et al., 6 Dec 2025). The architectural simplicity and low penalty of the RUDC method suggest broad compatibility with advanced analog neural network accelerators, especially where maximizing dynamic range and minimizing error are critical.

References

A 33.6-136.2 TOPS/W Nonlinear Analog Computing-In-Memory Macro for Multi-bit LSTM Accelerator in 65 nm CMOS (2025)
