
RUDC Technique for Enhanced CIM Read Accuracy

Updated 13 December 2025
  • RUDC is a circuit innovation that enhances read-bitline dynamic range and current linearity in SRAM-based in-memory computing using an under-driven cascode configuration.
  • It employs dual-9T bitcells with decoupled read/write operations, achieving up to a 700 mV swing and ±1% current variation for high-accuracy differential sensing.
  • Integration into analog neural accelerators demonstrates improved inference accuracy, reduced error, and minimal latency/area overhead compared to conventional designs.

The Read-Word-Line Underdrive Cascode (RUDC) technique is a circuit-level innovation designed to maximize read-bitline dynamic range and current linearity in analog computing-in-memory (CIM) macros, specifically as implemented in dual 9T bitcells for signed input and ternary weight operations. By leveraging an under-driven read word line in the cascode configuration, RUDC achieves a substantial increase in signal swing and read accuracy without incurring significant area, latency, or energy penalties. The technique is positioned for integration in high-efficiency mixed-signal neural accelerators utilizing SRAM-based CIM architectures (Yang et al., 6 Dec 2025).

1. RUDC Structure and Circuit Schematic

RUDC is implemented within a dual-9T bitcell read path that supports fully decoupled read and write operations. Each read path comprises two series-connected MOS transistors, MH (cascode) stacked above ML (driver), with the read bitline (RBL) precharged to $V_{DD}=1.0$ V. MH's gate is driven by the read word line (RWL) at an under-driven voltage ($V_{RWL}\approx0.8$ V), while ML's gate receives a steady bias ($V_{BIAS}\equiv V_{DD,\text{core}}\approx0.45$ V). Beneath ML, two additional latch transistors form the cell's memory element. Each column contains two parallel RUDC chains for differential sensing, $\Delta V_{RBL}=V_{RBL,+,\text{read}}-V_{RBL,-,\text{read}}$.

2. Operating Principle and Biasing

During operation, RBL is first precharged to $V_{DD}$. When a read is initiated, RWL rises to $V_{RWL}=0.8$ V, lower than the typical $1.0$ V, which turns MH on only partially and lets RBL discharge incrementally through MH and ML. As $V_{RBL}$ falls toward $V_{RWL}-V_{T,\text{MH}}$, MH approaches cut-off; once that level is reached, MH cuts off and clamps the minimum RBL voltage at approximately $V_{RWL}-V_{T,\text{MH}}$. ML remains in saturation as long as $V_{RBL}\geq V_{BIAS}-V_{T,\text{ML}}$. This configuration yields a large, linear discharge swing without forcing ML into the triode region or invoking body-diode conduction in MH.
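The read sequence can be illustrated with a deliberately simple constant-current model. This is a minimal sketch, not the authors' model: it assumes each active cell sinks the unit current $I_u$ (about 1 µA per Section 5) until RBL reaches the clamp $V_{RWL}-V_{T,\text{MH}}$, that currents from simultaneously active cells add on the bitline, and that the bitline capacitance `c_rbl` is a hypothetical 100 fF placeholder. The function names `rbl_voltage` and `differential_read` are illustrative only.

```python
# Hypothetical first-order model of RUDC bitline discharge (not from the source):
# constant unit current per active cell, ideal clamp at V_RWL - V_T,MH,
# and a placeholder bitline capacitance c_rbl.

def rbl_voltage(t_s, v_dd=1.0, v_rwl=0.8, v_t_mh=0.30,
                i_u=1e-6, c_rbl=100e-15, n_on=1):
    """V_RBL after t_s seconds with n_on cells discharging the line in parallel."""
    v_clamp = v_rwl - v_t_mh                     # floor set by MH cut-off
    v_ramp = v_dd - n_on * i_u * t_s / c_rbl     # ideal constant-current ramp
    return max(v_ramp, v_clamp)                  # discharge stops at the clamp


def differential_read(t_s, n_pos, n_neg, **kw):
    """Delta V_RBL = V_RBL,+ - V_RBL,- for n_pos / n_neg active cells."""
    v_pos = rbl_voltage(t_s, n_on=n_pos, **kw)
    v_neg = rbl_voltage(t_s, n_on=n_neg, **kw)
    return v_pos - v_neg


print(f"{differential_read(10e-9, 3, 1):.2f} V")
```

For example, after 10 ns with three cells discharging RBL+ and one discharging RBL−, the model gives $V_{RBL,+}=0.7$ V, $V_{RBL,-}=0.9$ V and $\Delta V_{RBL}=-0.2$ V; with six active cells the ramp would fall below 0.5 V, so the clamp holds RBL at 0.5 V.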

3. Key Equations and Analytical Metrics

The operation is summarized by the following first-order expressions:

  • Clamp voltage: $V_{RBL,\min}=V_{RWL}-V_{T,\text{MH}}$
  • Dynamic range (DR): $DR_{RUDC}=V_{DD}-V_{RBL,\min}=V_{DD}-(V_{RWL}-V_{T,\text{MH}})$
  • Gate-source voltages: $V_{GS,\text{MH}}(V_{RBL})=V_{RWL}-V_{RBL}$; $V_{GS,\text{ML}}\approx V_{BIAS}-V_{RBL}$
  • Output resistance (cascode): $r_{\text{out,cas}}\approx r_{o,\text{ML}}+r_{o,\text{MH}}+g_{m,\text{MH}}\,r_{o,\text{ML}}\,r_{o,\text{MH}}$
  • Unit discharge current: $I_u\approx\frac{1}{2}\mu C_{ox}(W/L)_{ML}(V_{GS,\text{ML}}-V_{T,\text{ML}})^2(1+V_{RBL}/V_A)$
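The clamp, dynamic-range and output-resistance expressions can be evaluated directly. The sketch below is a minimal illustration rather than a design tool: it plugs in the Section 5 values ($V_{DD}=1.0$ V, $V_{RWL}=0.80$ V, $V_{T,\text{MH}}=0.30$ V), while the $r_o=100\,\mathrm{k\Omega}$ and $g_{m,\text{MH}}=100\,\mu\mathrm{S}$ figures are hypothetical placeholders chosen only to show how the $g_m r_o^2$ term dominates $r_{\text{out,cas}}$.

```python
# Evaluate the first-order RUDC expressions.
# Bias values follow Section 5; r_o and g_m are hypothetical placeholders.

def clamp_voltage(v_rwl, v_t_mh):
    """V_RBL,min = V_RWL - V_T,MH."""
    return v_rwl - v_t_mh


def dynamic_range(v_dd, v_rwl, v_t_mh):
    """DR = V_DD - (V_RWL - V_T,MH)."""
    return v_dd - clamp_voltage(v_rwl, v_t_mh)


def cascode_rout(r_o_ml, r_o_mh, g_m_mh):
    """r_out,cas ~= r_o,ML + r_o,MH + g_m,MH * r_o,ML * r_o,MH."""
    return r_o_ml + r_o_mh + g_m_mh * r_o_ml * r_o_mh


v_clamp = clamp_voltage(0.80, 0.30)
dr = dynamic_range(1.0, 0.80, 0.30)
r_single = 100e3                               # hypothetical r_o of one FET (ohm)
r_cas = cascode_rout(r_single, r_single, 100e-6)  # hypothetical g_m,MH = 100 uS

print(f"clamp = {v_clamp:.2f} V, DR = {dr:.2f} V, "
      f"r_out gain = {r_cas / r_single:.1f}x")
```

With these inputs the clamp and first-order DR both come out at 0.50 V, in line with the $V_{RBL,\min}\geq0.5$ V figure in Section 5, and the cascode raises output resistance about $12\times$ over a single device. As a square-law, first-order estimate it is not expected to reproduce the reported swing exactly.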

4. Performance Comparison and Measurement Summary

RUDC offers substantially higher bitline dynamic range and current linearity than both single-transistor and conventional cascode read paths. Over its $700$ mV swing, RUDC holds the unit current within $\pm1\%$, a range $2.8\times$ wider than the $250$ mV of single-FET designs and $1.4\times$ wider than that of conventional cascode configurations (with MH gated at a fixed $V_{BIAS}=0.45$ V). Monte Carlo simulations show that RUDC's current-vs-voltage slope is $7\times$ flatter than that of the single-FET case.

| Read path | DR for $\pm1\%$ $\Delta I_u$ | Relative DR | Bitline margin (mV/col) |
|---|---|---|---|
| Single-FET | 0.25 V | 1.0× | 1.95 |
| Conventional cascode | 0.51 V | 2.0× | 4.00 |
| RUDC | 0.70 V | 2.8× | 3.68 |
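The "DR for $\pm1\%$ $\Delta I_u$" column can be read as the widest bitline-voltage window over which the unit current stays inside a $\pm1\%$ band. The source does not specify the band's reference point, so this minimal sketch assumes the band is centred on the midpoint of the window's minimum and maximum current, i.e. $(I_{\max}-I_{\min})/(I_{\max}+I_{\min})\leq0.01$. The two I-V curves are synthetic straight lines with hypothetical effective Early voltages, not measured data, and `linear_window` is an illustrative name.

```python
# Hypothetical metric: widest contiguous V window with current inside +/-tol.
# The I-V curves below are synthetic, not data from the source.

def linear_window(v, i, tol=0.01):
    """Return (span, v_start, v_end) of the widest contiguous window whose
    currents satisfy (I_max - I_min) / (I_max + I_min) <= tol.
    `v` must be sorted ascending and match `i` in length; currents > 0."""
    best = (0.0, v[0], v[0])
    for start in range(len(v)):
        lo = hi = i[start]
        for end in range(start, len(v)):
            lo, hi = min(lo, i[end]), max(hi, i[end])
            if hi - lo > tol * (hi + lo):
                break  # for positive currents, widening only increases spread
            span = v[end] - v[start]
            if span > best[0]:
                best = (span, v[start], v[end])
    return best


# Synthetic sweep from 0.30 V to 1.00 V in 10 mV steps.
volts = [0.30 + 0.01 * k for k in range(71)]
steep = [1e-6 * (1 + (x - 0.30) / 5.0) for x in volts]   # V_A,eff = 5 V (hypothetical)
flat = [1e-6 * (1 + (x - 0.30) / 50.0) for x in volts]   # V_A,eff = 50 V (hypothetical)

steep_span = linear_window(volts, steep)[0]
flat_span = linear_window(volts, flat)[0]
print(f"steep: {steep_span:.2f} V, flat: {flat_span:.2f} V")
```

A flatter current-vs-voltage slope (a higher effective output resistance, as a cascode provides) widens the window; here the flatter synthetic curve stays inside the band across the whole 0.70 V sweep, while the steeper one holds it for only about 0.11 V.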

RUDC incurs no additional read latency and only marginal area and energy overhead: it adds a single FET (MH) per read path plus a buffer for $V_{RWL}$ whose area is $\leq2\%$ of the local driver chain. The net read energy remains $\sim1\,\mathrm{pJ/cell}$, within $5\%$ of the conventional cascode.

5. Device Sizing, Fabrication, and Bias Strategy

The technique is implemented in $65$ nm CMOS using regular-$V_T$ devices. ML and MH are sized at $1.2\,\mu\mathrm{m}/60\,\mathrm{nm}$ and $0.6\,\mu\mathrm{m}/60\,\mathrm{nm}$, respectively, setting $I_u\approx1\,\mu\mathrm{A}$ while keeping capacitive loading low. Both threshold voltages are $0.30$ V at room temperature. All transistor bodies are tied to ground, and body effect is negligible since $V_{RBL,\min}\geq0.5$ V. The bias voltages are $V_{DD}=1.0$ V, $V_{RWL}=0.80$ V (from a single-buffered core rail), and $V_{BIAS}=0.45$ V (regulated by a PTAT-compensated LDO). No replica bias is required at the cascode gate; temperature and RF robustness follow from sharing the same bias domain as the MAC array.
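A quick consistency check of this bias point against the conditions stated in Sections 2 and 5 can be scripted. This is a minimal sketch under first-order assumptions: the class name `RudcBias` and its check labels are hypothetical, and the checks only restate the document's own conditions (ML saturated down to the clamp, clamp at or above 0.5 V, clamp below the precharge level), plus the $W/L$ ratios implied by the stated sizing.

```python
from dataclasses import dataclass


@dataclass
class RudcBias:
    """Hypothetical container for the Section 5 bias point (volts)."""
    v_dd: float = 1.00     # RBL precharge level
    v_rwl: float = 0.80    # under-driven read word line on MH's gate
    v_bias: float = 0.45   # steady gate bias on ML
    v_t_mh: float = 0.30   # MH threshold voltage
    v_t_ml: float = 0.30   # ML threshold voltage

    def clamp(self) -> float:
        # Minimum RBL voltage: V_RWL - V_T,MH
        return self.v_rwl - self.v_t_mh

    def ml_sat_floor(self) -> float:
        # ML stays saturated while V_RBL >= V_BIAS - V_T,ML
        return self.v_bias - self.v_t_ml

    def check(self, min_clamp: float = 0.5) -> dict:
        eps = 1e-9  # tolerance for floating-point round-off
        return {
            "ml_saturated_down_to_clamp": self.clamp() >= self.ml_sat_floor() - eps,
            "clamp_at_or_above_min": self.clamp() >= min_clamp - eps,
            "clamp_below_precharge": self.clamp() < self.v_dd,
        }


# Aspect ratios (W/L, dimensionless) from the stated device sizing.
aspect = {"ML": 1.2e-6 / 60e-9, "MH": 0.6e-6 / 60e-9}

print(RudcBias().check(), aspect)
```

All three checks pass at the stated bias point: the clamp (0.50 V) sits well above the ML saturation floor of 0.15 V. Dropping $V_{RWL}$ to 0.60 V, for instance, would pull the clamp to 0.30 V and fail the 0.5 V condition.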

6. Advantages, Limitations, and Integration Guidelines

RUDC supports the largest signal swing among the compared read paths ($\geq700$ mV), maximizing SNR into the per-column sense amplifiers, with highly linear discharge current over this range. This yields tighter bit-to-bit matching and $7\times$ lower column-to-column gain error, with no additional read latency and only minimal buffer overhead. Limitations include a $\sim20\%$ reduction in peak $I_u$ versus full-$V_{DD}$ drive (requiring a $\sim1.25\times$ longer read if equal swing is needed) and the need for a second supply for $V_{RWL}$ and a PTAT-LDO for temperature stability. Design guidelines: set $V_{RWL}\approx V_{T,\text{MH}}+\text{(desired clamp voltage)}$, keep ML in saturation, minimize MH width to reduce RBL capacitance, and reuse the existing $V_{BIAS}$/LDO infrastructure for multi-bit arrays.
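Two of the numbers in this section follow from simple relations, sketched below under first-order assumptions. The word-line level comes from the stated guideline $V_{RWL}\approx V_{T,\text{MH}}+V_{\text{clamp}}$, and the read-time penalty assumes the bitline is discharged by a constant current, so $t=C_{RBL}\,\Delta V/I_u$ and a $20\%$ drop in $I_u$ stretches the time for equal swing by $1/(1-0.2)=1.25$. The function names are hypothetical.

```python
# Hypothetical helpers for the Section 6 design guidelines (first-order).

def rwl_for_clamp(v_clamp_target, v_t_mh):
    """Guideline: V_RWL ~= V_T,MH + desired clamp voltage."""
    return v_t_mh + v_clamp_target


def read_time_scale(current_reduction):
    """Constant-current discharge: t = C * dV / I, so equal swing at
    (1 - r) * I takes 1 / (1 - r) times as long."""
    return 1.0 / (1.0 - current_reduction)


print(f"V_RWL = {rwl_for_clamp(0.50, 0.30):.2f} V")
print(f"read-time factor = {read_time_scale(0.20):.2f}x")
```

With a 0.5 V target clamp and $V_{T,\text{MH}}=0.30$ V the first helper returns 0.80 V, the word-line level used in Section 5, and the second returns the $1.25\times$ read-time factor quoted above.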

7. Applicability and Broader Significance

RUDC can in principle be ported to any SRAM-based in-memory compute cell that supports decoupled readout. Its adoption enables nonlinear activations to be computed with higher accuracy and energy efficiency in analog LSTM accelerators. As demonstrated, the RUDC-based CIM macro achieves $92.0\%$ on-chip inference accuracy on a $12$-class keyword-spotting task and contributes $2.2\times$ higher system-level normalized energy efficiency and a $1.6\times$ improvement in area efficiency relative to prior works, while executing $99\%$ of LSTM linear and $80\%$ of nonlinear operations in the analog domain (Yang et al., 6 Dec 2025). The architectural simplicity and low overhead of RUDC suggest broad compatibility with advanced analog neural network accelerators, especially where maximizing dynamic range and minimizing error are critical.
