RUDC Technique for Enhanced CIM Read Accuracy
- RUDC is a circuit innovation that enhances read-bitline dynamic range and current linearity in SRAM-based in-memory computing using an under-driven cascode configuration.
- It employs dual-9T bitcells with decoupled read/write operations, achieving up to a 700 mV swing and ±1% current variation for high-accuracy differential sensing.
- Integration into analog neural accelerators demonstrates improved inference accuracy, reduced error, and minimal latency/area overhead compared to conventional designs.
The Read-Word-Line Underdrive Cascode (RUDC) technique is a circuit-level innovation designed to maximize read-bitline dynamic range and current linearity in analog computing-in-memory (CIM) macros, specifically as implemented in dual 9T bitcells for signed input and ternary weight operations. By leveraging an under-driven read word line in the cascode configuration, RUDC achieves a substantial increase in signal swing and read accuracy without incurring significant area, latency, or energy penalties. The technique is positioned for integration in high-efficiency mixed-signal neural accelerators utilizing SRAM-based CIM architectures (Yang et al., 6 Dec 2025).
1. RUDC Structure and Circuit Schematic
RUDC is implemented within a dual-9T bitcell read path, supporting fully decoupled read and write operations. Each read path comprises two series-connected CMOS transistors, MH (cascode) stacked above ML (driver), connected to a precharged read bitline (RBL). MH’s gate is driven by the read word line (RWL) at an under-driven voltage, while ML’s gate receives a steady bias. Beneath ML, two additional latch transistors form the cell’s memory element. For each column, two parallel RUDC chains support differential sensing.
2. Operating Principle and Biasing
During operation, RBL is initially precharged. When a read is initiated, RWL rises to an under-driven level, lower than the typical $1.0$ V, which partially activates MH and allows RBL to discharge incrementally through MH and ML. As the RBL voltage drops, the voltage across MH shrinks until MH cuts off, which clamps the minimum RBL voltage. ML remains in saturation as long as its drain-source voltage stays above its overdrive voltage. This configuration yields a large, linear discharge swing without forcing ML into triode or invoking body-diode conduction in MH.
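The precharge, linear discharge, and clamp behaviour described here can be sketched with a simple behavioural model. This is only an illustration: it assumes an ideal constant unit current per active cell and a hard clamp, and every numeric value (precharge level, clamp level, unit current, bitline capacitance) is a hypothetical placeholder rather than a figure from the source. The clamp is chosen only so that the swing matches the quoted $700$ mV.

```python
def rbl_voltage(t_ns, n_active, v_pre=1.0, v_clamp=0.3,
                i_unit_ua=2.0, c_rbl_ff=100.0):
    """Behavioural RBL voltage after t_ns of discharge.

    All default values are hypothetical placeholders, not taken from the source.
    n_active cells each sink a constant unit current (ideal current source),
    and the bitline cannot fall below v_clamp (idealised cascode cut-off).
    """
    # dV = I * t / C; with uA, ns and fF the units reduce to volts:
    # (1e-6 A * 1e-9 s) / 1e-15 F = 1 V
    dv = n_active * i_unit_ua * t_ns / c_rbl_ff
    return max(v_pre - dv, v_clamp)


if __name__ == "__main__":
    # Sweep the number of active cells for a fixed 1 ns read window.
    for n in (0, 10, 20, 35, 50):
        print(f"{n:2d} active cells -> RBL = {rbl_voltage(1.0, n):.3f} V")
```

Within the linear region each additional active cell lowers RBL by the same step, which is the property the differential sense amplifier relies on; beyond the clamp, further cells no longer change the bitline voltage.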
3. Key Equations and Analytical Metrics
Critical operation is codified in the following expressions:
- Clamp Voltage:
- Dynamic Range (DR):
- Gate-Source Voltages: ;
- Output Resistance (cascode):
- Unit Discharge Current:
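As a rough guide to the quantities listed above, the following sketch gives their standard textbook forms. It assumes n-channel devices and a long-channel square-law model, which is only approximate at $65$ nm. The symbols $V_{pre}$ (RBL precharge level), $V_{clamp}$ (minimum RBL voltage), and the device subscripts are notation introduced here for illustration and may differ from the source's own; the clamp voltage is left symbolic rather than expressed in terms of the bias voltages.

```latex
\begin{align}
  \text{Dynamic range:}\quad
    & \mathrm{DR} = V_{pre} - V_{clamp} \\
  \text{ML saturation condition:}\quad
    & V_{DS,\mathrm{ML}} \ge V_{GS,\mathrm{ML}} - V_{TH} \\
  \text{Cascode output resistance:}\quad
    & R_{out} \approx g_{m,\mathrm{MH}}\, r_{o,\mathrm{MH}}\, r_{o,\mathrm{ML}} \\
  \text{Unit discharge current:}\quad
    & I_u \approx \tfrac{1}{2}\,\mu_n C_{ox}
      \left(\tfrac{W}{L}\right)_{\mathrm{ML}}
      \left(V_{GS,\mathrm{ML}} - V_{TH}\right)^{2}
\end{align}
```

The cascode output resistance is what keeps $I_u$ nearly independent of the RBL voltage: the larger $R_{out}$, the smaller the change in discharge current across the swing.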
4. Performance Comparison and Measurement Summary
RUDC offers a substantially higher bitline dynamic range and current linearity relative to both single-transistor and conventional cascode schemes. Over its $700$ mV swing, RUDC exhibits ±1% current variation, a range 2.8× the $250$ mV dynamic range of single-FET designs and about 1.4× that of conventional cascode configurations (with MH’s gate held at a fixed voltage). Monte Carlo simulations show that RUDC’s current-versus-voltage slope is flatter than that of the single-FET case.
| Read Path | DR for ±1% ΔI_u | Relative DR | Bitline Margin (mV/col) |
|---|---|---|---|
| Single-FET | 0.25 V | 1.0× | 1.95 |
| Conventional | 0.51 V | 2.0× | 4.00 |
| RUDC | 0.70 V | 2.8× | 3.68 |
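The "Relative DR" column follows directly from the dynamic-range column, normalised to the single-FET baseline. A minimal sketch of that arithmetic, using only the values in the table (the function and variable names are illustrative):

```python
# Dynamic range (in volts) for ±1% unit-current variation, from the table.
dr_volts = {"Single-FET": 0.25, "Conventional": 0.51, "RUDC": 0.70}


def relative_dr(dr, baseline="Single-FET"):
    """Return each read path's dynamic range normalised to the baseline."""
    return {name: value / dr[baseline] for name, value in dr.items()}


rel = relative_dr(dr_volts)

if __name__ == "__main__":
    for name, ratio in rel.items():
        print(f"{name:12s} {ratio:.1f}x")
    # Gain of RUDC over the conventional cascode alone.
    print(f"RUDC vs conventional: {dr_volts['RUDC'] / dr_volts['Conventional']:.2f}x")
```

The same ratios show that RUDC extends the conventional cascode's range by roughly 1.4×, on top of the 2× that the conventional cascode already gains over a single FET.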
RUDC incurs no additional read latency and only marginal area and energy overhead: a single FET (MH) per read path and a buffer for the under-driven RWL level, whose area is small relative to the local driver chain. The net read energy remains close to that of the conventional cascode.
5. Device Sizing, Fabrication, and Bias Strategy
The technique is implemented in $65$ nm CMOS using regular-threshold devices. ML and MH are sized to minimize capacitive loading on RBL. Both threshold voltages are $0.30$ V at room temperature. All transistor bodies are tied to ground, and the body effect is negligible because the source-to-body voltage stays small. Three bias voltages are used: the supply, the under-driven RWL level (from a single-buffered core rail), and the ML gate bias (regulated by a PTAT-compensated LDO). No replica bias is required at the cascode gate; temperature and RF robustness result inherently from using the same bias domain as the MAC array.
6. Advantages, Limitations, and Integration Guidelines
RUDC supports the largest achievable signal swing ($700$ mV) for optimal SNR into per-column sense amplifiers, with exceptionally linear discharge current over this range. This yields tighter bit-to-bit matching and lower column-to-column gain error. No additional read latency and only minimal buffer overhead are incurred. Limitations include a reduction in peak discharge current versus full-VDD drive (necessitating a longer read if equal swing is required) and the need for a second supply for the under-driven RWL level and a PTAT-LDO for temperature stability. Design guidelines include setting the under-drive bias point, keeping ML in saturation, minimizing MH width to reduce RBL capacitance, and reusing existing VBIAS/LDO infrastructure for multi-bit arrays.
7. Applicability and Broader Significance
RUDC can be ported to any SRAM-based in-memory compute cell supporting decoupled readout. Its adoption enables the computation of nonlinear activations with higher accuracy and energy efficiency in analog LSTM accelerators. As demonstrated, the RUDC-based CIM macro is validated by on-chip inference on a $12$-class keyword-spotting task and contributes to higher system-level normalized energy efficiency and improved area efficiency relative to prior works, while executing LSTM linear and nonlinear operations in the analog domain (Yang et al., 6 Dec 2025). The architectural simplicity and low penalty of the RUDC method suggest broad compatibility with advanced analog neural network accelerators, especially where maximizing dynamic range and minimizing error are critical.