
RX9T Bit-Cell: In-Memory Compute SRAM

Updated 23 November 2025
  • RX9T Bit-Cell is a 9-transistor SRAM design that integrates conventional storage with dedicated compute transistors for parallel XOR/XNOR and secure toggling functions.
  • One implementation augments a 6T SRAM with 3 extra transistors to achieve full-array toggle and erase, while the other couples a 5T core with a 4T XNOR network for variable-precision compute and CAM operations.
  • The architecture leverages precise transistor sizing, enhanced bit-line/word-line schemes, and optimized layout techniques to balance area, power, and noise margins for efficient edge and TinyML applications.

The RX9T bit-cell is a class of 9-transistor SRAM bit-cell architectures optimized for in-memory compute operations, notably array-level XOR and XNOR functions, as well as secure data toggling, with demonstrated integration in energy-efficient macros for edge intelligence and TinyML. Two distinct RX9T implementations appear in current literature: one uses a 9-transistor arrangement to enable massively parallel XOR and secure erase/toggle mechanisms (Yin et al., 2023); another, within FERMI-ML, couples a 5T storage core to a 4T XNOR network for in-situ variable-precision compute and CAM operations (Lokhande et al., 16 Nov 2025). Both are characterized by augmented bit-line and word-line architectures, expanded transistor stacks for compute, and careful transistor sizing for area, power, and noise-margin optimization.

1. Circuit Architecture and Schematic Constituents

Both RX9T variants build on a conventional SRAM core by adding compute-dedicated transistors:

  • In the security/toggling-oriented RX9T (Yin et al., 2023), a standard 6T cross-coupled SRAM cell (M1–M6) is augmented with three transistors (M7–M9) to implement array-level XOR and controlled data toggling. M1–M4 realize the storage latches, M5–M6 serve as access transistors, and M7–M9 introduce a dynamic node “N,” a per-column operand data line “DL,” and a reset/XOR bit-line “BLR.”
  • In the FERMI-ML RX9T (Lokhande et al., 16 Nov 2025), a 5T storage cell (M1–M5) is extended by a 4T series NMOS XNOR compute network (M7–M10) connecting the Q/QB nodes of different rows to dedicated match-lines (ML/MLB). The architecture is designed for simultaneous read/write and in-situ compute using distinct word-line and bit-line schemes.

An exemplary schematic is as follows (using the conventions of both papers):

| Core (Storage) | Compute (Logic) | Access Lines/Nodes |
|---|---|---|
| M1–M6 (6T) or M1–M5 (5T) | M7–M9: XOR/XNOR (security or PIM) | WL1/WL2, BL/BLB, DL/BLR, ML/MLB |
| Vx/Vy (or Q/QB): storage nodes | N, ML, MLB: compute/control nodes | — |

This arrangement preserves the core function of SRAM and overlays a logic enable, with all compute transistors sized and driven to maximize speed and noise margins.

2. Functional Decomposition of Transistor Roles

The transistors partition cleanly into three functional groups:

  • Storage: M1–M4 (or M1–M5) implement the cross-coupled inverter pair, retaining static noise margin (SNM) and stable read/write margins at the 22 nm and 65 nm nodes. M5 and M6 serve as word-line-gated access transistors.
  • Compute/Toggle: M7–M9 (or M7–M10 in FERMI-ML) enable in-situ XOR/XNOR and secure flipping by forming series discharge paths between internal storage nodes and match-lines. Gates for compute transistors are driven either by dynamic operand lines (DL, BLR) or inter-row connect nets (ML, MLB).
  • Access/Control: WL1/WWL and WL2/RWL provide mode multiplexing, configuring the cell for standard SRAM, parallel compute, or erase.

This decomposition facilitates both memory and logic operation without interference.

3. Array-Level Compute Parallelism and Mode Sequencing

The RX9T architecture’s primary innovation is whole-array compute parallelism via controlled activation sequencing:

  • In (Yin et al., 2023), a two-step operation enables single-cycle XOR: first, a “conditional reset” (assert WL1, apply negative BLR) sets storage nodes depending on operand B (DL); second, “conditional flip” (assert WL2, set BLR=DL) toggles bits only if required by sampled state, allowing every row to execute A⊕B in parallel.
  • In FERMI-ML (Lokhande et al., 16 Nov 2025), simultaneous activation of two rows and monitoring discharge on ML/MLB lines realizes XNOR for all row pairs per cycle. Discharge timing—distinct for match/mismatch—enables both binary compute and CAM operations. Timing phases are explicitly: precharge, evaluation, sense/hold, and restore.

These mechanisms remove long sequential logic trees external to memory, maximizing in-situ compute throughput.
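The two compute modes above can be captured in a short behavioral model. This is a sketch of the net logical effect only: the intermediate dynamic-node encoding and electrical conditions are assumptions, and the per-cell Python loops stand in for what the hardware does in a single cycle across the whole array.

```python
def rx9t_array_xor(rows, operand_b):
    """Behavioral model of the single-cycle array XOR in the
    security-oriented RX9T (Yin et al., 2023).

    rows: list of rows, each a list of stored bits A (0/1).
    operand_b: per-column operand B driven on the DL lines (0/1).
    """
    for row in rows:
        for col, b in enumerate(operand_b):
            # Phase 1: "conditional reset" (assert WL1, negative BLR pulse).
            # Phase 2: "conditional flip" (assert WL2, BLR = DL) toggles a
            # cell only where the sampled state requires it.
            # Net effect per cell: A XOR B.
            if b:
                row[col] ^= 1
    return rows


def rx9t_xnor_match(row_q, search_key):
    """Behavioral model of the FERMI-ML RX9T CAM mode: ML/MLB discharge
    on a bit mismatch, so a row matches when every bitwise XNOR is 1."""
    return all(a == b for a, b in zip(row_q, search_key))
```

In hardware, `rx9t_array_xor` corresponds to asserting the two word-line phases once for all rows simultaneously, and `rx9t_xnor_match` to monitoring match-line discharge during the evaluation phase.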

4. Security Features: Data Toggling and Erase Modes

A defining feature of (Yin et al., 2023) is secure data toggling, which mitigates data imprinting and remanence:

  • Global XOR with operand B=1 on all columns produces full-array toggle, randomizing cell values and mitigating process-induced ageing or predictable cold-boot recovery.
  • Erase mode—by performing only the first reset phase for all cells—clears data for remanence defense in a single cycle.

These capabilities arise directly from the conditional compute path and shared control lines, providing hardware-level security without extra primitives.
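Both security modes reduce to simple array-wide operations, sketched below as behavioral models. The post-reset value in erase mode (0 here) is an assumption; the source specifies only that data is cleared in a single cycle.

```python
def secure_toggle(rows):
    # Full-array toggle: global XOR with operand B = 1 on every column.
    # Executed in one cycle in hardware; the loop exists only in the model.
    return [[bit ^ 1 for bit in row] for row in rows]


def secure_erase(rows):
    # Erase mode: only the phase-1 conditional reset is applied to every
    # cell, clearing stored data for remanence defense.
    return [[0 for _ in row] for row in rows]
```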

5. Performance Metrics: Power, Delay, Noise Margin, Area

Both papers report explicit benchmarks at the 22 nm and 65 nm nodes:

  • Transistor count is 9T vs. 6T (1.5× area overhead per cell), but still ~2× more compact than prior security-augmented cells (e.g. 14T–22T).
  • Standby leakage: 5.89 µW/cell (Yin et al., 2023), 10 pA/cell (VDD=0.9 V) (Lokhande et al., 16 Nov 2025).
  • Dynamic power/read: 7.38 µW/cell (1.46× 6T); write: 7.49 µW (1.2×); XNOR compute: 35 fJ/bit per operation.
  • Delay: 0.5–1.2 ns for read/write, XNOR at 0.8–1.2 ns enables 350 MHz operation and 1.93 TOPS with FERMI-ML, at 364 TOPS/W energy efficiency and >97.5% QoR.
  • Margins: SNM is within 5% of 6T baseline; write/read margins nearly overlap; CAM margins in FERMI-ML are 180–200 mV at 0.9 V.

Equations for delay, margin, and energy (representative):

$t_{read} \approx \dfrac{C_{BL}\,\Delta V}{I_{on}(M5/M6)}$

$E_{bit} = C_{BL} V_{DD}^2$

$\mathrm{SNM} \approx \dfrac{V_{DD}}{2} - V_T$
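For intuition, the three expressions can be evaluated with representative numbers. Every device value below (C_BL, dV, I_on, V_T) is an illustrative assumption, not a figure reported in either paper; only VDD = 0.9 V appears in the source.

```python
# Back-of-envelope evaluation of the delay, energy, and SNM expressions.
C_BL = 20e-15   # bit-line capacitance, 20 fF (match-line order of magnitude)
dV   = 0.10     # sense swing, 100 mV (assumed)
I_on = 10e-6    # access-transistor on-current through M5/M6, 10 uA (assumed)
V_DD = 0.9      # supply voltage quoted in the text
V_T  = 0.35     # device threshold voltage (assumed)

t_read = C_BL * dV / I_on    # 2e-10 s  = 0.2 ns
E_bit  = C_BL * V_DD ** 2    # 1.62e-14 J = 16.2 fJ per bit-line swing
SNM    = V_DD / 2 - V_T      # 0.10 V
```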

6. Implementation Details and Sizing Rationales

Fabrication targets GlobalFoundries 22 nm FDX in (Yin et al., 2023) and 65 nm CMOS in (Lokhande et al., 16 Nov 2025), with the following layout optimizations:

  • Pull-up (pMOS) devices are sized at double minimum width for SNM; NMOS pull-downs are minimum width; compute-path NMOS devices are sized at multiples of minimum width for robust discharge.
  • Area at 65 nm: RX9T bit-cell is 2.63 µm², sub-divided as 1.5 µm² for storage, 1.13 µm² for compute. Parasitic capacitance for match-line is ~20 fF, storage node ~3 fF.
  • Peripheral: dual word-line decoders, low-voltage drivers for BLR/DL, single-ended sense amps for ML/MLB, column muxes for mode-select.
  • Macros: 256×256-bit arrays fit into standard L1 cache banks, with FERMI-ML demonstrating 4 KB arrays running mixed-precision TinyML workloads.
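As a quick consistency check on the 65 nm area budget above, the storage and compute sub-areas should sum to the full cell area:

```python
# Area budget of the 65 nm RX9T bit-cell, as quoted in the text.
storage_um2 = 1.50   # 5T storage core
compute_um2 = 1.13   # 4T XNOR compute network
cell_um2    = storage_um2 + compute_um2   # 2.63 um^2, matching the stated total
```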

A plausible implication is that layout scaling and leakage can be further improved by process migration; the 9T topology is robust to further digital integration.

7. Application Domains, Trade-offs, and Comparative Analysis

RX9T bit-cells target edge inference, TinyML, and embedded security, supporting both binary neural network compute and cryptographically informed toggling. Compared to standard 6T and 8T PIM cells:

  • RX9T eliminates the need for external logic or charge-sharing ADCs for Boolean PIM.
  • Area overhead is modest, with functional gain in in-situ compute and security.
  • FERMI-ML's RX9T supports simultaneous variable-precision MAC and CAM; XOR-oriented RX9T achieves full-array toggle/erase.
  • Read/write decoupling and extra compute network improve both SNM and energy efficiency.

Measured margins and access times do not support the common assumption that the increased transistor count substantially degrades SNM or speed. Trade-offs between area/power and compute/security are optimized: RX9T incurs a factor-of-1.5 area penalty but delivers novel functional capabilities and high net energy savings.

Both XOR- and XNOR-based RX9T bit-cells represent the state of the art in in-memory compute-enabled SRAM for secure, high-throughput, low-power matrix operations in next-generation edge and AIoT applications (Yin et al., 2023; Lokhande et al., 16 Nov 2025).
