MIS SRAM: In-Situ Compute Memory
- Memory-In-Situ (MIS) SRAM is a class of SRAM that integrates computation directly into the memory array to execute operations like MAC with minimal external data movement.
- It comprises both digital (DCIM) and analog (ACIM) implementations, each exploiting unique circuit techniques to optimize energy efficiency and throughput for applications such as DNN inference and neuromorphic filtering.
- Innovative bitcell designs, peripheral circuitry, and hybrid techniques address design trade-offs in area, precision, and integration, enhancing overall performance in advanced computing tasks.
Memory-In-Situ (MIS) SRAM encompasses a class of static random-access memory (SRAM) macros that integrate both storage and computation directly within the memory array, specifically performing multiply-accumulate (MAC), Boolean logic, or other vector operations in-place, with minimal to no external data movement. By tightly coupling computation to weights and activations stored in SRAM cells, MIS SRAM architectures address the Von Neumann bottleneck, delivering energy efficiency and high throughput for workloads such as DNN inference, logic acceleration, neuromorphic filtering, and even cryptographic algorithms. MIS concepts span digital, analog, and hybrid domains, leveraging advances in bitcell design, peripheral circuits, and device technology to serve application-specific energy and accuracy needs.
1. Principles and Architectural Taxonomy
MIS SRAM comprises both Digital Compute-In-Memory (DCIM) and Analog Compute-In-Memory (ACIM) implementations. In DCIM, each SRAM bit cell (typically 6T, 8T, or 9T) performs digital bitwise operations—often XNOR, AND, or NAND—between stored data and dynamic operands (inputs or masks) delivered via bitlines or wordlines. The accumulation (e.g., summation in dot-product) is executed by on-chip digital adder trees or compressor networks.
ACIM variants exploit the analog properties of bitcell currents, charge, or timing. In current-domain ACIM, the product weight×input is mapped into a cell-controlled current summed on a shared bitline; in charge-domain, charge packets are injected or shared among periodic capacitors; in time-domain, computation is encoded in the propagation delays of current-starved inverters or ring-oscillators. The result is then digitized by local analog-to-digital converters (ADCs)—often SAR or time-domain quantizers—located at the array edge or embedded within the cell periphery (Yoshioka et al., 9 Nov 2024).
Hybrid approaches allocate the most significant bits (MSBs) of a computation to digital adders and the least significant bits (LSBs) to analog accumulators, trading accuracy for area/power savings; MSB–digital/LSB–analog (D/A) splitting is particularly effective for medium-precision tasks (Konno et al., 25 Aug 2025).
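As a numerical illustration of MSB–digital/LSB–analog splitting — a hedged sketch in which the 4b/4b split and the Gaussian noise level are assumptions, not parameters from the cited macro:

```python
# MSB-digital/LSB-analog split: the top bits of each weight go through an
# exact digital MAC, the bottom bits through a noisy analog path; the two
# partial sums are recombined with the proper binary weight.
import random

def hybrid_mac(weights, inputs, lsb_bits=4, analog_sigma=0.3):
    lsb_mask = (1 << lsb_bits) - 1
    msb = [w >> lsb_bits for w in weights]        # digital path (exact)
    lsb = [w & lsb_mask for w in weights]         # analog path (approximate)
    digital_sum = sum(m * x for m, x in zip(msb, inputs))
    analog_sum = sum(l * x for l, x in zip(lsb, inputs))
    analog_sum += random.gauss(0, analog_sigma)   # model analog nonideality
    return (digital_sum << lsb_bits) + analog_sum

random.seed(0)
w, x = [200, 55, 130], [3, 1, 2]
exact = sum(a * b for a, b in zip(w, x))
print(exact, round(hybrid_mac(w, x)))
```

Because the analog error enters only the low-order partial sum, its contribution to the recombined result is bounded at LSB weight — which is why the split preserves accuracy at medium precision while spending analog energy only where errors matter least.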
MIS architectures range from unmodified 6T arrays supplemented by peripheral charge-sharing logic (Ali et al., 2020), to custom 8T/9T bitcells with additional compute branches (Lokhande et al., 16 Nov 2025), to non-volatile extensions using magneto-electric FETs for normally-off operation (Najafi et al., 2023).
2. In-Situ Dot Product and Multiply-Accumulate Mechanisms
Digital MAC in SRAM (DCIM)
In typical DCIM, the dot-product operation

$$y = \sum_{i=1}^{N} w_i \cdot x_i$$

is realized by word-parallel activation of rows storing the weights $w_i$ and broadcasting the inputs $x_i$ via bitlines/wordlines. Bitwise multiply (e.g., XNOR for BNNs) is achieved via local logic (XNOR/AND gates), and the resulting vector is accumulated by digital adder trees or compressor networks at the periphery. RX9T cells in FERMI-ML implement a 4T XNOR compute path on a shared match-line, enabling 1–64b MAC/CAM functionality with C22T compressor accumulation for logarithmic reduction in MAC latency (Lokhande et al., 16 Nov 2025). In-SRAM adder trees support scaling to wide SIMD (64 rows × 64 columns) and mixed-precision (FP-4, Posit-4) inference.
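The XNOR-multiply/adder-tree flow can be captured in a short behavioral model. This is an illustrative sketch of the generic BNN mapping (weights and activations in {−1, +1} encoded as bits), not the FERMI-ML circuit itself:

```python
# Sketch of a DCIM binary MAC: stored weights and broadcast inputs are in
# {-1,+1}, encoded as bits {0,1}; bitwise XNOR realizes the 1b multiply,
# and a popcount stands in for the peripheral adder tree.

def bnn_dot(weight_bits, input_bits):
    """XNOR each stored weight bit with its broadcast input bit,
    then accumulate with a popcount (adder-tree model)."""
    assert len(weight_bits) == len(input_bits)
    xnor = [1 - (w ^ x) for w, x in zip(weight_bits, input_bits)]
    ones = sum(xnor)              # adder-tree accumulation of match bits
    n = len(weight_bits)
    return 2 * ones - n           # map the {0,1} count back to a {-1,+1} dot product

# w = [+1,-1,+1,+1] -> bits [1,0,1,1]; x = [+1,+1,-1,+1] -> bits [1,1,0,1]
print(bnn_dot([1, 0, 1, 1], [1, 1, 0, 1]))  # -> 0
```

The `2*ones - n` rescaling is the standard trick that lets a pure popcount recover the signed dot product.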
Analog In-Situ MAC (ACIM)
In ACIM, dot products are computed by mapping weights and inputs into device-controlled analog parameters. In the IMAC 6T array, WL voltage encodes an input operand; the stored 4b weight modulates the discharge intervals of bitline segments, each proportional to its bit significance ($2^k$ weighting). After staggered discharges, bitline analog voltages are merged by charge-sharing, yielding a scalar that is sampled and digitized (Ali et al., 2020).
Multi-bit analog MAC using charge-sharing and weighted capacitors is exemplified by PICO-RAM and hybrid analog-digital capacitive macros. PICO-RAM clusters standard 6T cells with a "MAC cell" containing a MOM capacitor. Bit-parallel computation is achieved by selecting which wordlines/caps participate (effecting multiplication), summing charge along local lines (accumulation), and digitizing the final analog voltage via a time-domain ADC, all using the original thin-cell 6T layout (Chen et al., 3 Jul 2024).
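A behavioral sketch of the charge-sharing MAC follows; the capacitor value, supply voltage, and ADC resolution are illustrative assumptions, not values from the PICO-RAM design:

```python
# Charge-sharing MAC model: each selected MAC cell contributes charge
# Q = b * C_UNIT * VDD (b is the stored bit); charge redistribution over
# the selected caps averages the voltage, and a uniform quantizer stands
# in for the time-domain ADC.

VDD = 0.9          # supply (V), assumed
C_UNIT = 1e-15     # unit MOM capacitor (F), assumed

def charge_share_mac(bits, selected):
    """Multiply = wordline selection gates each cell's charge;
    accumulate = charge redistribution across the selected caps."""
    q = sum(b * C_UNIT * VDD for b, s in zip(bits, selected) if s)
    c_total = sum(C_UNIT for s in selected if s)
    return q / c_total if c_total else 0.0   # shared analog voltage

def adc(v, nbits=4, vref=VDD):
    """Uniform quantizer standing in for the time-domain ADC."""
    code = round(v / vref * (2**nbits - 1))
    return max(0, min(2**nbits - 1, code))

v = charge_share_mac([1, 0, 1, 1], [1, 1, 1, 0])  # 2 of the 3 selected cells store 1
print(adc(v))  # digitizes 2/3 of full scale
```

The model makes the division of labor explicit: wordline selection performs the multiply, passive charge redistribution performs the accumulate, and only the final scalar pays the ADC cost.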
3. Bitcell and Array Innovations
Bitcell-Level Compute Augmentation
- 8T/9T Cells: Decouple read/write paths, add dedicated compute branches (e.g., XNOR/AND units), reduce read-disturb, allow analog current summation (Jaiswal et al., 2018, Lokhande et al., 16 Nov 2025).
- Special-Purpose Logic: In LiM, additional nMOS/pMOS transistors (dynamic or static logic) are included in each cell; area overhead is 1.33×–1.83× versus baseline 6T, with the best in-memory EDP gain from static logic (Ottati et al., 2023).
- MAC Cells: Repurpose top metal for MOM capacitors while retaining 6T transistor footprint, enabling in-situ charge operations with minimal area growth (Chen et al., 3 Jul 2024).
- MEFET Integration: Add non-volatile MEFET latches to enable normally-off SRAM operation and sub-nanosecond backup/restore cycles, combining instantaneous in-situ logic with power gating (Najafi et al., 2023).
Array Topologies and Peripheral Design
- Resonant Loops: Embed inductor–bitline energy recovery to reduce dynamic write energy (recycling efficiency up to 0.95) (Challagundla et al., 14 Nov 2024).
- Multi-Bank/Parallel Topologies: Partition arrays for multi-macro (e.g., 3-macro, 6-macro) operation, reducing latency by leveraging parallel Boolean logic execution per bank (Challagundla et al., 14 Nov 2024).
- Embedded/Shared ADCs: Utilize bitline or sum-line capacitance as the DAC for in-situ SAR-ADC implementations, reducing ADC area by >90% compared to explicit MIM-cap arrays (Nasrin et al., 2021, Wang et al., 2023).
- Capacitive Trees: CAAT (capacitor-accumulation adder tree) architectures connect columns and banks with switched capacitive trees, supporting high-throughput vector MAC with only a single ADC per macro (Yin et al., 2022).
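The embedded SAR-ADC idea — reusing line capacitance as the DAC — follows the standard successive-approximation loop, sketched here with an idealized cap-DAC (the reference voltage and resolution are assumptions):

```python
# Toy SAR loop illustrating how a capacitive DAC (here modeling the reused
# bitline/sum-line capacitance) converges bit-by-bit on the sampled MAC voltage.

def sar_adc(v_in, vref=1.0, nbits=6):
    code = 0
    for bit in range(nbits - 1, -1, -1):
        trial = code | (1 << bit)            # tentatively set the next bit
        v_dac = trial / (1 << nbits) * vref  # cap-DAC output for this trial code
        if v_in >= v_dac:                    # comparator decision
            code = trial                     # keep the bit, else discard it
    return code

print(sar_adc(0.51))  # ~0.51 of full scale -> 32 out of 64
```

Each iteration needs only one comparator decision and one DAC settling step, which is why reusing existing line capacitance as the DAC removes most of the explicit MIM-cap area.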
4. Performance Metrics and Application Cases
Key metrics:
| Macro | Precision | Throughput | Energy Efficiency | Area Density | Error/Accuracy |
|---|---|---|---|---|---|
| FERMI-ML (9T) | 1–64b | 1.93 TOPS @65 nm | 364 TOPS/W | 4.58 TOPS/mm² | QoR ≥ 97.5% on ResNet-18 |
| PICO-RAM | 4b | 3.17 TOPS @1b/1b | 40 TOPS/W | 0.7 TOPS/mm² | σ_err < 0.6 LSB |
| Hybrid 28 nm CIM | 8b cMAC | ~5 Gcomplex-MAC/s | 35 TOPS/W (real MAC) | 1.80 Mb/mm² | 0.435% RMS error (untrimmed) |
| Charge-domain CD-CiM | 8b/8b | 51.2 GOPS | 10.3 TOPS/W | — | 88.6% CIFAR-10 (post-calib.) |
| IMAC 6T (Std. analog) | 4b×4b | ~1.5 ns/MAC | 6.24× lower energy vs. von Neumann | — | 0.1–1.3% accuracy loss |
Applications include TinyML (variable-precision MACs with compressor-tree accumulation (Lokhande et al., 16 Nov 2025)), complex-valued DSP (hybrid D/A CIM with 2D-weighted capacitor arrays (Konno et al., 25 Aug 2025)), AES encryption (multi-row subarray fusion for in-memory S-box/MixColumns (Zhang et al., 2022)), and event-based neuromorphic filtering (6T with bitline fusion for consensus denoising, >2000× energy gain (Bose et al., 2020)).
5. Nonideality, PVT Tolerance, and Hybridization
Analog-domain MIS designs are susceptible to process, voltage, and temperature (PVT) variations:
- Charge-domain designs (PICO-RAM, 28nm hybrid) use metal fringe capacitors with intrinsic matching and charge-redistribution for bit-parallel MAC; PVT variation is mitigated by autocalibrated ADCs and layout symmetry (Chen et al., 3 Jul 2024, Konno et al., 25 Aug 2025).
- Time-domain MACs employ matched delay elements with TDC-based quantization, robust to device mismatch (Wang et al., 2023).
- Monte-Carlo analyses in rCiM and IMAC confirm sufficient bitline discharge/analog sense margins across ±10% VDD variation and temperatures up to 125 °C (Challagundla et al., 14 Nov 2024, Ali et al., 2020).
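As a rough behavioral model of the time-domain path noted above — each product adds a delay increment to a pulse edge and a TDC counter quantizes the total — with the resolution values being assumptions:

```python
# Time-domain MAC model: the accumulated dot product is encoded as a total
# propagation delay; a TDC clocked at DT quantizes it into a digital code.

DT = 0.1e-9   # TDC resolution (s), assumed

def time_domain_mac(weights, inputs, delay_per_unit=0.1e-9):
    total_delay = sum(w * x for w, x in zip(weights, inputs)) * delay_per_unit
    return round(total_delay / DT)   # TDC output code

print(time_domain_mac([3, 1, 2], [1, 2, 1]))  # -> 7
```

Matching `delay_per_unit` across all delay elements is the model's analogue of the matched current-starved inverters: mismatch between elements, not absolute delay, is what corrupts the code.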
Hybrid DCIM/ACIM macros split high-importance computation to digital paths and low-order bits or less salient activations to energy-efficient analog paths. This approach yields energy savings (up to 60%) while limiting error to sub-1% on standard CNN and Transformer models (Konno et al., 25 Aug 2025, Yoshioka et al., 9 Nov 2024).
6. Limitations, Trade-offs, and Integration Challenges
Design trade-offs in MIS SRAM include:
- Area vs. Flexibility: Addition of compute logic (static/dynamic CMOS) increases cell area by 30–80% (Ottati et al., 2023), but newer MAC cell designs reuse 6T diffusion and metal for minimal overheads (Chen et al., 3 Jul 2024).
- Read/Write Overhead: LiM static-logic variants can degrade read/write energy-delay by 120–443% versus 6T, but this penalty is offset by a >50% EDP gain when logic executes in-memory (Ottati et al., 2023).
- ADC/Peripheral Dominance: In high-precision analog macros, ADC and accumulator circuits can consume 30–40% of array area, necessitating sharing or batching (Ali et al., 2020).
- Precision/Energy Scaling: Compressor trees and bit-parallel analog MACs deliver logarithmic scaling in energy/latency per additional precision bit, contrasting linearly scaling ripple-carry adders (Lokhande et al., 16 Nov 2025).
- Integration with Cache Hierarchy: In-SRAM computational macros must coexist with traditional cache traffic. Techniques such as arbitration, command tagging, and array-level time-multiplexing are used to prevent performance loss (Zhang et al., 2022).
- Normally-Off Operation: MEFET-based MIS SRAM allows arrays to be power-gated in idle mode, with near-zero retention energy and sub-nanosecond wake-up (Najafi et al., 2023).
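The precision/energy-scaling contrast noted above (compressor trees vs. ripple-style accumulation) reduces to a reduction-depth argument, sketched here for operand counts typical of the cited arrays:

```python
# Back-of-envelope depth comparison: a balanced adder/compressor tree reduces
# N partial sums in O(log N) stages, while serial ripple accumulation takes
# O(N) steps.
import math

def tree_stages(n):
    """Stages needed by a binary reduction tree over n operands."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def ripple_steps(n):
    """Sequential additions needed by serial accumulation."""
    return n - 1 if n > 0 else 0

for n in (8, 64, 256):
    print(n, tree_stages(n), ripple_steps(n))
```

For a 64-row column this is 6 tree stages against 63 serial additions, which is the latency gap the compressor-tree designs exploit.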
7. Future Directions and Design Guidelines
Progress in MIS SRAM emphasizes:
- Resonant Energy Recycling: Embedding inductive loops in write paths for substantial dynamic energy savings (up to 60%) (Challagundla et al., 14 Nov 2024).
- Automated Co-Design: Synthesis flows that optimize Boolean mapping, array partitioning, and operation scheduling for application-specific FoMs, especially in logic-oriented rCiM deployments (Challagundla et al., 14 Nov 2024).
- Scalability to Advanced Nodes: DCIM and charge-domain ACIM show favorable scaling in energy and throughput down to 3 nm (DCIM: 32 TOPS/W @12b, ACIM: up to 4094 TOPS/W @7nm for medium precision) (Yoshioka et al., 9 Nov 2024).
- Adaptation to Workload Saliency: Saliency-aware dynamic hybridization changes data routing to DCIM/ACIM paths on-the-fly, optimizing power for context-sensitive tasks (Yoshioka et al., 9 Nov 2024).
- Expanded Arithmetic and Data Types: Support for complex arithmetic, FP/Posit encoding, and approximate operators is increasing, enabling deployment in communications DSP, edge-AI, and security (Konno et al., 25 Aug 2025, Lokhande et al., 16 Nov 2025).
MIS SRAM macros, through careful selection of bitcell type, compute-periphery architecture, and hybridization strategies, are redefining the limits of on-chip inference and logical computation efficiency across AI, neuromorphic, security, and edge domains.