Working SRAM (WRAM): Fast, Flexible Memory

Updated 17 August 2025
  • Working SRAM (WRAM) is a high-speed scratchpad memory architecture using 6T cells and augmented variants for enhanced density and performance.
  • It supports synchronous and asynchronous read/write cycles for rapid local data access in compute-in-memory, edge, and embedded platforms.
  • Recent innovations integrate compute-in-memory and analog extensions, optimizing energy efficiency and latency for AI, embedded, and quantum control systems.

Working SRAM (WRAM) refers to high-speed, scratchpad, or general-purpose static random-access memory architectures designed for rapid local data access within a processing element or hardware system. WRAM forms the computational backbone of many edge-compute, in-memory processing, and embedded platforms by providing low-latency, reconfigurable storage for active workloads. Its implementations span conventional 6T cells, augmented bitcell layouts, compute-in-memory variants, and dedicated hardware blocks in processing-in-memory systems.

1. WRAM Fundamentals and Cell Architecture

Working SRAM is typically realized using cross-coupled CMOS inverter pairs with controlled access transistors, forming the canonical 6T (six-transistor) cell. The cell stores complementary data values at its two internal nodes (Q and Q̅) and connects to complementary bitlines via NMOS access transistors activated by the word line. The fundamental storage relation is given by: Q = NOT(Q̅), Q̅ = NOT(Q).
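The bistable storage relation can be sketched behaviorally. The following is a minimal, hypothetical Python model (not circuit-accurate; class and method names are illustrative): a write through the access transistors forces both internal nodes, after which the cross-coupled inverters hold Q = NOT(Q̅) indefinitely.

```python
# Behavioral sketch of a 6T SRAM cell's storage loop (illustrative model,
# not a circuit simulation): two cross-coupled inverters hold Q = NOT(Q̅).

class Sram6TCell:
    def __init__(self, q=0):
        self.q = q              # internal node Q
        self.q_bar = 1 - q      # complementary node Q̅

    def write(self, din):
        # Word line asserted: access transistors drive both nodes from the
        # complementary bitlines, overpowering the inverter pair.
        self.q = din
        self.q_bar = 1 - din

    def read(self):
        # Non-destructive read: the bitlines sense Q and Q̅.
        return self.q

cell = Sram6TCell()
cell.write(1)
assert cell.read() == 1 and cell.q_bar == 0  # Q = NOT(Q̅) holds
```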

In the UPMEM processing-in-memory system, WRAM is realized as an on-chip scratchpad memory (typically 64 KB per DRAM Processing Unit (DPU)) residing close to the compute block and available for manual allocation of input blocks, kernel code, and intermediate results. WRAM access is direct and deterministic, in contrast to cache-based or bulk DRAM storage, which requires asynchronous management and incurs greater latency (Carrinho et al., 10 Aug 2025).

Contemporary designs extend the basic WRAM architecture through various enhancements:

  • Augmented cells (e.g., 8T, 7T) allow dynamic configuration between single-bit and multi-bit storage for increased data density (Sheshadri et al., 2021).
  • Compute-in-memory variants embed logic or analog compute inside SRAM arrays, leveraging the regular structure of working SRAM for bitwise, matrix-vector, or Boolean computation (Chen et al., 3 Jul 2024, Challagundla et al., 14 Nov 2024).

2. Memory Operation and Computational Models

WRAM supports synchronous or asynchronous read/write cycles coordinated via clock or enable signals. In synchronous embedded applications, the fundamental scheduling condition for write is: If WE = 1 and WCLK↑, Q ← Din, where WE is the write enable and WCLK↑ denotes the rising clock edge (Khatwal et al., 2014).
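The write condition above can be expressed as a small edge-triggered model. This is a hedged sketch (names like `SyncWramBit` and `tick` are illustrative, not from any vendor API): the stored value updates only when WE = 1 coincides with a rising WCLK edge.

```python
# Sketch of the synchronous write condition: on WCLK↑ with WE = 1,
# the cell latches Din; otherwise the stored value is retained.

class SyncWramBit:
    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, wclk, we, din):
        rising = (self._prev_clk == 0 and wclk == 1)  # detect WCLK↑
        if rising and we == 1:
            self.q = din                              # Q ← Din
        self._prev_clk = wclk
        return self.q

bit = SyncWramBit()
bit.tick(wclk=1, we=1, din=1)      # rising edge, write enabled: stores 1
bit.tick(wclk=0, we=1, din=0)      # falling edge: no write
assert bit.tick(wclk=1, we=0, din=0) == 1  # WE = 0: value retained
```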

In UPMEM's WRAM, the operation consists of:

  • Partitioning input, output, and weight matrices into blocks that fit into the local WRAM.
  • Transferring blocks into WRAM from host or bulk DRAM (MRAM) storage.
  • Executing matrix multiplication or activation procedures in place, leveraging WRAM's low-latency cycles.

Work distribution models in PiM architectures using WRAM are formalized as:

  • N₁ × N₂ = N — mapping of the two-dimensional block grid onto the N available DPUs.
  • R(%) = ((dim(A) × N₂ + dim(B) × N₁) / (dim(A) + dim(B))) × 100 — memory replication rate.
  • T_rows = ⌈(C / N₁) / T⌉ — rows per thread per DPU, for C rows distributed across N₁ DPUs running T threads each.
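These formulas can be evaluated numerically. The configuration below is a hypothetical example (the dimensions and thread counts are chosen for illustration, not taken from the cited paper):

```python
import math

# Illustrative evaluation of the PiM work-distribution formulas for a
# hypothetical 8 x 8 DPU grid.

N1, N2 = 8, 8                 # DPU grid: N1 * N2 = N
N = N1 * N2
dim_A, dim_B = 1024, 512      # sizes of operands A and B
C = 1024                      # output rows distributed along N1
T = 16                        # threads (tasklets) per DPU

# Memory replication rate R(%): each block of A is copied to N2 DPUs,
# each block of B to N1 DPUs.
R = (dim_A * N2 + dim_B * N1) / (dim_A + dim_B) * 100

# Rows handled by each thread on a DPU.
T_rows = math.ceil((C / N1) / T)

assert N == 64
assert T_rows == 8
```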

3. Performance, Power, and Stability Metrics

Performance metrics for WRAM are anchored on access time, dynamic power, stability, and delay:

  • Access time (t_ac): t_{ac} = t_{pd} + t_{setup}, where t_{pd} is the propagation delay and t_{setup} is the setup time. Typical fast WRAM access times reported for synchronous SRAM designs range between 2–4 ns (Khatwal et al., 2014).
  • Dynamic power: P = ½ C V² f, with C being load capacitance, V supply voltage, f clock frequency. WRAM is engineered for low switching power due to proximity and direct addressing (Khatwal et al., 2014).
  • Write margin and delay: High-quality WRAM bitcells optimize write margin, e.g., 349.60 mV and fast write delay (54.4 ps) in CFET-based designs (Cheng et al., 9 Mar 2025).
  • Static Noise Margin (SNM): Characterizes the robustness of stored bits; SNM values measured at 3.09 mV in 6T CMOS implementations (London, 13 Aug 2025).
  • For compute-in-memory WRAMs, throughput and energy efficiency are critical, with top-performing analog designs achieving up to 40.2 TOPS/W and weight density of 559 Kb/mm² (Chen et al., 3 Jul 2024).
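As a quick sanity check, the access-time and dynamic-power relations above can be evaluated numerically. The component values below are illustrative assumptions, not measurements from the cited papers:

```python
# Numeric check of t_ac = t_pd + t_setup and P = 1/2 * C * V^2 * f,
# using illustrative (assumed) component values.

t_pd = 1.5e-9        # propagation delay: 1.5 ns (assumed)
t_setup = 0.5e-9     # setup time: 0.5 ns (assumed)
t_ac = t_pd + t_setup
assert abs(t_ac - 2.0e-9) < 1e-15   # 2 ns, at the fast end of 2–4 ns

C_load = 50e-15      # 50 fF load capacitance (assumed)
V_dd = 1.0           # 1.0 V supply (assumed)
f_clk = 500e6        # 500 MHz clock (assumed)
P_dyn = 0.5 * C_load * V_dd**2 * f_clk
# 0.5 * 50 fF * (1 V)^2 * 500 MHz = 12.5 uW of switching power
assert abs(P_dyn - 12.5e-6) < 1e-12
```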

Table: WRAM Metrics (Selected Data)

| Metric | Value/Comments | Reference |
| --- | --- | --- |
| Access time (SRAM) | 2–4 ns | (Khatwal et al., 2014) |
| Write margin (CFET SRAM) | 349.60 mV (1B4T config) | (Cheng et al., 9 Mar 2025) |
| Write delay | 54.4 ps | (Cheng et al., 9 Mar 2025) |
| Kernel exec. time (MLP) | <3 ms (UPMEM WRAM) | (Carrinho et al., 10 Aug 2025) |
| SNM (6T SRAM) | 3.09 mV | (London, 13 Aug 2025) |
| Power efficiency | 40.2 TOPS/W (PICO-RAM analog CIM SRAM) | (Chen et al., 3 Jul 2024) |

4. Application Domains and System-Level Integration

WRAM provides rapid data access in several domains:

  • Processing-in-memory architectures (e.g., UPMEM): WRAM as the scratchpad for data-parallel tasks, notably neural network inference (MLPs), with experimental kernel execution times under 3 ms—competitive with low-power GPU solutions (Carrinho et al., 10 Aug 2025).
  • Edge and embedded systems: Synchronous WRAM-based SRAM forms the memory core for microcontrollers and FPGAs requiring deterministic low-latency scheduling (Khatwal et al., 2014).
  • Quantum control: SRAM-based real-time waveform generators utilize WRAM arrays to store control patterns for cryogenic qubit manipulation, demonstrating robust operation at 4 K (Prathapan et al., 2022).
  • High-density/low-power deep learning accelerators: Augmented memory cells increase local density for quantized weight storage, enabling advanced in-memory compute primitives (Sheshadri et al., 2021).

5. Enhanced WRAM: Augmentation, Compute, and Analog Extensions

Recent WRAM research has focused on dynamic augmentation and computing capability:

  • Augmented memory computing allows a cell to be configured for static or dynamic multi-bit storage (e.g., 8T dual-bit and 7T ternary designs). In augmented mode, the same cell increases its data density (two bits per 8T cell or log₂3 bits per 7T cell) at the cost of retention-managed refresh (Sheshadri et al., 2021).
  • Compute-in-memory architectures embedding analog/logic operations within WRAM allow direct execution of dot product, Boolean logic (NAND/NOR), and matrix-vector multiplication, thereby reducing data movement and enabling new performance/energy trade-offs. Bit-parallel techniques and in-situ charge-domain computing modules reuse WRAM’s local capacitors for MAC, DAC, and ADC functions (Chen et al., 3 Jul 2024, Challagundla et al., 14 Nov 2024).
  • Enhanced cell layouts, such as CFET-based 3-complementary-FET designs, vertically stack pass-gate transistors to reduce area by up to 37% and improve write margins (Cheng et al., 9 Mar 2025).
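The density gain from augmented modes can be quantified in transistors per stored bit. The comparison below follows the bit counts stated above (two bits per 8T cell, log₂3 bits per 7T ternary cell); the dictionary layout and variable names are illustrative:

```python
import math

# Data-density comparison for the augmented cell modes described above:
# an 8T cell storing two binary bits vs. a 7T cell storing one ternary
# digit (log2(3) ≈ 1.585 bits). Fewer transistors per bit = denser storage.

bits_8t_dual = 2.0                 # two bits per 8T cell
bits_7t_ternary = math.log2(3)     # ≈ 1.585 bits per 7T cell

transistors_per_bit = {
    "6T single-bit": 6 / 1.0,
    "8T dual-bit":   8 / bits_8t_dual,
    "7T ternary":    7 / bits_7t_ternary,
}
# 6T: 6.0, 8T dual-bit: 4.0, 7T ternary: ~4.42 transistors per stored bit
assert transistors_per_bit["8T dual-bit"] == 4.0
assert transistors_per_bit["7T ternary"] < 6.0
```

Both augmented modes beat the 6T baseline on density, at the retention-managed refresh cost noted above.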

6. Trade-Offs, Limitations, and Prospects

WRAM designs manifest trade-offs among area, speed, retention, flexibility, and integration:

  • The dynamic cell modes (augmented storage, compute-in-memory) may demand periodic refresh (as in DRAM-like bits), which can impact throughput in real-time workloads unless algorithmic error mitigation is applied.
  • Highly integrated analog compute-in-memory macros achieve outstanding throughput and power efficiency but require attention to PVT variation, reference tuning, and capacitor non-idealities (Chen et al., 3 Jul 2024).
  • System-level bottlenecks (e.g., data transfer between host and WRAM in PiM architectures) can dominate execution time, suggesting optimization opportunities in future controller and interconnect design (Carrinho et al., 10 Aug 2025).
  • Write margin characterization by word-line voltage margin (WLVM) techniques provides non-intrusive, quantitative guidance for sizing and robustness, with strong correlation to conventional writability metrics (Alorda et al., 23 Nov 2024).

7. Future Directions and Contextual Significance

WRAM remains central to advances in memory-centric and compute-intensive architectures:

  • The move toward versatile augmented and compute-in-memory WRAM paves the way for densely integrated neural and edge inference engines.
  • Further integration with photonic or analog computing elements (FET-LET hybrids, charge-domain MACs, resonant energy recycling) is anticipated for order-of-magnitude improvements in speed, energy, and area.
  • System-wide architectural exploration—such as parallel multi-macro compute-in-memory optimization—yields up to 80.9% average energy reduction and tailors WRAM design to application-specific requirements (Challagundla et al., 14 Nov 2024).
  • Non-intrusive stability and margin characterization techniques will guide the scaling of future WRAM variants for reliability in nanometer CMOS nodes (Alorda et al., 23 Nov 2024).
  • WRAM is expected to remain a key enabler of next-generation systems, offering scalable, high-performance, and energy-efficient memory for edge, AI, in-memory computing, quantum control, and beyond.