Computation-in-Memory Platforms

Updated 3 August 2025
  • CiM platforms are hardware architectures that embed arithmetic and logic operations within memory cells, reducing energy consumption and data transfer latency.
  • They utilize diverse device technologies like SRAM, FeFET, and ReRAM to execute parallel matrix computations and boost throughput in deep neural networks and scientific applications.
  • CiM systems deliver significant gains in energy efficiency and scalability while facing challenges such as ADC overhead, device variability, and complex programmability.

Compute-in-Memory (CiM) platforms refer to hardware architectures that perform computation directly where data resides, typically within or adjacent to memory cells. By embedding arithmetic and logic operations within memory arrays, CiM architectures dramatically reduce the high energy and latency costs associated with data movement in conventional von Neumann systems. This paradigm encompasses a diverse range of device technologies (SRAM, FeFETs, ReRAM, emerging 2D materials, etc.) and targets acceleration of data-intensive workloads such as deep neural networks, probabilistic inference, logic tasks, and scientific computing. Recent research demonstrates substantial improvements in energy efficiency, throughput, and scalability, while also elucidating design, programmability, and device-physics-related challenges.

1. Device and Circuit-Level Innovations

CiM platforms leverage device-level phenomena and advanced cell structures to enable in-situ arithmetic and logic computation, digital storage, and non-volatile operation:

  • SRAM-based CiM: Both digital (DCIM) and analog (ACIM) approaches are realized using 6T/8T SRAM arrays, often augmented with extra transistors or PN junctions. DCIM uses digital adders for precise logic, whereas ACIM exploits current-, time-, or charge-domain summation for area and power efficiency. Charge-based ACIM, for example, relies on Q=CV charge redistribution and is robust to PVT variations, supporting up to 12-bit precision (Yoshioka et al., 9 Nov 2024); a numerical sketch of this charge-redistribution scheme follows this list.
  • Nonvolatile Memory Devices: Ferroelectric FETs (FeFETs), FeDs, and ReRAM technologies natively support analog weight storage, multibit precision, and CMOS compatibility. The 2T-1FeFET and 2FeFET-1T designs implement feedback mechanisms for subthreshold, ultra-low-power MAC with strong temperature resilience over 0–85°C, maintaining high inference accuracy even under drift (Zhou et al., 2023, Zhou et al., 2 Jan 2025).
  • 2D and Quantum Phenomena: The QAHE-based CryoCiM platform utilizes quantized, topologically protected Hall resistance states for single-cycle in-memory logic and nonvolatile storage at 4K, enabling robustness to process variation and supporting both logic and memory access in a unified cryogenic array (Alam et al., 2021). STeP-CiM employs PeFETs that combine piezoelectric strain and ferroelectric polarization for fast, ternary MAC in DNNs, exploiting voltage polarity reversal for dual-mode operation (Thakuria et al., 2022).
  • Multifunctional Cells: FeD-based transistor-less arrays provide scalable, field-programmable TCAM for search, as well as analog and neural network computation, facilitating data-centric processing (Liu et al., 2022). Multi-level FeFET-based cells, as in UniCAIM, allow both signed analog computation and content-addressable search for KV cache management (Xu et al., 10 Apr 2025).
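
To make the charge-domain ACIM principle concrete, below is a minimal numerical sketch in Python. The capacitance, read voltage, ADC resolution, and array size are illustrative assumptions, not parameters of any published macro: each selected cell dumps Q = CV onto a shared bitline, charge redistribution turns the count of active cells into a voltage, and an ADC digitizes the result.

```python
import numpy as np

C_CELL = 1e-15   # per-cell capacitance (F), illustrative assumption
V_DD = 0.8       # read voltage (V), illustrative assumption
ADC_BITS = 8     # ADC resolution, illustrative assumption

def charge_domain_mac(weight_bits, act_bits):
    """Analog accumulation of a binary dot product on one bitline."""
    n_rows = len(weight_bits)
    active = weight_bits & act_bits              # cells contributing Q = C * V
    q_total = active.sum() * C_CELL * V_DD       # total redistributed charge
    v_bitline = q_total / (n_rows * C_CELL)      # shared across all row caps
    # ADC digitizes the bitline voltage back to an integer count
    code = int(round(v_bitline / V_DD * (2**ADC_BITS - 1)))
    return round(code * n_rows / (2**ADC_BITS - 1))

rng = np.random.default_rng(0)
w = rng.integers(0, 2, 256)
x = rng.integers(0, 2, 256)
print(charge_domain_mac(w, x), int(w @ x))   # analog estimate vs exact count
```

Because the computation depends only on capacitor ratios rather than absolute device currents, charge-domain summation of this kind tends to tolerate variation well, consistent with the PVT robustness and 12-bit precision noted above.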

2. System-Level Architectures and Acceleration

At the architecture level, CiM platforms support energy-efficient, high-throughput acceleration through several mechanisms:

  • Parallel MVM Execution: Arrays execute matrix-vector multiplications (MVMs) by activating multiple word-lines/rows in parallel, with the resulting dot products accumulated via analog or digital summation circuits. Platforms such as SiTe CiM (Thakuria et al., 24 Aug 2024), STeP-CiM, and TReCiM exploit this massive parallelism for low-precision (ternary/binary) and multibit DNN operations, yielding speedups of 6–9x and substantial energy savings; a sketch of this crossbar-style execution follows this list.
  • Unified CAM/CIM and Content Addressing: Architectures like UniCAIM integrate CAM-mode for O(1) dynamic token selection and charge- or current-domain CIM for flexible static/dynamic KV cache pruning and in-place attention computation, shown to reduce the area-energy-delay product by up to 831x over prior designs (Xu et al., 10 Apr 2025).
  • Hybrid Modes and Dynamic Resource Allocation: Dual-mode switchable arrays, enabled by advanced compilers such as CMSwitch, may operate as either compute engines or on-chip memory buffers on demand, dynamically balancing compute and memory resource allocation per workload segment for LLMs and DNNs (Zhao et al., 24 Feb 2025). This yields an average inference speedup of 1.31x over static allocation approaches.
  • Edge and Probabilistic Inference: MC-CIM incorporates Bayesian Monte Carlo Dropout directly into memory arrays, supporting probabilistic inference and prediction confidence at the hardware level for edge robotics and real-time applications, with 43% energy savings over conventional methods (Shukla et al., 2021).
  • Large-Scale Spiking and Event-Based Systems: Digital macros such as FlexSpIM support arbitrary operand resolution and layer-wise weight/output stationarity, providing up to 2x bit-normalized energy efficiency and 90% energy reduction in large SNN workloads (Chauvaux et al., 30 Oct 2024).
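
As a concrete sketch of the parallel MVM execution described in the first bullet, the following Python model programs signed weights as differential conductance pairs and computes all bitline currents in one step. The conductance range and 2-bit cell granularity are assumptions for illustration, not a specific platform's parameters.

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4   # device conductance range (S), assumed
LEVELS = 4                  # 2-bit multilevel cells, assumed

def program_crossbar(weights):
    """Map signed weights in [-1, 1] onto two differential conductance arrays."""
    w = np.clip(weights, -1, 1)
    pos = np.round(np.maximum(w, 0) * (LEVELS - 1)) / (LEVELS - 1)
    neg = np.round(np.maximum(-w, 0) * (LEVELS - 1)) / (LEVELS - 1)
    to_g = lambda frac: G_MIN + frac * (G_MAX - G_MIN)
    return to_g(pos), to_g(neg)

def crossbar_mvm(g_pos, g_neg, v_in):
    """All word-lines driven at once; each bitline current is a dot product."""
    i_pos = v_in @ g_pos        # Kirchhoff current summation per column
    i_neg = v_in @ g_neg
    return i_pos - i_neg        # differential readout cancels the G_MIN offset

rng = np.random.default_rng(1)
W = rng.uniform(-1, 1, (64, 16))     # one layer's weights
x = rng.uniform(0, 1, 64)            # input activations as word-line voltages
g_pos, g_neg = program_crossbar(W)
y = crossbar_mvm(g_pos, g_neg, x) / (G_MAX - G_MIN)   # rescale currents
W_q = np.round(W * (LEVELS - 1)) / (LEVELS - 1)       # quantized reference
print(np.allclose(y, x @ W_q))       # True: analog MVM matches quantized math
```

Ternary platforms such as SiTe CiM and STeP-CiM follow the same pattern with coarser cells, trading per-cell precision for row-level parallelism.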

3. Compilation and Programmability

Efficiently programming and mapping DNN workloads onto heterogeneous CiM hardware remains a central research challenge; recent progress includes:

  • Universal Compiler Frameworks: CINM and CIM-MLC enable automated, hierarchical lowering of high-level network descriptions onto diverse CiM hardware, abstracting chip/core/crossbar/row-level granularity and supporting device-agnostic as well as device-aware optimizations such as operator duplication, tiling, and pipelining (Khan et al., 2022, Qu et al., 23 Jan 2024); a toy tiling pass in this spirit is sketched after this list.
  • Dual-mode-Aware Compilation: CMSwitch introduces a hardware abstraction layer to model dual-mode resource switching, and incorporates mixed-integer and dynamic programming passes to schedule layers and allocate memory/compute resources for DNNs efficiently (Zhao et al., 24 Feb 2025).
  • Meta-Operator Semantics: Advanced compiler flows use meta-operators (e.g., CM.switch) to express flexible, hardware-independent schedules, translating into platform-specific ISAs (Zhao et al., 24 Feb 2025, Qu et al., 23 Jan 2024).
  • Cross-Layer Scheduling: Algorithms such as CLSA-CIM decompose output feature maps into sets, propagate dependencies, and overlap computations across layers, maximizing processing element utilization (up to 17.9x improvement) and reducing latency in tiled CIM architectures (Pelke et al., 15 Jan 2024).
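
As a minimal illustration of the lowering step these compilers automate, the toy pass below (an assumed sketch, not actual CINM or CIM-MLC code) partitions a layer's weight matrix into crossbar-sized tiles, emits one array-level MVM per tile, and reduces the partial sums digitally.

```python
import numpy as np

TILE_ROWS, TILE_COLS = 128, 128   # crossbar dimensions, assumed

def lower_mvm(weight_shape):
    """Emit (row_slice, col_slice) tile descriptors for one MVM layer."""
    rows, cols = weight_shape
    tiles = []
    for r in range(0, rows, TILE_ROWS):
        for c in range(0, cols, TILE_COLS):
            tiles.append((slice(r, min(r + TILE_ROWS, rows)),
                          slice(c, min(c + TILE_COLS, cols))))
    return tiles

def execute(weight, x, tiles):
    """Run each tile as an independent crossbar MVM, then reduce partial sums."""
    y = np.zeros(weight.shape[1])
    for rows, cols in tiles:
        y[cols] += x[rows] @ weight[rows, cols]   # per-tile MVM + digital add
    return y

W = np.arange(300 * 200, dtype=float).reshape(300, 200) / 1e4
x = np.ones(300)
tiles = lower_mvm(W.shape)
print(len(tiles), np.allclose(execute(W, x, tiles), x @ W))   # 6 True
```

Operator duplication and pipelining then amount to replicating tiles across arrays and overlapping their execution, which is where cross-layer schedulers such as CLSA-CIM recover additional utilization.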

4. Performance Metrics and Benchmarking

Recent CiM platforms demonstrate large gains across multiple metrics, often evaluated on realistic DNN and system-level benchmarks:

| Platform / Device | Peak Energy Efficiency | Task / Workload | Speedup | Notable Accuracy |
|---|---|---|---|---|
| Eva-CiM (SRAM/FeFET) | 1.3–6.0x / 2.0–7.9x improvement | System-level heterogeneous apps | 1.0–1.5x (performance) | Workload-dependent |
| 2T-1FeFET subthreshold | 2866 TOPS/W | VGG / CIFAR-10 | — | 89.45% (CIFAR-10) |
| TReCiM multibit | 48.03 TOPS/W | VGG-8 / CIFAR-10 | — | 91.31% (CIFAR-10) |
| SiTe CiM (8T-SRAM, etc.) | Up to 88% lower latency, 78% lower energy | Ternary DNNs | Up to 7x throughput | Competitive (ternary DNNs) |
| MC-CIM (SRAM) | 27.8 pJ / 30 MC-Dropout iterations | MNIST, VO tasks | 43% energy saved | Uncertainty correlation |
| Voxel-CiM | 10.8 TOPS/W | 3D point cloud CNNs | 4.5–7x over SOTA | — |
| FlexSpIM | 2x bit-normalized efficiency, up to 90% system energy reduction | SNN (IBM DVS Gesture) | — | 95.8% |

Speedup and efficiency depend strongly on workload structure, data and weight locality, memory hierarchy organization, and hardware-software co-design.

5. Data Movement, Energy Efficiency, and Scalability

A central advantage of CiM platforms is the minimization of costly data transfers:

  • In-memory arithmetic eliminates shuttling of data between separated memory and logic cores, directly addressing the memory wall. For instance, the SiTe CiM architecture computes more than 90% of DNN MAC ops in-situ, yielding energy and latency reductions proportional to the number of parallel rows and columns (Thakuria et al., 24 Aug 2024).
  • Sparsity Exploitation: PACiM further reduces both cycle and data transfer counts by encoding bit-level sparsity, completely eliminating LSB activation reads/writes and simplifying computation to multiply-divide operations, achieving 81% cycle reduction and 50% fewer memory accesses while maintaining less than 1% accuracy drop (Zhang et al., 29 Aug 2024); a simplified bit-plane-skipping sketch follows this list.
  • Weight Mapping and Tiling: Novel weight mapping strategies (e.g., sub-matrix partitioning in Voxel-CIM) and array-level workload balancing ensure full utilization of available memory bandwidth and computation parallelism, critical for maximizing throughput and realizing theoretical speedups.
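
The bit-level-sparsity principle behind PACiM can be illustrated with a bit-serial MAC that skips all-zero activation bit planes. This is a simplified sketch of the general idea, not PACiM's actual approximation scheme; the bit width and data below are arbitrary.

```python
import numpy as np

def bit_serial_dot(weights, acts, n_bits=8):
    """Bit-serial dot product that spends no cycles on all-zero bit planes."""
    acts = acts.astype(np.uint8)
    total, cycles = 0, 0
    for b in range(n_bits):
        plane = (acts >> b) & 1          # b-th bit plane of all activations
        if not plane.any():              # bit-level sparsity: skip empty plane
            continue
        cycles += 1
        total += (weights @ plane) << b  # binary MVM, then shift-accumulate
    return total, cycles

rng = np.random.default_rng(2)
w = rng.integers(-8, 8, 256)
a = rng.integers(0, 16, 256)             # only 4 LSBs used: 4 planes are empty
res, cyc = bit_serial_dot(w, a)
print(res == int(w @ a), cyc)            # exact result in 4 of 8 cycles
```

In this toy run the result stays exact while half the bit-plane cycles disappear; PACiM goes further by eliminating the low-order activation reads outright rather than merely skipping empty planes.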

6. Challenges, Open Problems, and Future Directions

Despite their promise, CiM platforms confront a set of open challenges:

  • Device Limits and Variability: Subthreshold and multibit operation in FeFETs/analog arrays must contend with temperature drift, process variation, and nonideal retention. Feedback/clamp designs and careful array-level error management are necessary to maintain accuracy at scale (Zhou et al., 2023, Zhou et al., 2 Jan 2025).
  • ADC/DAC Overheads: Analog CiM macros must digitize analog partial sums, and ADC energy and area can dominate at high resolutions unless the conversion is efficiently pipelined or hybridized with digital computation (Yoshioka et al., 9 Nov 2024); the toy model after this list illustrates the resolution trade-off.
  • Programmability: The diversity of CiM hardware complicates compiler support and performance portability. Ongoing work focuses on universal abstractions, hardware-aware partitioning and scheduling, and dynamic run-time adaptation (Khan et al., 2022, Qu et al., 23 Jan 2024, Zhao et al., 24 Feb 2025).
  • Workload Sensitivity and Scalability: The actual proportion of offloadable operations, sensitivity to array size, and memory hierarchy choices must be empirically characterized. Design-space exploration frameworks like Eva-CiM are critical for system-level optimization (Gao et al., 2019).
  • Algorithm-Hardware Co-Design: Models like CIM-NET, designed for optimal mapping onto CiM architectures, highlight the need for hardware-algorithm co-design to fully exploit in-memory compute, especially for tasks with high DNN locality or non-standard data flows (Gao et al., 23 May 2025).
  • Integration with Emerging Applications: Cryogenic CiM (CryoCiM) and ferroelectric/CAM-unified arrays (UniCAIM) suggest application domains such as quantum/classical hybrid systems or efficient LLM inference that were previously infeasible with conventional technologies (Alam et al., 2021, Xu et al., 10 Apr 2025).
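
The ADC bottleneck above can be made concrete with a toy model. The ideal uniform ADC and 256-row array below are assumptions; the point is that the analog partial sum on a bitline grows with the number of activated rows, so the resolution needed for a fixed error grows with array parallelism while ADC cost rises steeply with each bit.

```python
import numpy as np

def adc(value, full_scale, bits):
    """Ideal uniform ADC: quantize an analog sum to 2**bits levels."""
    lsb = full_scale / (2**bits - 1)
    return np.clip(np.round(value / lsb), 0, 2**bits - 1) * lsb

rng = np.random.default_rng(3)
n_rows = 256                               # rows summed on one bitline, assumed
w = rng.integers(0, 2, n_rows)
x = rng.uniform(0, 1, n_rows)
exact = w @ x                              # analog partial sum, up to n_rows

for bits in (4, 6, 8, 10):
    err = abs(adc(exact, n_rows, bits) - exact)
    print(f"{bits}-bit ADC: |error| = {err:.3f}")
```

Each extra bit roughly halves the quantization error, while converter energy and area grow steeply with resolution, motivating the pipelined and hybrid analog-digital designs noted above.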

7. Applications and Impact

CiM platforms find relevance across a spectrum of application domains:

  • Deep learning inference and training for edge, mobile, and data center workloads, particularly where energy and throughput are bottlenecks.
  • Probabilistic, Bayesian, and spiking models for uncertainty quantification (e.g., MC-CIM) and high-speed, low-latency event-based recognition (FlexSpIM).
  • Scientific, logic, and search-based computing, including symbolic and neuro-symbolic AI (e.g., FeFET-CAM for neuro-symbolic search and MAC; Yin et al., 20 Oct 2024), genomic pipelines, and security primitives.
  • Long-context NLP inference and transformer LLMs, where memory footprint and effective KV cache management are major system-level constraints (UniCAIM).

These advancements establish CiM as a leading candidate to overcome the fundamental bottlenecks of conventional architectures, promising scalable, energy-proportional, and workload-adaptive computation platforms for next-generation AI and data-centric systems.
