Analog Compute-in-Memory (ACIM) Overview
- ACIM is a computing architecture that integrates analog operations within memory arrays to execute vector–matrix multiplications using physical laws such as Ohm’s law and Kirchhoff’s laws.
- It employs diverse memory technologies, including RRAM, PCM, and SRAM, to perform MAC operations in parallel while significantly reducing data movement.
- ACIM systems face challenges such as noise, device non-idealities, and ADC quantization, which are addressed through noise-aware training and hardware–software co-design.
Analog Compute-in-Memory (ACIM) encompasses solid-state computing architectures in which multiply–accumulate (MAC) operations are carried out directly within memory arrays by exploiting physical analog principles—primarily charge conservation and Kirchhoff’s laws—thereby collapsing data storage and computation into a single location. ACIM platforms leverage the intrinsic device physics of various non-volatile memories (NVMs; e.g., RRAM, PCM, FeFET) or SRAM, bypassing the need for digital arithmetic logic units and drastically reducing data movement. The principal advantage is orders-of-magnitude improvements in energy and area efficiency for vector–matrix multiply (VMM) workloads that dominate deep neural network (DNN) inference, albeit with unique challenges in noise resiliency, device non-idealities, and precision scaling (Wan et al., 2020, Yoshioka et al., 2024, Bowen et al., 2023).
1. Physical Principles and Circuit Topologies
ACIM arrays are constructed as crossbars whose core elements—resistive or capacitive memory cells—act as programmable analog weights. Each operation applies a vector of input voltages or pulse-width-modulated signals to wordlines; the resultant physical signals (currents, charges, or delays) are modulated by the cell properties, multiplied per cell, and summed via Kirchhoff’s laws along bitlines:
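- Current-mode (Ohm’s law): In a resistive crossbar, the output bitline current is $I_j = \sum_i G_{ij} V_i$, where the conductance $G_{ij}$ encodes the weight and $V_i$ is the voltage applied to wordline $i$ (Bowen et al., 2023, Yoshioka et al., 2024).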
- Charge-domain: SRAM or NVM cells accumulate charge proportional to the input–weight product; charge redistribution at the array periphery yields instantaneous dot-products (Xuan et al., 2023, Yoshioka et al., 2024).
- Time-domain: Input–weight pairs modulate delays or pulse widths; summing is performed across a timing chain, then digitized by time-to-digital converters (TDCs) (Yoshioka et al., 2024, Xuan et al., 2023).
A unifying feature is the analog summation “for free” provided by fundamental physical laws, allowing ACIM blocks of large spatial dimension to perform parallel VMMs in a single cycle.
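The current-mode case can be sketched numerically. The following is a minimal, idealized model (no noise, no wire resistance, no ADC) in which each signed weight is mapped to a differential pair of conductances. The differential mapping, the function name `crossbar_vmm`, and the values of `g_min`, `g_max`, and `v_read` are illustrative assumptions, not parameters taken from the cited works.

```python
import numpy as np

def crossbar_vmm(weights, inputs, g_min=1e-6, g_max=1e-4, v_read=0.2):
    """Idealized current-mode crossbar VMM (hypothetical helper).

    weights: (rows, cols) array with entries in [-1, 1]
    inputs:  (rows,) array with entries in [0, 1]
    Returns the recovered dot products, shape (cols,).
    """
    weights = np.asarray(weights, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    g_range = g_max - g_min
    # Assumed mapping: one positive and one negative conductance per weight.
    g_pos = g_min + g_range * np.clip(weights, 0.0, 1.0)
    g_neg = g_min + g_range * np.clip(-weights, 0.0, 1.0)
    v = v_read * inputs  # wordline voltages
    # Ohm's law per cell (I = G * V); Kirchhoff's current law sums each bitline.
    i_pos = v @ g_pos
    i_neg = v @ g_neg
    # The differential readout cancels the g_min offset; rescale to weight units.
    return (i_pos - i_neg) / (g_range * v_read)
```

In this ideal setting the result equals `inputs @ weights`; the non-idealities discussed in the next section are what separate a real array from this model.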
2. Error Mechanisms and Analog Non-Idealities
Sources of inaccuracy in ACIM are multifaceted:
- Device stochasticity: Write noise, cycle-to-cycle fluctuation, device-to-device mismatch, and retention drift alter the programmed conductances and manifest as additive weight noise (Wan et al., 2020, Zhang et al., 2024).
- Circuit-level artifacts: Parasitic resistance and capacitance in interconnects, voltage drops, and limited analog swing degrade accumulation linearity and attenuate signals (Read et al., 5 May 2025, Yang et al., 29 May 2026).
- ADC/DAC quantization: Limited resolution and thermal, comparator, and mismatch-induced errors in ADCs become significant contributors to total system error, often dominating energy cost (Kavishwar et al., 13 Jul 2025, Yoshioka et al., 2024).
- Neural resilience: DNNs trained purely for digital execution (without explicit noise injection) exhibit catastrophic accuracy collapse under typical device noise levels; tailored training is mandatory for robust deployment (Wan et al., 2020, Zhou et al., 2021).
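Quantitative error propagation can be described as follows: for an output $y = \sum_i w_i x_i$ whose weights carry independent, zero-mean additive noise of variance $\sigma_w^2$, the output variance is $\sigma_y^2 = \sigma_w^2 \sum_i x_i^2$ (Wan et al., 2020).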
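As a sanity check on this kind of error propagation, the sketch below compares a closed-form variance with a Monte Carlo estimate. It assumes independent, zero-mean Gaussian noise of standard deviation `sigma_w` added to each weight, in which case the variance of a dot-product output is `sigma_w**2 * sum(x**2)`. The function names and the Gaussian noise model are illustrative assumptions.

```python
import numpy as np

def output_variance_analytic(x, sigma_w):
    """Variance of y = sum_i (w_i + n_i) * x_i for i.i.d. noise n_i ~ N(0, sigma_w^2)."""
    return sigma_w ** 2 * np.sum(np.square(x))

def output_variance_monte_carlo(w, x, sigma_w, trials=20000, seed=0):
    """Empirical variance of the noisy dot product over many noise draws."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma_w, size=(trials, w.size))
    y = (w + noise) @ x  # one noisy output per trial
    return y.var()
```

The variance depends on the input energy and not on the weight values themselves, which is why layers with large activations are the most exposed to additive weight noise under this model.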
3. Co-Design for Accuracy: Training Algorithms and Representations
DNN–hardware co-design approaches are essential to mitigate ACIM stochasticity:
- Noise-injection and co-training: During training, independent noise samples are injected in forward passes, as in Hessian-Aware Stochastic Gradient Descent (HA-SGD), which favors solutions in flat minima (low Hessian norm) for resilience to device noise (Wan et al., 2020, Zhou et al., 2021).
- Analog floating-point mapping: Instead of direct conductance mapping, a two-factor representation separates a “base” analog charge from a per-filter or per-layer “scale” discharge current, effectively realizing an analog mantissa–exponent structure. The mapping is chosen to maximize dynamic range, and effective per-layer scaling is achieved by adjusting the scale term, decoupling the physical device window from inference precision (Wan et al., 2020).
- Quantization-aware training (QAT): Aggressive quantization (e.g., ternary/binary outputs per analog column) is handled by embedding the quantizer (with surrogate derivatives usable in backpropagation) and scale factors as trainable parameters. This enables radical simplification of column periphery by eliminating full ADCs in favor of simple comparators, as in HCiM (Negi et al., 2024).
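The noise-injection idea can be illustrated on a toy problem. The sketch below trains a linear model with plain gradient descent while adding a fresh Gaussian perturbation to the weights in every forward pass; the gradient is evaluated at the noisy weights and applied to the stored clean weights. This is a simplified illustration only: it is not HA-SGD, and the function name, noise level, and hyperparameters are assumptions.

```python
import numpy as np

def train_with_weight_noise(x, y, sigma=0.05, lr=0.1, epochs=200, seed=0):
    """Gradient descent on a linear model with weight noise injected per forward pass."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        # Fresh, independent noise sample for this forward pass.
        w_noisy = w + rng.normal(0.0, sigma, size=w.shape)
        err = x @ w_noisy - y
        # Gradient of 0.5 * mean squared error, evaluated at the noisy weights.
        grad = x.T @ err / len(y)
        # The update is applied to the clean (stored) weights.
        w -= lr * grad
    return w
```

For a linear model the noise only adds jitter around the same optimum; in a deep network the same procedure penalizes sharp minima, which is the property the co-training methods above exploit.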
4. Array-Level and System Architectures
Modern ACIM arrays exhibit several key design patterns:
- Crossbar organization: Dense, large arrays amortize the energy/area of analog periphery and perform as many simultaneous MACs per cycle as there are active cells (Bowen et al., 2023, Xuan et al., 2023).
- In-memory analog-to-digital conversion (IMADC): ADCs designed for area minimization (e.g., ramp-based, time-domain, or thermometer-coded designs) are shared across columns, with multi-level charge-sharing architectures to further reduce area and power (Yang et al., 29 May 2026, Xuan et al., 2023).
- Hybrid memory stacks: Partitioning weights into MSBs stored in digital SRAM for linearity, and LSBs in multi-level NVM (e.g., ReRAM) maximizes density while constraining end-to-end error (Xuan et al., 2023).
- Heterogeneous integration: Combinations of digital and analog CIM macros handle varying precision and energy requirements; task scheduling shifts noise-tolerant layers to ACIM, and accuracy-critical layers (e.g., softmax, normalization) to digital logic (Yoshioka et al., 2024, Wang et al., 19 Nov 2025).
A representative macro, YOCO, demonstrates 123.8 TOPS/W at 26.2 TOPS throughput/core (8b MAC), area efficiency > 800 Mb/mm², and sub-0.8% VMM error via charge/time-domain compute and hybrid ReRAM/SRAM storage (Xuan et al., 2023).
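The hybrid MSB/LSB partitioning can be sketched as follows. An unsigned 8-bit weight is split into an upper part that is treated as exact (the digital SRAM path) and a lower part that receives additive noise (the analog NVM path). The 4/4 bit split, the Gaussian noise model, and the function names are illustrative assumptions, not details of YOCO or any cited macro.

```python
import numpy as np

def split_weights(w_uint8, lsb_bits=4):
    """Split unsigned integer weights into MSB and LSB fields."""
    w = np.asarray(w_uint8, dtype=np.int64)
    msb = w >> lsb_bits
    lsb = w & ((1 << lsb_bits) - 1)
    return msb, lsb

def hybrid_dot(w_uint8, x, lsb_bits=4, lsb_sigma=0.0, seed=0):
    """Dot product with exact MSBs and noisy LSBs (noise in units of one LSB level)."""
    rng = np.random.default_rng(seed)
    msb, lsb = split_weights(w_uint8, lsb_bits)
    x = np.asarray(x, dtype=float)
    # Digital path: noise-free, weighted by 2**lsb_bits.
    exact_part = (msb @ x) * (1 << lsb_bits)
    # Analog path: additive noise on the stored LSB levels only.
    noisy_lsb = lsb + rng.normal(0.0, lsb_sigma, size=lsb.shape)
    return exact_part + noisy_lsb @ x
```

Because the noise enters only through the low-order field, its contribution to the output is not multiplied by the MSB weighting, which is the intuition behind constraining end-to-end error this way.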
5. Quantitative Performance, Energy, and Precision Trade-offs
ACIM’s chief figures of merit include energy efficiency (TOPS/W), area efficiency (TOPS/mm²), peak throughput, and effective precision:
- Scaling behavior: For large array dimensions, the effective energy per MAC falls as the array grows, because ADC/DAC overhead is amortized over more cells (Bowen et al., 2023, Houshmand et al., 2023).
- Precision–energy trade-off: ADC energy grows steeply (roughly exponentially) with the number of resolved bits, making low-precision, noise-resilient nets essential for practical energy gains (Kavishwar et al., 13 Jul 2025).
- ADC optimization: Minimum necessary ADC resolution is best set using a compute SNR (CSNR) metric measuring the fidelity of the analog dot-product for the signal distribution. The CACTUS algorithm yields up to 3 b reduction in required ADC bits versus SQNR-optimized approaches, delivering 6 dB CSNR improvement at matched error rate (Kavishwar et al., 13 Jul 2025).
- System-level results: WideResNet-28 on CIFAR-100 demonstrated a drop from 94.4% top-5 accuracy to 45.0% for naive digital-to-analog transfer under 6% device noise; HA-SGD restored top-5 accuracy to 88.5%, and co-design enabled deployment at only 1–3% absolute loss relative to digital baseline while preserving >100× energy efficiency (Wan et al., 2020). TinyML networks on AON-CiM (PCM) maintain <1.2% drop over 24 h drift, achieving up to 57.4 TOPS/W at 4 b (Zhou et al., 2021).
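The link between ADC resolution and output fidelity can be illustrated with a uniform quantizer. The sketch below measures the plain signal-to-quantization-noise ratio (SQNR) of a column output as a function of ADC bits. It does not implement the CSNR metric or the CACTUS algorithm; the quantizer design, full-scale range, and function names are assumptions chosen for illustration.

```python
import numpy as np

def adc_quantize(signal, bits, full_scale=1.0):
    """Uniform quantizer over [-full_scale, full_scale) with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    # Integer code per sample, clipped to the available range.
    codes = np.clip(np.floor(signal / step), -levels // 2, levels // 2 - 1)
    # Reconstruct at the centre of each quantization bin.
    return (codes + 0.5) * step

def sqnr_db(signal, bits, full_scale=1.0):
    """Signal-to-quantization-noise ratio in dB."""
    err = signal - adc_quantize(signal, bits, full_scale)
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(err ** 2))
```

For a signal that fills the full-scale range uniformly, each added bit buys roughly 6 dB of SQNR, so every bit saved through noise-resilient training translates directly into a cheaper converter.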
6. Architectural Variants and Emerging Directions
Recent architectural strategies include:
- Charge/time/voltage domain: Charge-based (e.g., YOCO, EasyACIM) and current-based (e.g., traditional resistive) dominate for linearity and system controllability. Time-domain (pulse-width, delay) approaches enable compact, high-throughput MACs and reduce the need for large SAR ADCs (Xuan et al., 2023, Yoshioka et al., 2024, Zhang et al., 2024).
- Nonlinear in-memory functions: Nonlinear activation functions implemented in analog—e.g., in-memory tanh and sigmoid via programmable ADC ramps or SOT-MRAM—offload additional postprocessing (Yang et al., 6 Dec 2025, Elbtity et al., 2021).
- Hybrid digital–analog partitioning: Selective digitalization (e.g., MSB cycles, critical submodules) yields robust designs. Ongoing research pursues mixed-precision, attention-specific, and majority-voting-derived hybrids for further efficiency improvements (Zhang et al., 2024, Negi et al., 2024).
- Automation and synthesis: Agile design space exploration and layout via synthesizable architectures and MOGA optimization (as in EasyACIM) enable practical deployment across diverse tech nodes and applications, covering array scaling, SNR, area, and energy (Zhang et al., 2024).
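One way to picture an in-memory nonlinear activation is a ramp ADC whose comparison thresholds are spaced non-uniformly. In the sketch below the thresholds are placed at the inverse hyperbolic tangent of uniformly spaced output boundaries, so the count of thresholds crossed approximates a quantized tanh of the input. This is a behavioral illustration under assumed names and bit width, not a model of the cited circuits.

```python
import numpy as np

def tanh_ramp_thresholds(bits):
    """Ramp thresholds whose crossing count yields a tanh-shaped transfer curve."""
    levels = 2 ** bits
    # Decision boundaries between uniformly spaced output levels in (-1, 1).
    boundaries = -1.0 + 2.0 * np.arange(1, levels) / levels
    return np.arctanh(boundaries)

def ramp_adc_tanh(x, bits=5):
    """Quantized tanh obtained by counting how many thresholds lie below the input."""
    levels = 2 ** bits
    thresholds = tanh_ramp_thresholds(bits)
    # The output code is the number of thresholds strictly below each input.
    codes = np.searchsorted(thresholds, np.asarray(x, dtype=float))
    # Map code k to the centre of its output bin.
    return -1.0 + (2.0 * codes + 1.0) / levels
```

The approximation error is bounded by one output step (`1 / 2**bits`), so the activation costs no more conversion steps than a linear readout of the same resolution.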
7. Outlook and Field-wide Implications
ACIM is poised to be a pivotal technology in AI acceleration wherever large network topologies can tolerate a moderate loss of accuracy. The field's chief open directions are:
- Statistically robust training: Integrating accurate models of device/circuit-level noise into training pipelines is mandatory for deploying large-scale DNNs (Wan et al., 2020, Zhou et al., 2021).
- Energy scaling with dynamic range: New methods such as local normalization (GR-MAC) decouple dynamic range from precision, maintaining energy benefits for low-bit floating-point and LLM workloads (Rojkov et al., 8 Feb 2026).
- Task and system mapping: Emerging frameworks (e.g., NeuroSim) enable per-layer assignment of quantization, array partitioning, and noise resilience, targeting optimal PPA–accuracy tradeoffs (Read et al., 5 May 2025).
- Integration of diverse devices: Charge- and resistive-based arrays, capacitive domain, FeFETs, PCM, and SOT-MRAM will continue to be benchmarked for application-specific trade-offs (density, retention, endurance, analog state linearity) (Yoshioka et al., 2024, Bowen et al., 2023, Liu et al., 2023).
- Benchmarking and transparency: Systematic benchmarking (e.g., ASiM, NeuroSim, ZigZag DSE) and open-source simulation tools are enabling rigorous, reproducible evaluation of full-stack ACIM designs on real DNNs (Zhang et al., 2024, Read et al., 5 May 2025, Houshmand et al., 2023).
ACIM is expected to deliver system-level energy efficiencies in the range of 100–4,000 TOPS/W, area per MAC around 0.01 μm², and moderate-precision (6–8 bits) accuracy on challenging tasks, provided noise-aware training and hardware–software co-design principles are followed (Yoshioka et al., 2024, Bowen et al., 2023, Xuan et al., 2023).