Memristor-Based In-Memory Computing

Updated 20 December 2025
  • Memristor-based in-memory computing is a paradigm integrating storage and processing through devices with multi-level conductance to bypass the von Neumann bottleneck.
  • It enables constant-time matrix–vector multiplications and deep learning operations with high throughput and significant energy savings.
  • System designs employ phase-change memory, crossbar arrays, and FPGA-based emulators to model non-idealities such as drift and noise for robust hardware-algorithm co-design.

Memristor-based in-memory computing refers to the direct execution of analog and digital computations within arrays of memristive devices, bypassing the traditional separation of memory and logic units. By exploiting the native physical properties of memristors (multi-level conductance, non-volatility, and analog programmability) in crossbar architectures, a broad range of algorithmic primitives, including matrix–vector multiplication (MVM), Boolean logic, and deep learning operations, can be mapped to hardware with massive parallelism and minimal data movement. This paradigm offers constant-time multiply–accumulate (MAC) throughput and orders-of-magnitude energy savings, circumventing the von Neumann bottleneck for neural and AI workloads (Petropoulos et al., 2020).

1. Device Physics and Crossbar Storage Principles

Memristors, notably phase-change memory (PCM) cells, are engineered such that their conductance $G$ encodes information. PCM consists of a chalcogenide material (e.g., Ge$_2$Sb$_2$Te$_5$) between two electrodes. Programming is achieved through electrical pulses, which set the material in either a crystalline (low-resistance) or amorphous (high-resistance) state, offering finely tunable intermediate conductance levels (Petropoulos et al., 2020). Readout conforms to Ohm's law:

$$I = G V$$

By iteratively pulsing and verifying, a target conductance $G_{\mathrm{target}}$ can be set within the tolerance imposed by device variability. This supports analog weight mapping and fine-grained neural parameter encoding.
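
As an illustration of such a program-and-verify loop, here is a minimal Python sketch against a toy simulated cell; the `ToyPCMCell` class, noise magnitudes, and step-size heuristic are illustrative assumptions, not the paper's programming scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyPCMCell:
    """Toy PCM cell: pulses nudge conductance toward the request, with noise."""
    def __init__(self, g=1e-6):
        self.g = g  # conductance in siemens

    def read_conductance(self, v_read=0.2):
        i = self.g * v_read                 # Ohm's law readout: I = G * V
        i *= 1 + rng.normal(0.0, 0.01)      # ~1% read noise (assumed)
        return i / v_read

    def apply_pulse(self, delta):
        # Programming noise: the achieved step deviates from the requested one
        self.g = max(0.0, self.g + delta * (1 + rng.normal(0.0, 0.1)))

def program_and_verify(cell, g_target, tol=0.1e-6, max_pulses=100):
    """Pulse until the read-back conductance is within `tol` of `g_target`."""
    for _ in range(max_pulses):
        err = g_target - cell.read_conductance()
        if abs(err) <= tol:
            break
        cell.apply_pulse(0.5 * err)         # damped correction toward the target
    return cell.read_conductance()

print(program_and_verify(ToyPCMCell(), g_target=5e-6))   # ~5e-6 S
```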

2. Vector–Matrix Multiplication and Crossbar Computation

Memristor crossbar arrays are architected as $N$ wordlines intersecting $N$ bitlines, with each intersection holding a memristor of conductance $G_{ij}$. Applying a voltage vector $V = [V_1, \dots, V_N]^\mathsf{T}$ to the rows yields, via Kirchhoff's laws, an instantaneous current readout on each column:

$$I_j = \sum_{i=1}^{N} G_{ij} V_i \;\implies\; \mathbf{I} = G \mathbf{V}$$

This hardware operation directly implements an $N$-length dot product in constant time, irrespective of $N$.

Crucially, the passive summation mechanism allows a full matrix–vector product (inference mode for neural networks or other compute tasks) to be performed with a single voltage application event. The input–output mapping is direct, limited only by peripheral ADC/DAC conversion rates in non-ideal systems.
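
In software, the ideal crossbar readout reduces to a single matrix product. The NumPy sketch below (with illustrative sizes and values) mirrors the Kirchhoff summation that the analog array performs in one read event.

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 10e-6, size=(N, N))   # G[i, j]: conductance at row i, column j
V = np.array([0.20, 0.10, 0.00, 0.15])      # wordline voltages

I = V @ G      # I_j = sum_i G_ij * V_i -- the whole MVM in one "read"
print(I)       # bitline currents in amperes
```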

3. Non-Idealities, Drift, and Accurate Emulator Modeling

Real-world memristive devices experience:

  • Conductance drift: after programming, PCM conductance decays via a power law, $G(t) = G(t_0)\,(t/t_0)^{-\nu}$, with experimentally measured drift exponents $\bar{\nu} \approx 0.06$ and $\sigma_\nu \approx 0.02$ in 90 nm PCM.
  • Read noise: $1/f$ noise with power spectral density $S_I(f) = I_{\mathrm{read}}^2\, Q / f$, where $Q \in [4\times10^{-4},\, 1.1\times10^{-3}]$.
  • Device variability and programming noise: endow each $G_{ij}$ and each drift exponent $\nu$ with per-device randomness.
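
These models translate directly into a few lines of simulation code. Below is a minimal NumPy sketch, not the paper's implementation: the power-law drift uses the reported statistics ($\bar{\nu} \approx 0.06$, $\sigma_\nu \approx 0.02$), and the $1/f$ trace is synthesized by shaping random spectral phases with an amplitude proportional to $\sqrt{S_I(f)}$; the normalization is approximate.

```python
import numpy as np

rng = np.random.default_rng(1)

def drifted_conductance(g0, t, nu, t0=1.0):
    """Power-law drift: G(t) = G(t0) * (t / t0)**(-nu)."""
    return g0 * (t / t0) ** (-nu)

def one_over_f_noise(n, fs, i_read, Q=7e-4):
    """Synthesize a read-noise trace with one-sided PSD S_I(f) = i_read**2 * Q / f."""
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    mag = np.zeros_like(f)
    mag[1:] = np.sqrt(i_read**2 * Q / f[1:])          # |X(f)| proportional to sqrt(PSD)
    spectrum = mag * np.exp(2j * np.pi * rng.random(f.size))
    return np.fft.irfft(spectrum, n=n) * np.sqrt(n * fs / 2)  # approximate normalization

nu = rng.normal(0.06, 0.02, size=10)                  # per-device drift exponents
print(drifted_conductance(5e-6, t=27 * 3600, nu=nu))  # ~2.5 uS after 27 h for nu = 0.06
print(one_over_f_noise(n=1024, fs=50e3, i_read=1e-6)[:4])
```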

An FPGA-based emulator was constructed to reliably model these effects by maintaining individual $(G(t_0), \nu)$ pairs per cell in DRAM, applying the experimentally measured statistical distributions. It simulates drift and noise by precomputing noise time series for each device, which are injected into analog MAC computations at inference. This permits emulation of ~400,000 PCM devices for full-system evaluations (Petropoulos et al., 2020).
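
As a software analogue of that emulator loop, the sketch below keeps a per-cell $(G(t_0), \nu)$ state for a 1034×520 array and injects a noise term into the analog MAC; the single-draw noise model here is a simplified stand-in for the emulator's precomputed per-device time series, not its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

rows, cols = 1034, 520                           # crossbar geometry from the paper
g0 = rng.uniform(0.0, 10e-6, size=(rows, cols))  # programmed conductances G(t0), in S
nu = rng.normal(0.06, 0.02, size=(rows, cols))   # per-device drift exponents

def emulated_mvm(v, t, t0=1.0, Q=7e-4):
    """Column currents at time t, with power-law drift and additive read noise."""
    g_t = g0 * (t / t0) ** (-nu)                 # drifted per-cell conductance state
    i = v @ g_t                                  # ideal Kirchhoff summation
    # Crude stand-in for the precomputed per-device noise series (assumption):
    return i + rng.normal(0.0, np.abs(i) * np.sqrt(Q))

v = rng.uniform(0.0, 0.2, size=rows)             # input voltages
print(emulated_mvm(v, t=27 * 3600)[:5])          # column currents after ~27 h
```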

4. End-to-End Deep Learning Inference and Performance

To benchmark inference, a canonical 784–250–10 fully connected network was trained offline and its weights mapped to physical conductances via differential pairs ($w \propto G^+ - G^-$), with the unused device of each pair left at zero conductance so that negative weights can be represented (a mapping sketch follows the list below). The emulator predicts the accuracy decay due to drift and noise with high fidelity:

  • Throughput: 8.8k images/s (227 µs latency) per 1034×520 crossbar.
  • Accuracy: Fresh weights yield 97.8% MNIST accuracy; after 27 hours, this degrades to 96.5% (emulator predicts 96.4±0.2%), maintaining <0.1% absolute error between emulator and hardware.
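
The differential weight mapping referenced above can be sketched as follows; the scaling rule and $g_{\max}$ value are illustrative assumptions.

```python
import numpy as np

def map_weights(w, g_max=10e-6):
    """Map a trained weight matrix onto a differential (G+, G-) conductance pair."""
    scale = g_max / np.max(np.abs(w))       # fit the largest |w| into [0, g_max]
    g_pos = np.clip(w, 0.0, None) * scale   # positive weights -> G+, G- stays 0
    g_neg = np.clip(-w, 0.0, None) * scale  # negative weights -> G-, G+ stays 0
    return g_pos, g_neg, scale

w = np.random.default_rng(3).normal(0.0, 0.1, size=(784, 250))
g_pos, g_neg, scale = map_weights(w)
print(np.allclose(w, (g_pos - g_neg) / scale))   # True: mapping is exact up to scale
```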

Temporal evolution of weight distributions, accuracy drop under drift/noise, and crossbar sizing effects are all quantitatively captured by the emulator, forming a robust foundation for hardware-algorithm co-design.

5. Scaling, Refresh, and Algorithmic Robustness

  • Tiling: Multiple crossbars can be pipelined for larger networks, keeping dataflow uniform (see the tiling sketch after this list).
  • Refresh strategies: Periodic reprogramming intervals to maintain target inference accuracy are quantitatively prescribed by emulator predictions.
  • Pareto optimization: Emulator sweeps over conductance levels and operating voltages allow identification of optimal operating points for trade-off between speed, energy, and accuracy.
  • In-the-loop training: Feedback from emulator to training procedures enables the design of drift-tolerant weight distributions.
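
The tiling strategy from the first bullet can be sketched as follows, assuming a hypothetical 256×256 crossbar dimension and an ideal per-tile MVM; partial column currents are accumulated digitally across tiles.

```python
import numpy as np

TILE = 256   # hypothetical crossbar dimension

def crossbar_mvm(g_tile, v_tile):
    """Stand-in for one analog crossbar read (ideal here)."""
    return v_tile @ g_tile

def tiled_mvm(G, v):
    """Compute v @ G using TILE x TILE crossbars plus digital accumulation."""
    n_in, n_out = G.shape
    out = np.zeros(n_out)
    for i in range(0, n_in, TILE):
        for j in range(0, n_out, TILE):
            out[j:j + TILE] += crossbar_mvm(G[i:i + TILE, j:j + TILE],
                                            v[i:i + TILE])
    return out

rng = np.random.default_rng(4)
G = rng.uniform(0, 10e-6, size=(784, 250))
v = rng.uniform(0, 0.2, size=784)
print(np.allclose(tiled_mvm(G, v), v @ G))   # True: tiling preserves the result
```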

Experimental validation showed that single-cell drift curves, read-noise PSDs, and inference accuracy decay all match within tight error margins.

6. Design Insights, System-Level Applications, and Outlook

The emulator architecture (Kintex UltraScale FPGA, DRAM-backed per-cell state maps, and an activation/ADC pipeline) enables rapid, accurate hardware prototyping, sidestepping time-consuming chip-level measurements. All core stages of neural network inference, including activation functions, voltage-droop management, and current-to-digital conversion, are emulated with hardware-level fidelity.
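
As a rough sketch of the current-to-digital stage in such a pipeline, the code below uniformly quantizes column currents before a digital activation is applied; the 8-bit resolution, full-scale current, and ReLU choice are assumptions, not values from the paper.

```python
import numpy as np

def adc(i, i_max, bits=8):
    """Uniformly quantize currents in [0, i_max] to 2**bits levels."""
    levels = 2**bits - 1
    codes = np.round(np.clip(i / i_max, 0.0, 1.0) * levels)
    return codes / levels * i_max               # back to current units

i_cols = np.random.default_rng(5).uniform(0, 1e-4, size=520)
digital = np.maximum(adc(i_cols, i_max=1e-4), 0)   # quantize, then ReLU
print(digital[:4])
```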

This methodology accelerates the exploration of in-memory computing for deep learning, facilitating hardware-algorithm co-design, quantification of real-world non-idealities (drift, noise, device variability), and scalable system deployment. By accurately capturing failure modes, refresh intervals, weight-mapping reliability, and optimal operating parameters, it expedites the research and development of next-generation memristive architectures (Petropoulos et al., 2020).

7. Summary of Quantitative Validation

Metric | Value/Accuracy | Emulator vs. Hardware Agreement
Single-cell drift | mean error ≤2% over 9 s | within ±1σ (Fig. 3c,d)
Read-noise PSD | agreement ≤5% | across 10 Hz–50 kHz
Crossbar inference accuracy decay | <0.1% absolute error over 27 h (~400k cells) | emulator matches chip
Throughput | 8.8k images/s per crossbar | 227 µs latency
Weight distributions (histograms) | overlap within ±0.5 µS | multiple time points (Fig. 4b)

Hardware emulation of memristive crossbars with full non-ideality modeling is essential for the rapid and reliable progression of in-memory computing for deep learning, offering validated system-level insights into accuracy, drift tolerance, and optimal hardware-algorithm operating points.

References (1)
