NM-TOS: Near-Memory Architecture for TOS Updates

Updated 9 December 2025

The paper introduces NM-TOS, a near-memory framework using an 8T SRAM design and pipelined row-level updates to achieve a 24.7× speedup in TOS updates for corner detection.
It employs dynamic voltage and frequency scaling with a three-counter sliding window for adaptive, energy-efficient processing of event-driven data.
Quantitative results show latency dropping from 392 ns to 16 ns while detection accuracy remains robust, with minimal BER impact under aggressive voltage scaling.

Near-Memory Architecture for Efficient TOS Updates (NM-TOS) is a hardware-centric framework devised for accelerating Threshold-Ordinal Surface (TOS) updates used in corner detection tasks for Event-based Cameras (EBCs). By integrating a read-write decoupled 8-transistor (8T) SRAM cell architecture, row-level pipelining, and dynamic voltage and frequency scaling (DVFS), NM-TOS delivers per-event threshold updates with substantially reduced latency and energy, while sustaining robust corner detection accuracy—even under aggressive voltage scaling. These properties make NM-TOS particularly suited for low-power, high-throughput applications on edge devices where rapid event-driven computation is essential (Shang et al., 2 Dec 2025).

1. System Pipeline and Architectural Overview

NM-TOS operates in a multi-stage processing pipeline that transforms raw event streams from an Address-Event Representation (AER) sensor into corner classifications. Events, represented as $v=(v_x,v_y,v_p,v_t)$ , are initially filtered by a Spatio-Temporal Correlation Filter (STCF) to attenuate isolated noise. In parallel, a lightweight controller executes a dynamic event-rate measurement using a three-counter sliding window, calculating the short-term event throughput $f_e$ . This rate parameter guides a lookup table (LUT) responsible for selecting the optimal supply voltage $V_{\text{dd}}$ and clock frequency $f_{\text{clk}}$ for downstream processing.
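As a rough illustration of the rate-measurement step, the sketch below estimates $f_e$ with three rotating counters and maps it to an operating point. It assumes one plausible reading of the scheme: each counter covers one 5 ms slot (half of a 10 ms window), two completed slots form the window, and the third counter fills. The class and function names, and every voltage and frequency in the LUT, are hypothetical placeholders rather than values from the paper.

```python
from dataclasses import dataclass, field

WINDOW_US = 10_000         # 10 ms window (from the text)
SLOT_US = WINDOW_US // 2   # 50% stride -> 5 ms slots (assumed interpretation)

# Hypothetical LUT rows: (minimum rate in events/s, Vdd in V, f_clk in MHz).
# Placeholder values for illustration only; not taken from the paper.
DVFS_LUT = [(0.0, 0.6, 10.0), (1e5, 0.9, 100.0), (1e6, 1.2, 500.0)]


@dataclass
class ThreeCounterRate:
    counters: list = field(default_factory=lambda: [0, 0, 0])
    slot_index: int = 0    # index of the slot currently being filled

    def _advance(self, t_us: int) -> None:
        # Step across slot boundaries, clearing each counter before reuse.
        while self.slot_index < t_us // SLOT_US:
            self.slot_index += 1
            self.counters[self.slot_index % 3] = 0

    def add_event(self, t_us: int) -> None:
        # Timestamps are assumed to be non-decreasing.
        self._advance(t_us)
        self.counters[self.slot_index % 3] += 1

    def rate_eps(self, t_us: int) -> float:
        # Sum the two most recently completed slots (one full window).
        self._advance(t_us)
        done = sum(self.counters[(self.slot_index - k) % 3] for k in (1, 2))
        return done * 1e6 / WINDOW_US


def select_config(rate_eps: float) -> tuple:
    # Pick the last LUT row whose rate threshold is met.
    vdd, f_clk = DVFS_LUT[0][1:]
    for min_rate, v, f in DVFS_LUT:
        if rate_eps >= min_rate:
            vdd, f_clk = v, f
    return vdd, f_clk
```

Feeding one event every 10 µs for 15 ms and querying at the slot boundary gives 1000 events over the last full window, so `rate_eps` returns 100000.0 events/s, which this placeholder LUT maps to its middle row.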

Subsequent to denoising and dynamic adaptation, the NM-TOS core executes an event-by-event (EBE) TOS update over a $P\times P$ spatial neighborhood ( $P=7$ by default). Once updated, the TOS surface is input to a frame-by-frame (FBF) Harris LUT, classifying corners at candidate event positions.

Pipeline Overview

Stage	Function	Output
STCF denoising	Suppress isolated noise in event stream	Filtered events
DVFS controller	Adapt $V_{\text{dd}}$ , $f_{\text{clk}}$	Speed/energy config
NM-TOS patch update	EBE TOS update over $P\times P$	Refreshed TOS
Harris LUT	Classify corners	Corner outputs

This dataflow enables real-time, adaptive corner detection with minimized latency and energy overhead (Shang et al., 2 Dec 2025).
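The per-event patch update can be sketched in software as follows. This is a minimal model assuming the commonly described TOS rule: decrement every pixel in the $P\times P$ patch, zero any value that falls below a threshold, then set the event pixel to the maximum. The 8-bit value range, the `k_tos` parameter, and the function name are assumptions for illustration; the source does not state the threshold used.

```python
import numpy as np


def tos_update(tos: np.ndarray, x: int, y: int, P: int = 7, k_tos: int = 14) -> None:
    """Update a uint8 TOS array in place for one event at column x, row y."""
    r = P // 2
    h, w = tos.shape
    y0, y1 = max(0, y - r), min(h, y + r + 1)   # clip the patch at the borders
    x0, x1 = max(0, x - r), min(w, x + r + 1)
    thresh = 255 - k_tos                        # assumed threshold rule

    for row in range(y0, y1):                   # one memory row per iteration
        vals = tos[row, x0:x1].astype(np.int16)  # read
        vals -= 1                                # "minus-one"
        vals[vals < thresh] = 0                  # compare against threshold
        tos[row, x0:x1] = vals.astype(np.uint8)  # write-back

    tos[y, x] = 255                             # newest event gets the top value


tos = np.zeros((20, 20), dtype=np.uint8)
tos_update(tos, x=10, y=10)
tos_update(tos, x=11, y=10)
```

The row loop mirrors the read, decrement, compare, and write-back phases that the hardware applies to each of the $P$ rows.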

2. 8T Read-Write-Decoupled SRAM Cell Structure

Fundamental to NM-TOS is a read-write-decoupled 8T SRAM cell ("type A") that separates the Read BitLine (RBL) from the Write BitLine (WBL) via dedicated NMOS access transistors (M_A1/M_A2 for RBL; M_W1/M_W2 for WBL). The remaining transistors form the cross-coupled (back-to-back) inverters that retain the stored state.

This architecture allows simultaneous read/write operations: reading from row $i$ while writing to row $i-1$ . Such decoupling eliminates critical path dependencies, supporting a pipelined update strategy rather than sequential per-row read-compute-write cycles.

Key cell features:

Dedicated access transistors for the read and write paths.
Full retention using conventional cross-coupled inverter pairs.
Functional completeness and cell robustness validated via 65 nm CMOS SPICE simulations, with zero bit error rates at $V_{\text{dd}} \geq 0.62\,\text{V}$ and controlled performance degradation below this voltage (Shang et al., 2 Dec 2025).

3. Row-Level Pipelining and Timing Optimization

Patch updates in TOS require sequential manipulation of $P$ rows per input event. Traditional methods incur latency:

$L_{\text{serial}} = P \times (t_1 + t_2 + t_3 + t_4)$

with $t_1, t_2, t_3, t_4$ representing precharge, read/“minus-one”, compare, and write-back delays. By leveraging decoupled RBL/WBL circuitry, these four phases are organized in a classic four-stage pipeline:

$L_{\text{pipelined}} = P \times (t_1 + t_2) + t_3 + t_4$

This reduces row update costs to $(t_1 + t_2)$ after an initial prologue. For $P=7$ in a 65 nm implementation, pipelined operation achieves approximately $16\,\text{ns}$ latency at $V_{\text{dd}} = 1.2\,\text{V}$ , equating to a throughput of $63.1\,\text{Meps}$ —a 24.7× speedup over conventional serial digital implementations (~392 ns).
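The two latency expressions and the headline figures can be checked with a few lines of arithmetic. The stage delays passed to the functions below are arbitrary illustrative numbers, not values from the paper; only the 392 ns, 16 ns, and 63.1 Meps figures come from the text.

```python
def serial_latency(P, t1, t2, t3, t4):
    # Every row pays all four phases back to back.
    return P * (t1 + t2 + t3 + t4)


def pipelined_latency(P, t1, t2, t3, t4):
    # Each row costs (t1 + t2); compare and write-back are exposed only once.
    return P * (t1 + t2) + t3 + t4


# Arbitrary example delays (hypothetical, in arbitrary time units).
example_serial = serial_latency(7, 2, 3, 1, 2)        # 56
example_pipelined = pipelined_latency(7, 2, 3, 1, 2)  # 38

# Figures reported in the text.
serial_ns = 392.0
pipelined_ns = 16.0
speedup = serial_ns / pipelined_ns          # 24.5
throughput_meps = 1e3 / pipelined_ns        # 62.5 Meps

# Latency implied by the reported 63.1 Meps throughput.
implied_ns = 1e3 / 63.1                     # about 15.85 ns
implied_speedup = serial_ns / implied_ns    # about 24.7
```

With the rounded 16 ns figure the ratio comes out at 24.5× and 62.5 Meps; the reported 24.7× and 63.1 Meps are consistent with an unrounded latency of about 15.85 ns.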

4. Hardware–Software Co-Optimization and DVFS Integration

The data-dependent nature of EBC throughput motivates real-time adaptation of power-performance envelopes. NM-TOS employs a three-counter sliding window (window $=10\,\text{ms}$ , stride $=50\%$ ) to derive $f_e$ and select configuration parameters via LUT. Dynamic energy per update adheres to:

$E_{\text{dynamic}} \propto C \cdot V_{\text{dd}}^2 \cdot \alpha$

where $C$ is net capacitance and $\alpha$ is switching activity. Because energy scales with $V_{\text{dd}}^2$, reducing $V_{\text{dd}}$ from $1.2\,\text{V}$ to $0.6\,\text{V}$ can, in principle, cut dynamic energy to 0.25× its original value (a 4× reduction). With clock adaptation, reported savings reach up to 6.6×.
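The quadratic voltage dependence can be verified with a short calculation. The sketch assumes $C$ and $\alpha$ stay constant across the two operating points, which is a simplification; the function name is hypothetical.

```python
def dynamic_energy_ratio(v_new: float, v_ref: float) -> float:
    # E_dynamic ∝ C * Vdd^2 * alpha; with C and alpha fixed, only Vdd^2 matters.
    return (v_new / v_ref) ** 2


ratio = dynamic_energy_ratio(0.6, 1.2)   # fraction of the original energy
reduction = 1.0 / ratio                  # reduction factor
print(f"energy ratio = {ratio:.2f}, reduction = {reduction:.1f}x")
```

Halving the supply voltage therefore leaves one quarter of the dynamic energy per update under this idealized model.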

5. Quantitative Performance and Robustness Characterization

Performance characterization of NM-TOS utilizes 65 nm CMOS SPICE benchmarks with $P=7$ patch size:

Latency: At $V_{\text{dd}} = 1.2\,\text{V}$ , serial digital patch update incurs $\sim$ 392 ns; NM-TOS pipelined yields $\sim$ 16 ns.
Throughput: Conventional digital methods achieve $2.6\,\text{Meps}$ ; NM-TOS pipelined supports $63.1\,\text{Meps}$ .
Energy: Per-patch-update energy is $166\,\text{pJ}$, $139\,\text{pJ}$, and $26\,\text{pJ}$ at three DVFS operating points (supply voltage and clock frequency pairs).
Bit Error Rate (BER): Robust operation ( $0\%$ BER) at $V_{\text{dd}} \geq 0.62\,\text{V}$ ; BER rises to $0.2\%$ at $0.61\,\text{V}$ and $2.5\%$ at $0.6\,\text{V}$ .

Since only the three most significant stored bits (levels 8:5) are utilized, the impact of BER on practical corner detection outcome is marginal (Shang et al., 2 Dec 2025).
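To see why restricting attention to the upper bits limits exposure to bit errors, the sketch below injects independent random bit flips into stored words and counts how often the retained most significant bits change. It assumes 8-bit words, independent flips at the stated BER, and three retained bits as in the sentence above; none of these modeling choices is confirmed by the source, and the function names are hypothetical.

```python
import numpy as np


def flip_bits(words: np.ndarray, ber: float, rng, width: int = 8) -> np.ndarray:
    # Draw an independent flip decision for every bit of every word.
    flips = rng.random((words.size, width)) < ber
    weights = (1 << np.arange(width)).astype(np.uint8)   # 1, 2, 4, ..., 128
    mask = (flips * weights).sum(axis=1).astype(np.uint8)
    return words ^ mask


def msb_error_rate(ber: float, n_msb: int, n_words: int = 200_000,
                   width: int = 8, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    words = rng.integers(0, 256, size=n_words, dtype=np.uint8)
    noisy = flip_bits(words, ber, rng, width)
    shift = width - n_msb
    # A word counts as corrupted only if one of its retained bits changed.
    return float(np.mean((words >> shift) != (noisy >> shift)))


simulated = msb_error_rate(ber=0.025, n_msb=3)
analytic = 1.0 - (1.0 - 0.025) ** 3   # probability that any of 3 bits flips
```

Under these assumptions the fraction of affected words follows $1-(1-\text{BER})^k$ for $k$ retained bits, so using fewer bits exposes each word to fewer flips than using the full word would.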

6. Corner Detection Accuracy and Application Impact

Corner detection performance was evaluated using precision-recall AUC on two Prophesee datasets ("shapes_dof" and "dynamic_dof"). Even under worst-case BER ( $2.5\%$ at $0.6\,\text{V}$ ), the degradation in detection is minor:

“shapes_dof”: $\Delta$ AUC ≈ 0.027
“dynamic_dof”: $\Delta$ AUC ≈ 0.015

This signifies that hardware-induced imperfections, even under aggressive DVFS, incur negligible real-world reduction in event-based corner detection quality.

7. Significance and Future Perspectives

NM-TOS establishes a template for integrating near-memory architectures, pipelined microarchitectures, and adaptive DVFS in resource-constrained environments that demand rapid response to event-based sensory data. The combination of 8T SRAM topology, pipelined patch updates, peripheral co-optimization, and flexible power scaling demonstrates a practical way to bridge algorithmic advances in event-driven computer vision and the hardware limitations of edge deployment. A plausible implication is that NM-TOS strategies are viable in broader contexts involving patch-based updates of ordinal surfaces or similar representations in real-time applications.

For comprehensive method details and implementation, see (Shang et al., 2 Dec 2025).

References (1)

Near-Memory Architecture for Threshold-Ordinal Surface-Based Corner Detection of Event Cameras (2025)
