NM-TOS: Near-Memory Architecture for TOS Updates
- The paper introduces NM-TOS, a near-memory framework using an 8T SRAM design and pipelined row-level updates to achieve a 24.7× speedup in TOS updates for corner detection.
- It employs dynamic voltage and frequency scaling with a three-counter sliding window for adaptive, energy-efficient processing of event-driven data.
- Quantitative results show a latency reduction from 392 ns to 16 ns, with detection accuracy remaining robust and bit errors (BER) having minimal impact under aggressive voltage scaling.
Near-Memory Architecture for Efficient TOS Updates (NM-TOS) is a hardware-centric framework devised for accelerating Threshold-Ordinal Surface (TOS) updates used in corner detection tasks for Event-based Cameras (EBCs). By integrating a read-write decoupled 8-transistor (8T) SRAM cell architecture, row-level pipelining, and dynamic voltage and frequency scaling (DVFS), NM-TOS delivers per-event threshold updates with substantially reduced latency and energy, while sustaining robust corner detection accuracy—even under aggressive voltage scaling. These properties make NM-TOS particularly suited for low-power, high-throughput applications on edge devices where rapid event-driven computation is essential (Shang et al., 2 Dec 2025).
1. System Pipeline and Architectural Overview
NM-TOS operates in a multi-stage processing pipeline that transforms raw event streams from an Address-Event Representation (AER) sensor into corner classifications. Each event is first filtered by a Spatio-Temporal Correlation Filter (STCF) to attenuate isolated noise. In parallel, a lightweight controller measures the event rate dynamically using a three-counter sliding window, yielding the short-term event throughput. This rate drives a lookup table (LUT) that selects the supply voltage and clock frequency for downstream processing.
After denoising and dynamic adaptation, the NM-TOS core executes an event-by-event (EBE) TOS update over a spatial neighborhood around the event. Once updated, the TOS is passed to a frame-by-frame (FBF) Harris LUT, which classifies corners at candidate event positions.
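One way to picture the rate measurement and LUT lookup is the minimal Python sketch below. It is an illustration under stated assumptions, not the controller described in the paper: the class `ThreeCounterRate`, the choice of a stride equal to one third of the window, and every voltage, frequency, and rate threshold in `OPERATING_POINTS` are hypothetical placeholders.

```python
class ThreeCounterRate:
    """Approximate sliding-window event rate with three rotating counters.

    Assumption: the window is split into three equal slots, one counter each,
    and timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_us: float):
        self.slot_us = window_us / 3.0
        self.counts = [0, 0, 0]
        self.slot_index = 0  # absolute index of the newest slot seen so far

    def add_event(self, t_us: float) -> None:
        idx = int(t_us // self.slot_us)
        # Clear every slot skipped since the last event (at most all three).
        for k in range(self.slot_index + 1, min(idx, self.slot_index + 3) + 1):
            self.counts[k % 3] = 0
        self.slot_index = max(idx, self.slot_index)
        self.counts[idx % 3] += 1

    def rate_meps(self) -> float:
        # Events per microsecond equals mega-events per second (Meps).
        return sum(self.counts) / (3.0 * self.slot_us)


# Hypothetical LUT rows: (maximum rate in Meps, supply voltage in V, clock in MHz).
OPERATING_POINTS = [(1.0, 0.6, 50.0), (10.0, 0.9, 200.0), (float("inf"), 1.2, 500.0)]


def select_operating_point(rate_meps: float):
    """Return the first (voltage, frequency) pair whose rate limit covers the input."""
    for max_rate, vdd, f_mhz in OPERATING_POINTS:
        if rate_meps <= max_rate:
            return vdd, f_mhz
```

Three coarse counters trade accuracy for very little state: the estimate is refreshed every third of a window without storing individual timestamps.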
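The per-event update can be sketched in software as follows. This follows the commonly described TOS rule (decrement the neighborhood, zero anything that falls below a threshold, set the event pixel to the maximum), which matches the "minus-one", compare, and write-back phases named later in this article. The function name `tos_update` and the defaults `half=3` and `t_tos=10` are assumptions for illustration, not values taken from the source.

```python
def tos_update(tos, x, y, half=3, t_tos=10):
    """Apply one event at (x, y) to a TOS stored as a list of rows of ints in 0..255.

    `half` is the patch half-width and `t_tos` the ordinal threshold; both are
    illustrative defaults.
    """
    height, width = len(tos), len(tos[0])
    floor = 255 - t_tos
    # Walk the patch row by row, clipping at the array borders.
    for row in range(max(0, y - half), min(height, y + half + 1)):
        for col in range(max(0, x - half), min(width, x + half + 1)):
            value = tos[row][col] - 1                       # read and "minus-one"
            tos[row][col] = value if value >= floor else 0  # compare, then write back
    tos[y][x] = 255                                         # newest event takes the top value
    return tos
```

The outer loop over rows is the part that the hardware pipelines: each row goes through the same read, decrement, compare, and write-back sequence.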
Pipeline Overview
| Stage | Function | Output |
|---|---|---|
| STCF denoising | Suppress isolated noise in event stream | Filtered events |
| DVFS controller | Adapt supply voltage and clock frequency | Speed/energy config |
| NM-TOS patch update | EBE TOS update over the event's neighborhood | Refreshed TOS |
| Harris LUT | Classify corners | Corner outputs |
This dataflow enables real-time, adaptive corner detection with minimized latency and energy overhead (Shang et al., 2 Dec 2025).
2. 8T Read-Write-Decoupled SRAM Cell Structure
Fundamental to NM-TOS is a physically isolated 8T SRAM cell ("type A") that separates the Read BitLine (RBL) from the Write BitLine (WBL) via dedicated NMOS access transistors (M_A1/M_A2 for RBL; M_W1/M_W2 for WBL). The stored state itself is retained by conventional back-to-back inverters.
This architecture allows simultaneous read and write operations: one row can be read while a different row is being written. Such decoupling removes the read-write dependency from the critical path and supports a pipelined update strategy in place of sequential per-row read-compute-write cycles.
Key cell features:
- Dedicated access transistors for the read path and the write path.
- Full retention using conventional cross-coupled inverter pairs.
- Functional completeness and cell robustness validated via 65 nm CMOS SPICE simulations, with zero bit error rates at and controlled performance degradation below this voltage (Shang et al., 2 Dec 2025).
3. Row-Level Pipelining and Timing Optimization
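Each input event triggers a TOS patch update that processes the patch rows one after another. With $N$ rows per patch, a conventional serial implementation incurs a latency of

$$T_{\text{serial}} = N\,(t_{\text{pre}} + t_{\text{rd}} + t_{\text{cmp}} + t_{\text{wb}}),$$

with $t_{\text{pre}}$, $t_{\text{rd}}$, $t_{\text{cmp}}$, and $t_{\text{wb}}$ representing the precharge, read/“minus-one”, compare, and write-back delays. By leveraging the decoupled RBL/WBL circuitry, these four phases are organized as a classic four-stage pipeline (precharge → read/“minus-one” → compare → write-back) in which consecutive rows overlap.
This reduces the per-row update cost to roughly the delay of the slowest stage, $\max(t_{\text{pre}}, t_{\text{rd}}, t_{\text{cmp}}, t_{\text{wb}})$, after an initial prologue. In the 65 nm implementation, pipelined operation achieves approximately 16 ns latency per patch update, a 24.7× speedup over conventional serial digital implementations (~392 ns).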
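The difference between the serial and pipelined schedules can be illustrated with a textbook timing model. The sketch below is idealized and rests on assumptions: the function names are hypothetical, the pipeline is taken to advance one row per cycle at the slowest stage delay, and the stage delays in the usage note are made-up values rather than figures from the paper.

```python
def serial_latency_ns(rows, t_pre, t_rd, t_cmp, t_wb):
    """Every row runs all four phases back to back."""
    return rows * (t_pre + t_rd + t_cmp + t_wb)


def pipelined_latency_ns(rows, t_pre, t_rd, t_cmp, t_wb):
    """Four-stage pipeline clocked at the slowest stage delay.

    The first row needs four cycles to pass through all stages; each later
    row completes one cycle after the previous one, giving rows + 3 cycles.
    """
    cycle = max(t_pre, t_rd, t_cmp, t_wb)
    return (rows + 3) * cycle
```

With seven rows and four balanced 1 ns stages, the model gives 28 ns serial versus 10 ns pipelined. Note that this idealized model caps the gain from pipelining alone at 4× for four stages, so the reported 24.7× presumably also reflects other differences between NM-TOS and the serial digital baseline.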
4. Hardware–Software Co-Optimization and DVFS Integration
The data-dependent nature of EBC throughput motivates real-time adaptation of the power-performance envelope. NM-TOS employs a three-counter sliding window to derive the short-term event rate and selects the supply voltage and clock frequency via a LUT. Dynamic energy per update follows

$$E_{\text{dyn}} = \alpha\, C\, V_{DD}^{2},$$

where $C$ is the net capacitance and $\alpha$ is the switching activity. Thus, halving $V_{DD}$ scales dynamic energy to 0.25× in principle, a 4× reduction. With clock adaptation, reported savings reach up to 6.6×.
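The quadratic dependence on supply voltage can be checked with a few lines of Python. This is only a sketch of the standard dynamic-energy relation; the function names are hypothetical, and the 1.2 V and 0.6 V values in the usage note are illustrative rather than the paper's operating points.

```python
def dynamic_energy(alpha, c_farads, vdd):
    """Dynamic switching energy: activity factor x capacitance x voltage squared."""
    return alpha * c_farads * vdd ** 2


def energy_ratio(vdd_low, vdd_high):
    """Energy at vdd_low relative to vdd_high, with activity and capacitance unchanged."""
    return (vdd_low / vdd_high) ** 2
```

For example, `energy_ratio(0.6, 1.2)` evaluates to 0.25, which is the 4× reduction from halving the supply. Activity and capacitance cancel in the ratio, so only the two voltages matter.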
5. Quantitative Performance and Robustness Characterization
Performance characterization of NM-TOS uses 65 nm CMOS SPICE benchmarks:
- Latency: A serial digital patch update incurs 392 ns; the pipelined NM-TOS update takes 16 ns.
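- Throughput: Taking throughput as the reciprocal of patch-update latency, the serial digital baseline sustains roughly 2.6 million patch updates per second, while pipelined NM-TOS supports roughly 62.5 million.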
- Energy: Per patch update energy is ([email protected]), ([email protected]), ([email protected]).
- Bit Error Rate (BER): Robust operation ( BER) at ; BER rises to at and at .
Since only the three most significant stored bits (levels 8:5) are utilized, the impact of BER on practical corner detection outcome is marginal (Shang et al., 2 Dec 2025).
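The intuition that low-order bit errors rarely matter can be illustrated with a small fault-injection sketch. It is a toy model under assumptions: errors are taken to be independent per bit, the helper names are hypothetical, and the number of retained bits (`kept=4`) is a placeholder, not a value taken from the source.

```python
import random


def inject_bit_errors(values, ber, rng, bits=8):
    """Flip each stored bit independently with probability `ber`."""
    noisy = []
    for v in values:
        for b in range(bits):
            if rng.random() < ber:
                v ^= 1 << b  # flip bit b
        noisy.append(v)
    return noisy


def keep_msbs(value, kept=4, bits=8):
    """Discard the low-order bits, keeping only the `kept` most significant ones."""
    return value >> (bits - kept)
```

A flip confined to the discarded low-order bits leaves `keep_msbs` unchanged, so only errors that land in the retained bits can reach the corner classifier. Passing a seeded `random.Random` instance keeps an experiment repeatable.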
6. Corner Detection Accuracy and Application Impact
Corner detection performance was evaluated using precision-recall AUC on two Prophesee datasets ("shapes_dof" and "dynamic_dof"). Even under the worst-case BER, the degradation in detection is minor:
- “shapes_dof”: AUC drop ≈ 0.027
- “dynamic_dof”: AUC drop ≈ 0.015
This signifies that hardware-induced imperfections, even under aggressive DVFS, incur negligible real-world reduction in event-based corner detection quality.
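For readers unfamiliar with the metric, the area under a precision-recall curve can be approximated with the trapezoidal rule, and an AUC drop is the difference between two such areas. The sketch below is generic and not the evaluation code used in the paper; `pr_auc` is a hypothetical helper, and the curve points in the usage note are made up.

```python
def pr_auc(recall, precision):
    """Trapezoidal area under a precision-recall curve.

    `recall` must be sorted in ascending order and paired element-wise
    with `precision`.
    """
    area = 0.0
    for i in range(1, len(recall)):
        width = recall[i] - recall[i - 1]
        area += width * (precision[i] + precision[i - 1]) / 2.0
    return area
```

For instance, `pr_auc([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])` gives 0.75. Comparing this value for an error-free run against a run with injected bit errors yields the kind of AUC difference quoted above.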
7. Significance and Future Perspectives
NM-TOS establishes a template for integrating near-memory architectures, pipelined microarchitectures, and adaptive DVFS in resource-constrained environments that demand rapid response to event-based sensory data. The combination of an 8T SRAM topology, pipelined patch updates, peripheral co-optimization, and flexible power scaling demonstrates a practical way to bridge algorithmic advances in event-driven computer vision and the hardware limitations of edge deployment. A plausible implication is that NM-TOS strategies could carry over to other real-time applications involving patch-based updates of ordinal surfaces or similar representations.
For comprehensive method details and implementation, see (Shang et al., 2 Dec 2025).