IzhiRISC-V: RISC-V for Neuromorphic SNNs
- IzhiRISC-V is a RISC-V-compliant processor architecture featuring a custom instruction set for efficient spiking neural network computation based on the Izhikevich neuron model.
- It integrates a Neuron Processing Unit (NPU) and a Decay Unit (DCU) directly into the pipeline to achieve single-cycle state updates and synaptic decay with fixed-point arithmetic.
- Benchmarks on FPGA and standard-cell implementations show dual-core speedups of about 1.64–1.68×, single-core hazard stalls below 1%, and up to 9.67 GUpd/s/W in the ASAP7 flow.
IzhiRISC-V is a RISC-V-compliant processor architecture incorporating a custom instruction set extension designed for efficient spiking neural network (SNN) computation, with particular emphasis on the Izhikevich neuron model. It features a deeply integrated Neuron Processing Unit (NPU) and Decay Unit (DCU) that augment the baseline integer pipeline to accelerate neuron state updates and synaptic current decay with single-cycle, fixed-point hardware instructions, significantly increasing compute density and energy efficiency for neuromorphic workloads (Szczerek et al., 18 Aug 2025).
1. Processor Architecture and Pipeline Integration
The IzhiRISC-V is derived from the DTEK-V baseline core, implementing the RV32IMZ instruction set (combining RV32I, M, and Zicsr). The core's pipeline is organized into three stages:
- Fetch+Decode (IF/ID): Instruction fetch and decode are merged into a single stage.
- Execute (EX): The main execution stage where both standard and neuromorphic instructions are dispatched.
- Memory+Write-Back (MEM/WB): Memory access and data write-back are combined.
A forwarding unit mitigates read-after-write (RAW) hazards by propagating results back to the EX stage as needed, with stall cycles inserted only when unavoidable.
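The forwarding decision described above can be modelled in a few lines. This is a generic sketch of result bypassing, not the DTEK-V logic: the `InFlight` record and `forward_operand` helper are hypothetical names, and the model assumes the MEM/WB stage is the only forwarding source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InFlight:
    """Instruction currently in MEM/WB (hypothetical model, not the RTL)."""
    rd: int                 # destination register index
    writes_reg: bool        # True if the instruction writes rd
    result: Optional[int]   # None while the value is not yet available


def forward_operand(rs: int, regfile: list, memwb: Optional[InFlight]):
    """Return (value, stall) for source register rs as read in EX."""
    depends = memwb is not None and memwb.writes_reg and memwb.rd == rs
    if depends and rs != 0:            # x0 is hard-wired to zero, never forwarded
        if memwb.result is None:
            return None, True          # RAW hazard that forwarding cannot hide
        return memwb.result, False     # bypass the register file
    return regfile[rs], False          # no dependency: normal register read
```

A stall is reported only when the producing instruction has no result to forward yet, which mirrors the "stall only when unavoidable" policy in the text.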
Neuromorphic enhancements are realized via hardware extensions directly merged into the pipeline's execution path:
- The NPU and DCU are tightly grafted into the ALU datapath, sharing operand multiplexers and participating equally in the pipeline flow.
- IF/ID recognizes neuromorphic instructions via the RISC-V custom-0 opcode (0001011).
- In the EX stage, the hazard unit examines the funct3 field to route each instruction to one of: standard ALU, NPU, or DCU.
- Results from the NPU/DCU are written to registers or main memory based on instruction semantics.
This integration maintains RISC-V programming conventions, removing the need for context switches to co-processors, while allowing fast hardware-accelerated SNN computations.
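The routing step can be illustrated with a small decoder. This is a sketch under stated assumptions: the custom-0 major opcode value (0001011) comes from the RISC-V specification and the funct3 values from the instruction table in Section 2, but the assignment of each instruction to the NPU or DCU and the `dispatch` function itself are illustrative, not taken from the RTL.

```python
CUSTOM0 = 0b0001011  # RISC-V custom-0 major opcode (bits 6..0)

# funct3 -> (mnemonic, unit). The unit mapping is an assumption of this sketch.
NEURO_OPS = {
    0b000: ("nmldl", "NPU"),
    0b001: ("nmldh", "NPU"),
    0b010: ("nmpn", "NPU"),
    0b011: ("nmdec", "DCU"),
}


def dispatch(word: int):
    """Classify a 32-bit instruction word as standard ALU, NPU, or DCU work."""
    opcode = word & 0x7F           # bits 6..0
    funct3 = (word >> 12) & 0x7    # bits 14..12
    if opcode != CUSTOM0:
        return ("standard", "ALU")
    if funct3 not in NEURO_OPS:
        raise ValueError(f"unassigned custom-0 funct3: {funct3:03b}")
    return NEURO_OPS[funct3]
```

Because the decision depends only on fields already extracted during decode, no separate co-processor handshake is modelled.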
2. Custom ISA Extension: Neuromorphic Instructions
IzhiRISC-V defines a dedicated ISA extension using the RISC-V "custom-0" opcode format for neuromorphic processing, particularly optimized for Izhikevich neuron simulation. Four new instructions are introduced:
| Instruction | funct3 | Format | Semantics |
|---|---|---|---|
| nmldl | 000 | R | Load Izhikevich parameters $a$, $b$, $c$, $d$ into NPU configuration registers. Operands: $a$, $b$ (Q4.11 fixed-point), $c$ (Q7.8), $d$ (Q4.11). |
| nmldh | 001 | R | Load the time step and clamp-voltage flag into NPU configuration (one bit selects the step size, another sets voltage clamp behavior). |
| nmpn | 010 | N | Execute a single-cycle forward Euler update of the neuron state using the NPU. Inputs: address of the $v$/$u$ state and the input current $I$ (Q15.16). Outputs the updated $v$, $u$ and a spike flag. |
| nmdec | 011 | R | Perform single-cycle exponential decay of the synaptic current; parameterized by the time constant $\tau$ and approximated via shift-and-add in the DCU. |
All instructions follow R-type or N-type encodings; nmpn uses its destination register both as a source operand and as the return location for the spike flag.
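For the R-type instructions, the encoding can be sketched with the standard RISC-V R-type field layout. The `encode_rtype` helper is hypothetical, and `funct7 = 0` is an assumption, since the funct7 values are not given here. nmpn is left out because the N-type layout is not specified in this section.

```python
CUSTOM0 = 0b0001011  # RISC-V custom-0 major opcode

# funct3 values from the table above (R-type instructions only)
FUNCT3 = {"nmldl": 0b000, "nmldh": 0b001, "nmdec": 0b011}


def encode_rtype(mnemonic: str, rd: int, rs1: int, rs2: int, funct7: int = 0) -> int:
    """Pack an R-type word: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError(f"register index out of range: {reg}")
    if not 0 <= funct7 < 128:
        raise ValueError(f"funct7 out of range: {funct7}")
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (FUNCT3[mnemonic] << 12)
        | (rd << 7)
        | CUSTOM0
    )
```

A toolchain without assembler support for the extension could emit the resulting word with a `.word` directive.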
3. Izhikevich Neuron Model: Hardware Mapping
The NPU specializes in fixed-point implementations of the Izhikevich neuron model as described by the following equations:
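Continuous time:

$$\frac{dv}{dt} = 0.04v^2 + 5v + 140 - u + I, \qquad \frac{du}{dt} = a(bv - u)$$

with spike and reset applied when $v \geq 30$ mV:

$$v \leftarrow c, \qquad u \leftarrow u + d$$

Discretized for hardware (forward Euler with time step $\Delta t$) as:

$$v[t+1] = v[t] + \Delta t\,\bigl(0.04\,v[t]^2 + 5\,v[t] + 140 - u[t] + I[t]\bigr), \qquad u[t+1] = u[t] + \Delta t\,a\,\bigl(b\,v[t] - u[t]\bigr)$$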
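A floating-point reference for one forward Euler step of the standard Izhikevich model is sketched below. It is a software baseline for comparison, not the NPU datapath: the function name, the default `dt`, and the 30 mV threshold default are assumptions of this sketch, and the reset is applied after the Euler update.

```python
def izhikevich_step(v, u, I, a, b, c, d, dt=1.0, v_thresh=30.0):
    """One forward Euler step; returns (v_new, u_new, spiked)."""
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I   # membrane potential derivative
    du = a * (b * v - u)                          # recovery variable derivative
    v_new = v + dt * dv
    u_new = u + dt * du
    spiked = v_new >= v_thresh
    if spiked:
        v_new = c              # reset membrane potential
        u_new = u_new + d      # bump recovery variable
    return v_new, u_new, spiked
```

With the regular-spiking parameters $a = 0.02$, $b = 0.2$, $c = -65$, $d = 8$, calling `izhikevich_step(-65.0, -13.0, 0.0, 0.02, 0.2, -65.0, 8.0)` advances a resting neuron by one step without a spike.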
The NPU implements:
- Quadratic and linear terms using pipelined multiplies and accumulator units, with Q7.8 and Q4.11 fixed-point formats to balance range and resolution.
- Reset logic, which checks for threshold crossing and applies reset in the same cycle, setting an LSB spike flag.
- Exponential synaptic decay via the DCU, using shift-and-add approximations of division with low error (e.g., division by 2, 3, 7, or 8 yields errors under 0.4%).
This mapping enables tight hardware loops for neuron evolution with robust numerical fidelity versus double/fixed MATLAB baselines.
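The shift-and-add idea can be illustrated with truncated geometric series: $1/3 = \sum_k 4^{-k}$ and $1/7 = \sum_k 8^{-k}$. The shift sets below are assumptions chosen for illustration (the DCU's actual terms are not listed here), and the helper names are hypothetical.

```python
def shift_add_divide(x: int, shifts) -> int:
    """Approximate x / divisor as a sum of right-shifted copies of x."""
    return sum(x >> s for s in shifts)


# Assumed shift sets: truncated series for 1/3 and 1/7
DIV3_SHIFTS = (2, 4, 6, 8)   # 1/4 + 1/16 + 1/64 + 1/256
DIV7_SHIFTS = (3, 6, 9)      # 1/8 + 1/64 + 1/512


def relative_error(x: int, divisor: int, shifts) -> float:
    """Relative error of the shift-and-add result against exact division."""
    exact = x / divisor
    return abs(exact - shift_add_divide(x, shifts)) / exact
```

For an input of `1 << 16`, both approximations stay below the 0.4% error bound quoted above, while division by 2 or 8 reduces to a single exact shift.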
4. Microarchitectural Details and Physical Resource Utilization
The ALU is augmented with the NPU and DCU as functional units within the EX stage. Configuration registers (NM_REGS) hold the NPU and DCU settings, including the Izhikevich parameters, the time-step setting, and the clamp flag. The RTL is implemented in VHDL using IEEE fixed-point types (sfixed); resource optimizations (e.g., multiplier sharing) are intentionally omitted to maximize computation fidelity.
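The fixed-point formats can be explored in software with a small quantizer. This sketch assumes that a Qm.n value occupies one sign bit plus m integer and n fractional bits (16 bits for Q7.8 and Q4.11), and that conversion rounds to nearest and saturates. Those choices are assumptions and may differ from the sfixed configuration used in the RTL.

```python
def to_fixed(x: float, frac_bits: int, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer, rounding and saturating."""
    scaled = int(round(x * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


# Example: reset voltage c in Q7.8, recovery rate a in Q4.11
c_q = to_fixed(-65.0, frac_bits=8)    # integer code for -65.0
a_q = to_fixed(0.02, frac_bits=11)    # nearest representable value to 0.02
```

Q7.8 covers roughly ±128 with a resolution of 1/256, which suits membrane voltages, while Q4.11 trades range for a finer 1/2048 resolution on small parameters.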
Synthesized utilization on Intel MAX10 FPGA (10M50DAF484C7G, dual-core @30 MHz):
| Metric | Utilization |
|---|---|
| Logic elements | 49,248 (99%) |
| Flip-Flops | 28,235 (51%) |
| BRAM | 346.5 Kb (21%) |
| 9-bit multipliers | 68 (24%) |
Scalability projections on Intel Agilex-7 (100 MHz) suggest:
- 16 cores: 8% ALMs, 152 DSPs
- 32 cores: 17% ALMs, 304 DSPs
- 64 cores: 32% ALMs, 608 DSPs
After mapping to standard cells:
- FreePDK45 (45 nm): 201.5 MHz, 67.6 M neuron-updates/s at 49.5 mW ($1.37$ GUpd/s/W), NPU area ≈ 20%, DCU < 2%
- ASAP7 (7 nm): 316.3 MHz, 105.4 M neuron-updates/s at 10.9 mW ($9.67$ GUpd/s/W)
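The efficiency figures follow directly from throughput divided by power. The short check below reproduces them from the numbers above; the helper names are hypothetical, and the cycles-per-update ratio is a derived quantity rather than a figure reported in the source.

```python
def gupd_per_watt(updates_per_s: float, power_w: float) -> float:
    """Energy efficiency in giga neuron-updates per second per watt."""
    return updates_per_s / power_w / 1e9


def cycles_per_update(clock_hz: float, updates_per_s: float) -> float:
    """Average clock cycles spent per neuron update (derived, not reported)."""
    return clock_hz / updates_per_s


freepdk45 = gupd_per_watt(67.6e6, 49.5e-3)   # FreePDK45 figures
asap7 = gupd_per_watt(105.4e6, 10.9e-3)      # ASAP7 figures
```

In both technology nodes the clock-to-throughput ratio comes out at about three cycles per neuron update.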
5. Benchmarking: Performance and Energy Efficiency
Benchmarks incorporate both synthetic networks and application-driven scenarios.
80-20 network (1,000 neurons, 1,000 timesteps, ms):
- Single-core: 7.87 s (127,000 neuron-updates/s).
- Dual-core: 4.79 s (209,000 neuron-updates/s, 1.64× speedup).
- Effective IPC: 0.65 (ideal 1.0 without custom ops).
- Cache hit rates: I-cache 99.97%, D-cache 96.5–97.2%.
- Hazard stalls: 0.74% (single-core), 5.34–6.26% (dual-core).
| Metric | Single-core | Dual-core (each) |
|---|---|---|
| Execution time [s] | 7.870 | 4.791 |
| Speed-up | 1.00× | 1.643× |
| IPC | 0.574 | 0.532–0.519 |
| Effective IPC | 0.652 | 0.664–0.651 |
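The throughput and speed-up entries can be re-derived from the execution times. The helpers below are a hypothetical sketch; parallel efficiency (speed-up divided by core count) is a derived figure that the source does not report.

```python
def throughput(neurons: int, timesteps: int, seconds: float) -> float:
    """Neuron updates per second for a full simulation run."""
    return neurons * timesteps / seconds


def speedup(t_single: float, t_multi: float) -> float:
    """Speed-up of the multi-core run over the single-core run."""
    return t_single / t_multi


def parallel_efficiency(t_single: float, t_multi: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(t_single, t_multi) / cores


single = throughput(1000, 1000, 7.870)   # about 127,000 updates/s
dual = throughput(1000, 1000, 4.791)     # about 209,000 updates/s
```

On these numbers the dual-core configuration reaches roughly 82% of ideal linear scaling.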
ISI distributions and spike rasters align with MATLAB double/fixed-point references, confirming numerical accuracy.
Sudoku Winner-Take-All (WTA) network (729 neurons):
- Timestep latency: 2.06 ms (single-core), 1.22 ms (dual-core, 1.68× speedup).
- Speed-up over soft-float DTEK-V: ~40×.
- D-cache hit: 100%.
| Metric | Single-core | Dual-core |
|---|---|---|
| Timestep latency [ms] | 2.0555 | 1.2223 |
| Speed-up | 1.00× | 1.682× |
| IPC (avg) | 0.530 | 0.496–0.419 |
| Effective IPC (avg) | 0.756 | 0.864–0.787 |
| D-cache hit rate | 100% | 100% |
ASIC implementations:
- FreePDK45: $1.37$ GUpd/s/W at 49.5 mW
- ASAP7: $9.67$ GUpd/s/W at 10.9 mW
6. Limitations and Prospective Enhancements
IzhiRISC-V substantially accelerates Izhikevich neuron networks by collapsing a 19-operation software kernel into a single-cycle custom instruction while keeping the RISC-V execution model intact. Numerical fidelity is preserved, as indicated by interspike interval histograms and spike rasters that closely match the reference simulations.
Noted limitations include:
- Pipeline hazards induced by neuromorphic instructions, whose extra stalls keep IPC below the ideal value of 1.0.
- Potential need for fixed-point retuning for specific biological or computational regimes.
- Multi-core scaling is curtailed by memory bus contention and cache miss penalties.
Future directions cited in the literature are:
- CSR-based spike flag and configuration success reporting to ameliorate register hazards.
- Network-level instructions supporting operations such as sparse-spike broadcast or synapse accumulation.
- Support for additional neuron models (e.g., LIF, Adaptive Exponential IF) via NPU microcode expansion.
- Integration of lightweight on-chip routers or network-on-chip mesh for scaling core count beyond 64.
IzhiRISC-V provides an approach for coupling general-purpose processor design with domain-specific acceleration, enabling large-scale, energy-efficient neuromorphic computing within standard RISC-V platforms (Szczerek et al., 18 Aug 2025).