- The paper introduces an energy-efficient deep RL method using quantized spiking neural networks on SpiNNaker2, achieving significant energy savings compared to GPUs.
- The methodology incorporates surrogate gradient backpropagation and 8-bit weight quantization to enable real-time control in tasks like CartPole and Acrobot.
- Results demonstrate up to a 32× reduction in energy consumption while maintaining comparable inference latency and task performance on neuromorphic hardware.
This paper presents an energy-efficient implementation of a reinforcement learning algorithm using quantized spiking neural networks (SNNs) on the SpiNNaker2 neuromorphic hardware platform. The primary focus is on training deep spiking Q-networks (DSQNs) for robotic control tasks, achieving significant reductions in energy consumption while maintaining inference latency comparable to that of GPU-based systems.
Introduction
Spiking Neural Networks (SNNs) are compelling alternatives to traditional artificial neural networks (ANNs), leveraging asynchronous updates and discrete spikes to reduce power consumption and computational load. Neuromorphic platforms like SpiNNaker2 exploit these attributes, offering scalable and real-time performance in energy-constrained applications. Despite this theoretical alignment, deploying RL algorithms with SNNs remains a complex challenge due to the tight coupling between algorithmic design and hardware constraints.
The authors explore deploying SNN-based Deep Q-Learning on SpiNNaker2, showcasing its potential for low-power, real-time control across various tasks.
Figure 1: Pipeline Overview: Closed-loop SNN-based RL with SpiNNaker2.
Methodology
Model Architecture
The authors train DSQNs with Q-learning, using surrogate gradient backpropagation to handle the non-differentiable spiking activations. The network consists of leaky integrate-and-fire (LIF) neurons arranged in fully connected layers. The membrane potentials evolve according to:
$$u_j^{t+1} = \beta\, u_j^t + \sum_i w_{ij}\, z_i^t - z_j^t\, \theta$$
where $u_j^t$ is the membrane potential of neuron $j$ at time step $t$, $\beta$ is the decay factor, $w_{ij}$ are the synaptic weights, $z_i^t \in \{0,1\}$ are the incoming spikes, $z_j^t$ indicates whether neuron $j$ itself fired, and $\theta$ is the firing threshold (the last term implements a soft reset).
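As an illustration, here is a minimal PyTorch sketch of this update with a surrogate gradient. The fast-sigmoid surrogate and the hyperparameter values are assumptions for the sketch; the paper does not specify which surrogate function the authors use.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()          # spike if potential exceeds threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: d(spike)/dv ~ 1 / (1 + |v|)^2
        return grad_output / (1.0 + v.abs()) ** 2


def lif_step(u, z_in, w, beta=0.9, theta=1.0):
    """One discrete LIF update: u_j^{t+1} = beta*u_j^t + sum_i w_ij*z_i^t - theta*z_j^t."""
    z_out = SurrogateSpike.apply(u - theta)        # z_j^t from the current potential
    u_next = beta * u + z_in @ w - theta * z_out   # leak, integrate input, soft reset
    return u_next, z_out
```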
The architecture accommodates signed input encoding to represent continuous observations from environments like CartPole and Acrobot. State vectors are encoded into spike trains using rate coding, which is compatible with SpiNNaker2's hardware capabilities.
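A sketch of this encoding is shown below, assuming Bernoulli sampling as the rate-coding mechanism; the number of time steps `T` and the `scale` factor are hypothetical, and the paper may use a different spike-generation scheme.

```python
import numpy as np

def signed_rate_encode(obs, T=100, scale=1.0, rng=None):
    """Split each feature into a (positive, negative) neuron pair, then
    rate-code the magnitudes as Bernoulli spike trains over T time steps."""
    rng = rng if rng is not None else np.random.default_rng()
    pos = np.clip(obs, 0.0, None)    # positive part of each feature
    neg = np.clip(-obs, 0.0, None)   # negative part of each feature
    rates = np.clip(np.concatenate([pos, neg]) * scale, 0.0, 1.0)
    return (rng.random((T, rates.size)) < rates).astype(np.uint8)

# Example: a 4-dimensional CartPole state becomes 8 input neurons.
spikes = signed_rate_encode(np.array([0.1, -0.5, 0.02, 0.3]))
print(spikes.shape)  # (100, 8)
```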
Model Quantization
The quantization strategy maps floating-point weights to 8-bit integers after training, with scaling factors applied to retain precision. Empirical testing determined scaling factors (λ) that exploit the available dynamic range without saturating the 8-bit limits, ensuring robust signal propagation after quantization.
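A minimal sketch of such post-training quantization follows; where exactly λ enters the mapping, and whether scaling is per layer or per tensor, are assumptions not fixed by the paper.

```python
import numpy as np

def quantize_layer(w, lam=1.0):
    """Map float weights to int8 with a uniform per-layer scale.

    lam (the scaling factor) trades off dynamic range against saturation:
    lam < 1 leaves headroom, lam > 1 clips more of the largest weights."""
    scale = lam * 127.0 / np.max(np.abs(w))  # map the largest weight near the int8 limit
    w_q = np.clip(np.round(w * scale), -128, 127).astype(np.int8)
    return w_q, scale                        # keep scale to rescale accumulated currents on-chip
```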

Figure 2: Effect of Uniform Full-Layer Quantization Scaling in CartPole-v0 and Acrobot-v1.
Encoding and Decoding Strategies
For input encoding, each feature is transformed into a two-neuron format representing its positive and negative parts. Rate coding then translates these magnitudes into spike trains. Action selection reads out the final membrane potentials of the output neurons, whose firing thresholds are set high enough that they never spike.
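Continuing the encoding sketch above, the readout can be expressed as the argmax of the accumulated output potentials. Whether the output neurons also leak with factor β is an assumption of this sketch.

```python
import numpy as np

def decode_action(hidden_spikes, w_out, beta=0.9):
    """Integrate hidden-layer spikes into output membrane potentials.

    With output thresholds set high, these neurons never spike or reset;
    the final potentials act as Q-value estimates and the greedy action
    is their argmax."""
    u = np.zeros(w_out.shape[1])
    for z_t in hidden_spikes:       # z_t: hidden spike vector at one time step
        u = beta * u + z_t @ w_out  # leak + integrate; no spike/reset term
    return int(np.argmax(u)), u
```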

Figure 3: Voltage dynamics in CartPole-v0 and Acrobot-v1 recorded for one episode on SpiNNaker2.
Results
An ablation over quantization scaling factors confirmed the chosen values, showing negligible performance degradation on CartPole-v0 and a slight improvement on Acrobot-v1 when running on SpiNNaker2. An on-chip threshold sensitivity analysis motivates a sequential, layer-by-layer procedure for tuning the LIF thresholds.
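The paper describes the tuning only as sequential; one plausible realization is a greedy layer-by-layer sweep like the sketch below, where `evaluate` (average reward over a few on-chip episodes) and the candidate `grid` are hypothetical.

```python
import numpy as np

def sequential_threshold_search(evaluate, grid, n_layers=2):
    """Greedy coordinate search over per-layer LIF thresholds.

    Each layer is swept over `grid` while earlier layers stay fixed at
    their best value; `evaluate(thresholds)` returns the average reward."""
    best = [grid[0]] * n_layers
    for layer in range(n_layers):
        scores = []
        for theta in grid:
            trial = list(best)
            trial[layer] = theta
            scores.append(evaluate(trial))
        best[layer] = grid[int(np.argmax(scores))]
    return best
```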
Energy consumption benchmarks against an NVIDIA GTX 1650 GPU demonstrated substantial efficiency gains, with up to a 32× reduction in energy usage on SpiNNaker2. Latency and power measurements meet real-time control requirements, indicating that neuromorphic platforms are viable for efficient deep RL.

Figure 4: Average rewards over 5 episodes for various hidden-layer thresholds.

Figure 5: Spike activity for CartPole-v0 and Acrobot-v1 across the input and hidden layers.
Discussion and Conclusion
The research successfully integrates deep RL with SNNs, deploying quantized DSQN models on the SpiNNaker2 chip while preserving task performance and achieving energy-efficient, real-time closed-loop execution. The findings suggest that SpiNNaker2 is a promising neuromorphic platform for deploying intelligent agents in energy-constrained environments, such as mobile robots and IoT devices.
Future work will investigate in-loop training and broader applications, enhancing the scalability and real-world deployment potential of neuromorphic RL systems. This approach contributes to developing intelligent systems capable of operating efficiently under stringent resource constraints.