Noisy Spiking Actor Networks

Updated 2 March 2026

The paper demonstrates NoisySAN's integration of time-correlated noise to enhance exploration in spiking actor networks for deep reinforcement learning.
It introduces a novel regularization mechanism that balances noise injection with learning stability, achieving state-of-the-art results on continuous control benchmarks.
Empirical evaluations show that NoisySAN outperforms earlier spiking actor networks and conventional deep actor networks, reaching an average performance ratio (APR) of 116.6% relative to a deep actor network baseline.

Noisy Spiking Actor Networks (NoisySAN) integrate explicit noise injection into spiking neural network (SNN) architectures for deep reinforcement learning (RL), specifically addressing the challenge of exploration. While prior SNN-based actor networks exhibit robust spike-based computation, their intrinsic robustness to stochastic perturbation impedes effective exploration. NoisySAN counteracts this by introducing time-correlated noise at the subthreshold charging and transmission stages of each spiking neuron, combined with a novel noise reduction mechanism to stabilize learned policies. Empirical evaluation demonstrates that NoisySAN achieves state-of-the-art performance on continuous control benchmarks, outperforming both conventional SNN approaches and standard deep actor networks (Chen et al., 2024).

1. Spiking Neuron Formulation and Dynamics

NoisySAN is constructed atop a parametric spiking neuron framework. The fundamental SNN unit operates as follows:

Subthreshold (charging) update:

$H_{t} = f(V_{t-1}, X_{t})$

For current-based leaky integrate-and-fire (LIF) neurons:

$H_t = \alpha_V V_{t-1} + C_t$

Spike emission:

$S_t = \Theta(H_t - V_{th}), \quad \Theta(x) = \begin{cases}1, & x \ge 0 \\ 0, & x < 0\end{cases}$

Hard reset:

$V_t = H_t (1 - S_t) + V_{\text{reset}} S_t$

Integrate-and-fire-free (non-spiking) neuron:

$V_t = f(V_{t-1}, X_t)$

The binary nature of $S_t$ (spiking output) underpins SNNs’ resistance to injected noise.
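The three update rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `lif_step` and the values chosen for $\alpha_V$, $V_{th}$, and $V_{\text{reset}}$ are assumptions made for the example.

```python
import numpy as np

def lif_step(v_prev, c_t, alpha_v=0.75, v_th=0.5, v_reset=0.0):
    """One current-based LIF step with hard reset.

    alpha_v, v_th and v_reset are illustrative values, not taken from the source.
    """
    h_t = alpha_v * v_prev + c_t              # subthreshold charging: H_t
    s_t = (h_t >= v_th).astype(h_t.dtype)     # Heaviside spike emission: S_t
    v_t = h_t * (1.0 - s_t) + v_reset * s_t   # hard reset: V_t
    return s_t, v_t

# Drive three neurons with constant input currents for three steps.
v = np.zeros(3)
currents = np.array([0.2, 0.3, 0.6])
spikes = []
for _ in range(3):
    s, v = lif_step(v, currents)
    spikes.append(s)
spikes = np.stack(spikes)  # shape (time, neuron)
```

Because the output is thresholded to 0 or 1, a small perturbation of `h_t` that does not cross `v_th` leaves `s_t` unchanged, which is the robustness the text refers to.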

2. Time-Correlated Noise Injection Mechanism

NoisySAN introduces two independent, time-correlated (“colored”) noise streams per neuron:

Noise in subthreshold update:

$H_t = f(V_{t-1}, X_t) + \sigma_v \odot \varepsilon_v(t)$

Noise in spike transmission:

$\widetilde{S}_t = S_t + \sigma_s \odot \varepsilon_s(t)$

where both $\varepsilon_v(t)$ and $\varepsilon_s(t)$ are sampled from a colored noise process defined by

$\mathbb{E}\left| \mathcal{F}\{\varepsilon(t)\}(f) \right|^2 \propto f^{-\beta}, \quad \beta \in [0,2]$

( $\beta=0$ for white, $\beta=1$ for pink, $\beta=2$ for red noise).

This approach systematically perturbs neuronal integration and spike transfer, providing denser exploration in the policy’s action space compared to local noise or fixed parameter perturbations. For integrate-and-fire-free neurons, only the $\sigma_v \odot \varepsilon_v(t)$ noise is injected.
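One common way to draw noise with a $f^{-\beta}$ power spectrum is to shape white Gaussian noise in the frequency domain. The sketch below takes that route and then applies the two injection points described above. The helper names (`colored_noise`, `noisy_lif_step`), the unit-variance normalization, and the neuron constants are assumptions for illustration; the source does not specify how the noise is generated.

```python
import numpy as np

def colored_noise(beta, n_steps, size=1, rng=None):
    """Sample `size` independent streams with power spectrum ~ f^(-beta).

    Returns an array of shape (n_steps, size). Spectral shaping is an
    assumed generation method, not one confirmed by the source.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.fft.rfftfreq(n_steps)
    scale = np.zeros_like(freqs)               # DC term stays 0 (zero-mean streams)
    scale[1:] = freqs[1:] ** (-beta / 2.0)     # amplitude f^(-beta/2) gives power f^(-beta)
    real = rng.standard_normal((size, freqs.size))
    imag = rng.standard_normal((size, freqs.size))
    eps = np.fft.irfft((real + 1j * imag) * scale, n=n_steps, axis=-1)
    eps /= eps.std(axis=-1, keepdims=True)     # normalize each stream to unit variance
    return eps.T

def noisy_lif_step(v_prev, c_t, eps_v_t, eps_s_t, sigma_v, sigma_s,
                   alpha_v=0.75, v_th=0.5, v_reset=0.0):
    """LIF step with noise in the charging and transmission stages."""
    h_t = alpha_v * v_prev + c_t + sigma_v * eps_v_t   # noisy subthreshold update
    s_t = (h_t >= v_th).astype(float)
    v_t = h_t * (1.0 - s_t) + v_reset * s_t
    s_tilde = s_t + sigma_s * eps_s_t                  # noisy spike transmission
    return s_tilde, v_t

# Pink noise (beta = 1) for 4 neurons over 256 steps.
eps_v = colored_noise(1.0, 256, size=4, rng=np.random.default_rng(0))
```

Larger `beta` concentrates power at low frequencies, so consecutive samples are more strongly correlated and the perturbation drifts slowly rather than flipping sign every step.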

3. Noise Reduction and Stable Policy Learning

To prevent excessive variance in finalized policies, a regularization penalty on noise parameters is incorporated. The loss function is thus

$\mathcal{L}_{\text{new}} = \mathcal{L}_{\text{old}} + \frac{k}{N_A}\sum_{i=1}^{N_A}\sigma_i^2$

where

$\mathcal{L}_{\text{old}}$ : Standard actor-critic loss
$N_A$ : Action dimension
$k$ : Dynamically scaled by policy performance: $k = k_0\frac{R_{\text{eval}} - R_{\min}}{R_{\max} - R_{\min}}$ with $R_{\text{eval}}$ the periodically measured evaluation return.

This enforces a curriculum in noise attenuation: initial exploration benefits from higher noise magnitudes, while policy convergence proceeds under gradually reduced exploration stochasticity.
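As a worked illustration of the penalty, the following sketch evaluates the coefficient $k$ and the regularized loss for given values. The function names and the numbers in the usage lines are hypothetical; how $R_{\min}$ and $R_{\max}$ are chosen is not stated in the text above, so they are simply passed in.

```python
import numpy as np

def noise_penalty_coefficient(r_eval, r_min, r_max, k0):
    """k = k0 * (R_eval - R_min) / (R_max - R_min), as in the formula above."""
    return k0 * (r_eval - r_min) / (r_max - r_min)

def regularized_loss(loss_old, sigmas, k):
    """L_new = L_old + (k / N_A) * sum_i sigma_i^2, with N_A = len(sigmas)."""
    sigmas = np.asarray(sigmas, dtype=float)
    return loss_old + k * np.mean(sigmas ** 2)

# Hypothetical numbers: evaluation return halfway between R_min and R_max.
k = noise_penalty_coefficient(r_eval=500.0, r_min=0.0, r_max=1000.0, k0=0.1)
loss = regularized_loss(loss_old=-2.0, sigmas=[0.5, 0.5], k=k)
```

Since $k$ grows with the evaluation return, the gradient pushing each $\sigma_i$ toward zero is weak early in training and stronger once the policy performs well.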

4. Architecture: Components and Parameterization

NoisySAN comprises multiple structured layers:

Input encoding: Each of the $N$ state dimensions is population-coded into $P_{\text{in}}$ spike trains; the total number of input neurons is $N \cdot P_{\text{in}}$ .
Backbone SNN: Two fully connected noisy CLIF (current-based leaky integrate-and-fire) layers ($256$ neurons each).
- Each layer $l$ evolves as:
$C^l_t = \alpha_C C^l_{t-1} + W^l \widetilde{S}^{l-1}_t, \quad H^l_t = \alpha_V V^l_{t-1} + C^l_t + \sigma_v^l \odot \varepsilon_v^l(t)$

Spike emission and membrane reset follow the rules above.
Output layer: $M \cdot P_{\text{out}}$ output neurons partitioned into $M$ populations; intra-layer connectivity is permitted.
Population decoder: $M$ integrate-and-fire-free neurons, each integrating spike counts over $T$ steps and emitting a continuous action via $a \equiv V_T$ .

Trainable weights are:

Input $\rightarrow$ CLIF1: $(N P_{\text{in}}) \times 256$
CLIF1 $\rightarrow$ CLIF2: $256 \times 256$
CLIF2 $\rightarrow$ OutputSpiking: $256 \times (M P_{\text{out}})$
Decoder: $(M P_{\text{out}}) \times M$
Per-layer noise parameter vectors $\sigma_v, \sigma_s$ and respective biases
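The weight shapes listed above and the per-layer update can be checked with a short sketch. It assumes a row-vector convention (`s_in @ W`, matching the `in × out` shapes in the list), and the dimensions in the usage lines ($N=17$, $P_{\text{in}}=10$, $M=6$, $P_{\text{out}}=10$) as well as the decay and threshold constants are hypothetical values chosen only for the example. Biases and noise vectors are left out of the count because their exact shapes are not given.

```python
import numpy as np

def weight_shapes(n_state, p_in, n_action, p_out, hidden=256):
    """Shapes of the trainable weight matrices listed above."""
    return {
        "input_to_clif1": (n_state * p_in, hidden),
        "clif1_to_clif2": (hidden, hidden),
        "clif2_to_output": (hidden, n_action * p_out),
        "decoder": (n_action * p_out, n_action),
    }

def noisy_clif_layer_step(s_in, c_prev, v_prev, W, eps_v_t, eps_s_t,
                          sigma_v, sigma_s, alpha_c=0.5, alpha_v=0.75,
                          v_th=0.5, v_reset=0.0):
    """One step of a noisy CLIF layer (constants are illustrative)."""
    c_t = alpha_c * c_prev + s_in @ W                   # synaptic current C_t
    h_t = alpha_v * v_prev + c_t + sigma_v * eps_v_t    # noisy charging H_t
    s_t = (h_t >= v_th).astype(float)                   # spike emission
    v_t = h_t * (1.0 - s_t) + v_reset * s_t             # hard reset
    return s_t + sigma_s * eps_s_t, c_t, v_t            # noisy transmitted spikes

shapes = weight_shapes(n_state=17, p_in=10, n_action=6, p_out=10)
n_weights = sum(rows * cols for rows, cols in shapes.values())

# One step through the first hidden layer with the noise scales set to zero.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=shapes["input_to_clif1"])
s_in = rng.integers(0, 2, size=170).astype(float)
out, c1, v1 = noisy_clif_layer_step(s_in, np.zeros(256), np.zeros(256), W1,
                                    np.zeros(256), np.zeros(256), 0.0, 0.0)
```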

5. Training Procedure and Integration with TD3

NoisySAN is trained in a deep RL actor-critic framework, building on the Twin Delayed Deep Deterministic Policy Gradients (TD3) algorithm. The core steps are:

Initialize all actor (NoisySAN) and critic parameters, including noise scales ( $\sigma$ ) and target networks.
Per episode:
- Sample full noise sequences $\{\varepsilon_v, \varepsilon_s\}$ for the entire episode.
- For each timestep $t$ :
  - Encode observation $x_t$ , infer spikes and action $a_t$ via noisy SNN (using equations above).
  - Apply target-policy smoothing, execute action, receive transition $(x_t, a_t, r_t, x_{t+1})$ .
  - Store state, action, noise, and outcome in the replay buffer.
- If update-ready, sample minibatches, compute TD-targets for the critic, and update critic/actor networks.
- Actor loss includes both negative Q-value and the variance penalty.
- Target networks undergo soft updates.

This sequence aligns with the published pseudocode for the method (Chen et al., 2024).
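The update-side quantities in this loop are standard TD3 components, and a minimal NumPy sketch of them is given here. It is not the published pseudocode: the function names, the default `gamma` and `tau`, and the toy numbers are assumptions, and the networks themselves are replaced by precomputed arrays.

```python
import numpy as np

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q TD target: r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target critics evaluated at the
    target actor's action for the next state.
    """
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def actor_loss(q_values, sigmas, k):
    """Negative mean Q-value plus the noise-variance penalty from Section 3."""
    return -np.mean(q_values) + k * np.mean(np.square(sigmas))

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters."""
    return [(1.0 - tau) * t + tau * p
            for t, p in zip(target_params, online_params)]

# Toy values to show the arithmetic.
y = td3_target(np.array([1.0]), np.array([0.0]),
               np.array([2.0]), np.array([3.0]), gamma=0.5)
a_loss = actor_loss(np.array([1.0, 3.0]), np.array([0.5, 0.5]), k=0.4)
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

In a full implementation the gradient of `actor_loss` would flow into both the synaptic weights and the noise scales, so the penalty term is what shrinks $\sigma$ over training.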

6. Empirical Results and Benchmark Comparison

NoisySAN was evaluated on a suite of continuous control tasks from OpenAI Gym using MuJoCo backends: Ant-v3, HalfCheetah-v3, Hopper-v3, Walker2d-v3, Humanoid-v3, HumanoidStandup-v2, InvertedDoublePendulum-v2, and BipedalWalker-v3. Key measurements include maximum average return (over 10 random seeds, 1 M steps, and deterministic evaluation every 10 K steps) and the APR (Average Performance Ratio) with respect to a deep actor network (DAN):

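$\mathrm{APR}(A) = \frac{1}{|\mathcal{T}|} \sum_{\tau \in \mathcal{T}} \frac{\mathrm{Perf}(A, \tau)}{\mathrm{Perf}(\mathrm{DAN}, \tau)}$

where $\mathcal{T}$ is the set of benchmark tasks and $\mathrm{Perf}(A, \tau)$ is the maximum average return of algorithm $A$ on task $\tau$.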

Table: Selected Results (Max Average Return ± std)

Task	DAN	PopSAN	MDC-SAN	ILC-SAN	NoisySAN
Ant-v3	5472±653	5264±920	5311±806	5339±503	5524±415
HalfCheetah	10471±1695	9419±1600	10323±1670	10789±922	10723±817
Hopper	3520±105	230±52	1824±1738	3125±1096	3356±652
...	...	...	...	...	...
APR (%)	100.0%	97.3%	102.3%	107.2%	116.6%

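NoisySAN achieves an overall APR of 116.6%, the highest among the compared SNN-based actor variants (PopSAN at 97.3%, MDC-SAN at 102.3%, ILC-SAN at 107.2%) and above the deep actor network baseline (100.0%). Gains are not uniform across tasks: among the rows shown, ILC-SAN is marginally ahead on HalfCheetah and DAN is ahead on Hopper, while NoisySAN has the lowest standard deviation on Ant-v3 and HalfCheetah.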
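To make the APR definition concrete, the sketch below computes it over only the three tasks whose rows appear in the table. The elided rows are not filled in, so the result is a partial figure and is not expected to reproduce the reported 116.6%. The function name `apr` and the dictionary layout are choices made for this example.

```python
def apr(perf, baseline):
    """Average performance ratio (in percent) over the tasks in `baseline`."""
    ratios = [perf[task] / baseline[task] for task in baseline]
    return 100.0 * sum(ratios) / len(ratios)

# Max average returns copied from the three listed table rows.
dan = {"Ant-v3": 5472, "HalfCheetah": 10471, "Hopper": 3520}
ilc_san = {"Ant-v3": 5339, "HalfCheetah": 10789, "Hopper": 3125}
noisy_san = {"Ant-v3": 5524, "HalfCheetah": 10723, "Hopper": 3356}

partial_apr_noisy = apr(noisy_san, dan)
partial_apr_ilc = apr(ilc_san, dan)
```

Because APR averages per-task ratios rather than raw returns, tasks with large absolute returns do not dominate the score.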

7. Significance, Limitations, and Implications

NoisySAN demonstrates that explicit, time-correlated stochasticity in spiking actor networks, if carefully regularized, can break through the intrinsic noise robustness of SNNs and provide both effective exploration and stable learning in deep RL. This approach establishes a new bridge between exploration-by-parameterization (as in NoisyNet) and spike-driven computation. The introduction of colored noise (parameterized by the exponent $\beta$) allows tunable temporal structure in explorative perturbations. A plausible implication is that further refinements in the coupling of neuronal stochasticity with policy regularization may yield additional gains in sample efficiency and robustness in neuromorphic RL systems (Chen et al., 2024).

References

Chen et al. (2024). Noisy Spiking Actor Network for Exploration.
