Noisy Spiking Actor Networks

Updated 2 March 2026
  • The paper demonstrates NoisySAN's integration of time-correlated noise to enhance exploration in spiking actor networks for deep reinforcement learning.
  • It introduces a novel regularization mechanism that balances noise injection with learning stability, achieving state-of-the-art results on continuous control benchmarks.
  • Empirical evaluations show that NoisySAN outperforms prior spiking actor networks and conventional deep actor networks, reaching an average performance ratio (APR) of 116.6% relative to the deep actor network baseline.

Noisy Spiking Actor Networks (NoisySAN) integrate explicit noise injection into spiking neural network (SNN) architectures for deep reinforcement learning (RL), specifically addressing the challenge of exploration. While prior SNN-based actor networks exhibit robust spike-based computation, their intrinsic robustness to stochastic perturbation impedes effective exploration. NoisySAN counteracts this by introducing time-correlated noise at the subthreshold charging and transmission stages of each spiking neuron, combined with a novel noise reduction mechanism to stabilize learned policies. Empirical evaluation demonstrates that NoisySAN achieves state-of-the-art performance on continuous control benchmarks, outperforming both conventional SNN approaches and standard deep actor networks (Chen et al., 2024).

1. Spiking Neuron Formulation and Dynamics

NoisySAN is constructed atop a parametric spiking neuron framework. The fundamental SNN unit operates as follows:

  • Subthreshold (charging) update:

$$H_t = f(V_{t-1}, X_t)$$

For current-based leaky integrate-and-fire (LIF) neurons:

$$H_t = \alpha_V V_{t-1} + C_t$$

  • Spike emission:

$$S_t = \Theta(H_t - V_{th}), \quad \Theta(x) = \begin{cases}1, & x \ge 0 \\ 0, & x < 0\end{cases}$$

$$V_t = H_t (1 - S_t) + V_{\text{reset}} S_t$$

  • Integrate-and-fire-free (non-spiking) neuron:

$$V_t = f(V_{t-1}, X_t)$$

The binary nature of $S_t$ (spiking output) underpins SNNs’ resistance to injected noise.
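The update rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `clif_step` and the constants `alpha_v`, `v_th`, and `v_reset` are assumed values chosen for the example. The last two lines illustrate the noise-resistance point: a small perturbation of the input current leaves the binary spike output unchanged.

```python
import numpy as np

def clif_step(v_prev, c_t, alpha_v=0.5, v_th=0.5, v_reset=0.0):
    """One step of the current-based LIF dynamics (constants are assumptions)."""
    h_t = alpha_v * v_prev + c_t              # subthreshold charging
    s_t = (h_t >= v_th).astype(h_t.dtype)     # Heaviside spike, Theta(0) = 1
    v_t = h_t * (1.0 - s_t) + v_reset * s_t   # hard reset where a spike fired
    return h_t, s_t, v_t

v0 = np.zeros(3)
c = np.array([0.2, 0.5, 0.9])
h, s, v = clif_step(v0, c)

# A small perturbation of the current does not change the binary spikes.
_, s_perturbed, _ = clif_step(v0, c + 0.05)
spikes_unchanged = bool(np.array_equal(s, s_perturbed))
```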

2. Time-Correlated Noise Injection Mechanism

NoisySAN introduces two independent, time-correlated (“colored”) noise streams per neuron:

  • Noise in subthreshold update:

$$H_t = f(V_{t-1}, X_t) + \sigma_v \odot \varepsilon_v(t)$$

  • Noise in spike transmission:

$$\widetilde{S}_t = S_t + \sigma_s \odot \varepsilon_s(t)$$

where both $\varepsilon_v(t)$ and $\varepsilon_s(t)$ are sampled from a colored noise process defined by

$$\mathbb{E}\left| \mathcal{F}\{\varepsilon(t)\}(f) \right|^2 \propto f^{-\beta}, \quad \beta \in [0,2]$$

($\beta=0$ for white, $\beta=1$ for pink, $\beta=2$ for red noise).

This approach systematically perturbs neuronal integration and spike transfer, providing denser exploration in the policy’s action space compared to local noise or fixed parameter perturbations. For integrate-and-fire-free neurons, only the $\sigma_v \odot \varepsilon_v(t)$ noise is injected.
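One common way to draw noise with a power-law spectrum is to shape white Gaussian noise in the frequency domain and transform back. The sketch below takes that route; it is an assumption about how such sequences could be generated, not a statement of the paper's sampler. The function name `colored_noise`, the per-sequence unit-variance normalization, and the sequence length are illustrative choices.

```python
import numpy as np

def colored_noise(beta, n_steps, n_neurons, rng):
    """Return an (n_steps, n_neurons) array with power spectrum ~ f^(-beta)."""
    freqs = np.fft.rfftfreq(n_steps)           # non-negative frequencies, freqs[0] == 0
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)     # amplitude ~ f^(-beta/2), so power ~ f^(-beta)
    shape = (n_neurons, freqs.size)
    spectrum = scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    x = np.fft.irfft(spectrum, n=n_steps, axis=-1)
    x /= x.std(axis=-1, keepdims=True)         # unit variance per sequence (assumed convention)
    return x.T

def lag1_autocorr(x):
    """Correlation between consecutive samples of a 1-D sequence."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

rng = np.random.default_rng(0)
eps_white = colored_noise(0.0, 1000, 4, rng)   # beta = 0
eps_red = colored_noise(2.0, 1000, 4, rng)     # beta = 2
```

Comparing `lag1_autocorr(eps_red[:, 0])` with `lag1_autocorr(eps_white[:, 0])` shows the temporal correlation that distinguishes red from white noise: consecutive red-noise samples are strongly correlated, so the perturbation drifts slowly over an episode rather than flipping sign at every step.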

3. Noise Reduction and Stable Policy Learning

To prevent excessive variance in finalized policies, a regularization penalty on noise parameters is incorporated. The loss function is thus

$$\mathcal{L}_{\text{new}} = \mathcal{L}_{\text{old}} + \frac{k}{N_A}\sum_{i=1}^{N_A}\sigma_i^2$$

where

  • $\mathcal{L}_{\text{old}}$: Standard actor-critic loss
  • $N_A$: Action dimension
  • $k$: Dynamically scaled by policy performance: $k = k_0\frac{R_{\text{eval}} - R_{\min}}{R_{\max} - R_{\min}}$, with $R_{\text{eval}}$ the periodically measured evaluation return.

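Because $k$ grows with the evaluation return, the penalty is weak early in training and strengthens as the policy improves. This enforces a curriculum in noise attenuation: initial exploration benefits from higher noise magnitudes, while policy convergence proceeds under gradually reduced exploration stochasticity.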
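As a worked illustration of the penalty, the sketch below computes the coefficient and the regularized loss for made-up numbers. The function names, the value of `k0`, and the return bounds `r_min` and `r_max` are assumptions for the example; how those bounds are chosen is not specified here.

```python
import numpy as np

def noise_penalty_coeff(r_eval, r_min, r_max, k0=1.0):
    """k = k0 * (R_eval - R_min) / (R_max - R_min)."""
    return k0 * (r_eval - r_min) / (r_max - r_min)

def regularized_loss(loss_old, sigma, k):
    """L_new = L_old + (k / N_A) * sum_i sigma_i^2, with N_A = len(sigma)."""
    sigma = np.asarray(sigma, dtype=float)
    return loss_old + k / sigma.size * np.sum(sigma ** 2)

# Hypothetical numbers: the policy is a quarter of the way from r_min to r_max.
k_early = noise_penalty_coeff(r_eval=250.0, r_min=0.0, r_max=1000.0)
k_late = noise_penalty_coeff(r_eval=900.0, r_min=0.0, r_max=1000.0)
loss_early = regularized_loss(1.0, [0.2, 0.4], k_early)
loss_late = regularized_loss(1.0, [0.2, 0.4], k_late)
```

With the same noise scales, the penalty term is larger for `k_late` than for `k_early`, which is what pushes the noise magnitudes down as evaluation returns rise.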

4. Architecture: Components and Parameterization

NoisySAN comprises multiple structured layers:

  • Input encoding: Each state dimension ($N$) is population-coded into $P_{\text{in}}$ spike trains; total input neurons is $N \cdot P_{\text{in}}$.
  • Backbone SNN: Two fully connected noisy CLIF (current-based leaky integrate-and-fire) layers ($256$ neurons each).
    • Each layer $l$ evolves as:

    $$C^l_t = \alpha_C C^l_{t-1} + W^l \widetilde{S}^{l-1}_t, \quad H^l_t = \alpha_V V^l_{t-1} + C^l_t + \sigma_v^l \odot \varepsilon_v^l(t)$$

    Spike emission and membrane reset follow the rules above.

  • Output layer: $M \cdot P_{\text{out}}$ output neurons partitioned into $M$ populations; intra-layer connectivity is permitted.

  • Population decoder: $M$ integrate-and-fire-free neurons, each integrating spike counts over $T$ steps and emitting a continuous action via $a \equiv V_T$.
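The per-layer update can be sketched as a single step function. This is an illustrative reading of the layer equations, not the reference code: the name `noisy_clif_layer_step`, the decay constants, the threshold, and the toy sizes are assumptions, and `W` is stored as (outputs × inputs) so that `W @ s_in` matches the matrix-vector product in the equation.

```python
import numpy as np

def noisy_clif_layer_step(state, s_in, W, sigma_v, sigma_s, eps_v_t, eps_s_t,
                          alpha_c=0.5, alpha_v=0.75, v_th=0.5, v_reset=0.0):
    """One timestep of a noisy CLIF layer (all constants are assumed values)."""
    c_prev, v_prev = state
    c_t = alpha_c * c_prev + W @ s_in                   # synaptic current
    h_t = alpha_v * v_prev + c_t + sigma_v * eps_v_t    # noisy charging
    s_t = (h_t >= v_th).astype(float)                   # spike emission
    v_t = h_t * (1.0 - s_t) + v_reset * s_t             # membrane reset
    s_tilde = s_t + sigma_s * eps_s_t                   # noisy transmission
    return (c_t, v_t), s_t, s_tilde

W = np.full((3, 2), 0.3)                  # toy layer: 2 inputs, 3 neurons
state = (np.zeros(3), np.zeros(3))        # (current, membrane potential)
s_in = np.ones(2)
eps_v_t = np.array([0.0, -1.0, 1.0])      # one sample of each noise stream
eps_s_t = np.array([1.0, -1.0, 0.0])
state, s_t, s_tilde = noisy_clif_layer_step(
    state, s_in, W, np.full(3, 0.2), np.full(3, 0.1), eps_v_t, eps_s_t)
```

In this toy case all three neurons receive the same current, yet the charging noise pushes the second neuron below threshold, so the spike pattern differs across neurons. The transmitted signal `s_tilde` is no longer binary, which is what lets the perturbation reach the next layer.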

Trainable weights are:

  • Input → CLIF1: $(N P_{\text{in}}) \times 256$

  • CLIF1 → CLIF2: $256 \times 256$

  • CLIF2 → OutputSpiking: $256 \times (M P_{\text{out}})$

  • Decoder: $(M P_{\text{out}}) \times M$

  • Per-layer noise parameter vectors $\sigma_v$, $\sigma_s$ and respective biases
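To make the weight shapes listed above concrete, the following sketch tabulates them and counts the weight entries for one hypothetical configuration. The function names and the example sizes (17 state dimensions, 6 action dimensions, populations of 10) are assumptions for illustration; biases and noise parameter vectors are left out of the count.

```python
from math import prod

def weight_shapes(n_state, p_in, n_action, p_out, hidden=256):
    """Weight matrix shapes in the order listed in the text."""
    return {
        "input_to_clif1": (n_state * p_in, hidden),
        "clif1_to_clif2": (hidden, hidden),
        "clif2_to_output": (hidden, n_action * p_out),
        "decoder": (n_action * p_out, n_action),
    }

def count_weights(shapes):
    """Total number of weight entries (biases and noise scales excluded)."""
    return sum(prod(shape) for shape in shapes.values())

# Hypothetical task: N = 17, M = 6, P_in = P_out = 10.
shapes = weight_shapes(n_state=17, p_in=10, n_action=6, p_out=10)
total = count_weights(shapes)
```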

5. Training Procedure and Integration with TD3

NoisySAN is trained in a deep RL actor-critic framework, building on the Twin Delayed Deep Deterministic Policy Gradients (TD3) algorithm. The core steps are:

  1. Initialize all actor (NoisySAN) and critic parameters, including noise scales ($\sigma$) and target networks.

  2. Per episode:

    • Sample full noise sequences $\{\varepsilon_v, \varepsilon_s\}$ for the entire episode.
    • For each timestep $t$:
      • Encode observation $x_t$, infer spikes and action $a_t$ via noisy SNN (using equations above).
      • Apply target-policy smoothing, execute action, receive transition $(x_t, a_t, r_t, x_{t+1})$.
      • Store state, action, noise, and outcome in the replay buffer.
    • If update-ready, sample minibatches, compute TD-targets for the critic, and update critic/actor networks.
    • Actor loss includes both negative Q-value and the variance penalty.
    • Target networks undergo soft updates.

This sequence aligns with the published pseudocode for the method (Chen et al., 2024).
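The update step in the procedure above relies on three standard TD3 ingredients, sketched here in NumPy on plain arrays: the clipped double-Q target, an actor loss with the noise penalty added, and the soft target update. The function names and the default values of `gamma` and `tau` are assumptions for illustration and are not taken from the paper's hyperparameters.

```python
import numpy as np

def td3_target(r, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q TD target: r + gamma * (1 - done) * min(Q1', Q2')."""
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def actor_loss(q_values, sigma, k):
    """Negative mean Q-value plus the penalty (k / N_A) * sum_i sigma_i^2."""
    return -np.mean(q_values) + k * np.mean(np.square(sigma))

def soft_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {name: tau * online[name] + (1.0 - tau) * target[name] for name in target}

# Toy values only, to show the shapes and the arithmetic.
y = td3_target(np.array([1.0]), np.array([0.0]),
               np.array([2.0]), np.array([3.0]), gamma=0.5)
loss = actor_loss(np.array([1.0, 3.0]), np.array([0.5, 0.5]), k=2.0)
new_target = soft_update({"w": np.array([0.0])}, {"w": np.array([1.0])}, tau=0.1)
```

In a full training loop the Q-values would come from the critic networks and the gradients from an autodiff framework; the arrays here only stand in for those quantities.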

6. Empirical Results and Benchmark Comparison

NoisySAN was evaluated on a suite of continuous control tasks from OpenAI Gym, mostly using MuJoCo backends (BipedalWalker-v3 is Box2D-based): Ant-v3, HalfCheetah-v3, Hopper-v3, Walker2d-v3, Humanoid-v3, HumanoidStandup-v2, InvertedDoublePendulum-v2, and BipedalWalker-v3. Key measurements include maximum average return (over 10 random seeds, 1M steps, and deterministic evaluation every 10K steps) and the APR (Average Performance Ratio) with respect to a deep actor network (DAN):

$$\mathrm{APR}(A) = \frac{1}{|\mathcal{T}|} \sum_{\tau \in \mathcal{T}} \frac{\mathrm{Pref}(A, \tau)}{\mathrm{Pref}(\mathrm{DAN}, \tau)}$$

Table: Selected Results (Max Average Return ± std)

| Task | DAN | PopSAN | MDC-SAN | ILC-SAN | NoisySAN |
|---|---|---|---|---|---|
| Ant-v3 | 5472±653 | 5264±920 | 5311±806 | 5339±503 | 5524±415 |
| HalfCheetah-v3 | 10471±1695 | 9419±1600 | 10323±1670 | 10789±922 | 10723±817 |
| Hopper-v3 | 3520±105 | 230±52 | 1824±1738 | 3125±1096 | 3356±652 |
| ... | ... | ... | ... | ... | ... |
| APR (%) | 100.0% | 97.3% | 102.3% | 107.2% | 116.6% |

NoisySAN achieves an overall APR of 116.6%, the highest among the compared SNN-based actors (PopSAN, MDC-SAN, ILC-SAN), and matches or exceeds the performance of deterministic deep actor networks in most experiments. The gains are not uniform across tasks: in the rows shown, ILC-SAN is slightly ahead on HalfCheetah and DAN on Hopper.
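The APR definition can be illustrated with the three task rows listed in the table. Because the remaining tasks are elided, this sketch covers only a subset and is not expected to reproduce the reported 116.6%; the function name `average_performance_ratio` is an assumption for the example.

```python
def average_performance_ratio(perf_alg, perf_baseline):
    """Mean over tasks of perf_alg[task] / perf_baseline[task]."""
    ratios = [perf_alg[task] / perf_baseline[task] for task in perf_baseline]
    return sum(ratios) / len(ratios)

# Max average returns from the three rows shown in the table (subset only).
dan = {"Ant-v3": 5472, "HalfCheetah-v3": 10471, "Hopper-v3": 3520}
noisysan = {"Ant-v3": 5524, "HalfCheetah-v3": 10723, "Hopper-v3": 3356}

apr_subset = average_performance_ratio(noisysan, dan)
apr_baseline = average_performance_ratio(dan, dan)   # the baseline against itself
```

On this subset the ratio comes out slightly below 1, since the Hopper deficit outweighs the small gains on Ant and HalfCheetah; the overall figure therefore depends on the tasks not listed here.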

7. Significance, Limitations, and Implications

NoisySAN demonstrates that explicit, time-correlated stochasticity in spiking actor networks, if carefully regularized, can break through the intrinsic noise robustness of SNNs and provide both effective exploration and stable learning in deep RL. This approach establishes a new bridge between exploration-by-parameterization (as in NoisyNet) and spike-driven computation. The introduction of colored noise (parameterized by the exponent $\beta$) allows tunable temporal structure in explorative perturbations. A plausible implication is that further refinements in the coupling of neuronal stochasticity with policy regularization may yield additional gains in sample efficiency and robustness in neuromorphic RL systems (Chen et al., 2024).
