
Stable Adaptive PGD for SNN Robustness

Updated 3 January 2026
  • SA-PGD is an adversarial attack optimization framework that combines momentum, adaptive per-coordinate updates, and ℓ∞ clipping to robustly assess the vulnerability of spiking neural networks.
  • It enhances convergence and stability by normalizing gradients and dynamically tuning step sizes, effectively mitigating issues from noisy or vanishing gradients.
  • Empirical results demonstrate that pairing SA-PGD with Adaptive Sharpness Surrogate Gradient achieves significantly higher attack success rates, boosting performance across diverse SNN architectures and datasets.

Stable Adaptive Projected Gradient Descent (SA-PGD) is an adversarial attack optimization framework specifically developed to evaluate the robustness of Spiking Neural Networks (SNNs) under the \ell_\infty threat model. SA-PGD integrates momentum, per-coordinate adaptive step sizes, and per-step \ell_\infty clipping, enhancing iteration stability and convergence speed under highly noisy or vanishing gradient regimes intrinsic to SNNs. When paired with Adaptive Sharpness Surrogate Gradient (ASSG), SA-PGD enables substantially more reliable estimation of adversarial vulnerability compared to classic projected gradient descent (PGD) or Adam-style PGD implementations (Wang et al., 27 Dec 2025).

1. Motivation and Background

Adversarial robustness evaluation of SNNs is challenged by the discontinuous Heaviside step function $H(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$ governing spike generation. The derivative dH/dx vanishes almost everywhere, impeding gradient-based white-box adversarial attacks. Surrogate gradient methods replace dH/dx with differentiable approximations, but standard surrogates (e.g., Atan, piecewise linear) are susceptible to vanishing gradients, particularly as surrogate "sharpness" increases to better approximate H. Even with improved surrogates, adversarial optimization remains hampered by noise and instability, motivating the development of SA-PGD as a stabilizing attack optimizer working in conjunction with dynamically tuned ASSG (Wang et al., 27 Dec 2025).
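To make the vanishing-gradient issue concrete, the following is a minimal NumPy sketch of the Atan surrogate's backward pass, as commonly used in SNN training libraries (an illustration under standard conventions, not the paper's code). Larger sharpness alpha better approximates the Heaviside step in the forward pass, but drives the surrogate gradient toward zero away from the threshold:

```python
import numpy as np

def atan_surrogate_grad(u, alpha):
    """Surrogate derivative of the Heaviside spike function H(u).

    Forward:  H(u) = 1 if u >= 0 else 0  (derivative vanishes a.e.).
    Backward: uses the derivative of the smooth approximation
              sigma(u) = (1/pi) * arctan(pi * alpha * u / 2) + 1/2,
    i.e.      d sigma/du = alpha / (2 * (1 + (pi * alpha * u / 2)**2)).
    As alpha grows, sigma approaches H but the gradient decays
    rapidly for |u| away from 0 -- the vanishing-gradient regime.
    """
    return alpha / (2.0 * (1.0 + (np.pi * alpha * u / 2.0) ** 2))
```

At the threshold (u = 0) the gradient equals alpha / 2, while a few units away it is already orders of magnitude smaller for sharp settings, which is exactly the regime SA-PGD is designed to tolerate.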

2. SA-PGD Formalism and Algorithm

At each iteration k, SA-PGD operates on the gradient g_k = \nabla_x L(f_\theta(x_k), y) with the following steps:

  • Momentum update:

m_k = \beta_m \, m_{k-1} + (1 - \beta_m) \frac{g_k}{\|g_k\|_1}

  • Adaptive variance update:

v_k = \beta_v \, v_{k-1} + (1 - \beta_v) \frac{g_k^2}{\|g_k^2\|_1}

Here, g_k^2 is computed elementwise.

  • Per-coordinate adaptive update with \ell_\infty clipping:

t_k = \mathrm{clip}\left(\frac{m_k}{\sqrt{v_k} + \xi}\,\eta_k,\; -\eta_k,\; \eta_k\right)

where \eta_k is a possibly decaying step size and \xi \ll 1 provides numerical stability.

  • Projection to \ell_\infty threat region:

x_{k+1} = \Pi_\varepsilon^\infty(x_k + t_k)

where \Pi_\varepsilon^\infty is the projection operator onto the \ell_\infty-norm \varepsilon-ball.

Momentum coefficients \beta_m, \beta_v \in [0, 1) control the degree of temporal averaging. Both the L_1 normalization (of gradients and squared gradients) and the adaptive step calculation are essential for effective progress in noisy SNN landscapes, even under imprecise or vanishing surrogates (Wang et al., 27 Dec 2025).
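The four steps above can be sketched as a single NumPy update function. This is a hypothetical reading of the published formulas, not the authors' implementation; in particular, the small constants added to the L_1 norms and the final clip to the valid input range [0, 1] are standard safety assumptions not stated in the equations:

```python
import numpy as np

def sa_pgd_step(x, g, m, v, x0, eta, eps,
                beta_m=0.9, beta_v=0.999, xi=1e-8):
    """One SA-PGD iteration (sketch).

    x:   current adversarial example
    g:   gradient of the loss at x (ascent direction for an attack)
    m,v: momentum and second-moment buffers
    x0:  clean input defining the l-inf eps-ball
    eta: step size, eps: l-inf radius
    """
    # L1-normalized momentum update (xi added for safety; assumption)
    m = beta_m * m + (1 - beta_m) * g / (np.abs(g).sum() + xi)
    # L1-normalized second-moment update, elementwise square
    g2 = g ** 2
    v = beta_v * v + (1 - beta_v) * g2 / (g2.sum() + xi)
    # per-coordinate adaptive step, clipped to the trust region [-eta, eta]
    t = np.clip(m / (np.sqrt(v) + xi) * eta, -eta, eta)
    # ascent step, projection onto the l-inf eps-ball around x0,
    # then onto the valid pixel range (assumption)
    x = np.clip(x + t, x0 - eps, x0 + eps)
    x = np.clip(x, 0.0, 1.0)
    return x, m, v
```

Note how the \ell_\infty clip on t_k bounds every coordinate's move per iteration regardless of how skewed m_k / \sqrt{v_k} becomes, while the outer projection keeps the iterate inside the threat model's feasible set.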

3. Stability and Convergence Characteristics

Empirical evidence demonstrates that SA-PGD:

  • Achieves higher attack success rates (ASR) with fewer iterations than both vanilla PGD and Adam-style optimizers lacking \ell_\infty-clipped per-step updates.
  • Maintains stability when gradients are unstable or vanish due to sharp surrogates, by adaptively reducing or amplifying local update magnitudes under explicit control.
  • Enforces the optimizer's trust region via per-step \ell_\infty clipping, preventing wild coordinate jumps that might otherwise leave the feasible adversarial set defined by the threat model (Wang et al., 27 Dec 2025).

This design is particularly salient for SNNs, where the loss surface can exhibit abrupt discontinuities or dead zones, and where surrogate gradient dynamics alone are insufficient for attack convergence. Momentum combined with adaptive rates substantially mitigates oscillation and poor gradient signal propagation.

4. Integration with Adaptive Sharpness Surrogate Gradient (ASSG)

SA-PGD’s efficacy is maximized when paired with ASSG, which dynamically adjusts the “sharpness” parameter \alpha_{i,t}^l(k) of the Atan surrogate gradient for each neuron, layer, and time step based on running estimates of local membrane-potential statistics:

  • For neuron i, layer l, time t at iteration k:

    • Compute the offset u_{i,t}^l(x_k) and update EMA buffers M_{i,t}^l(k), D_{i,t}^l(k).
    • Set sharpness:

    \alpha_{i,t}^l(k) = \frac{2}{\pi \left[M_{i,t}^l(k) + \gamma D_{i,t}^l(k)\right]} \tan\left(\frac{\pi A}{2}\right)

    where A \in [0, 1] is the target vanishing degree and \gamma \geq 0 is a relaxation parameter.

ASSG mitigates both under- and over-vanishing in surrogate gradients by aligning sharpness with the current distribution of membrane offsets, strengthening the underlying PGD signal for SA-PGD (Wang et al., 27 Dec 2025).
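The sharpness rule can be sketched in NumPy as follows. The closed-form expression for alpha comes from the formula above; the exponential-moving-average updates for M and D are an assumed plain EMA form (rate rho), since the buffer update rule is not spelled out here, and xi is a small safety constant:

```python
import numpy as np

def assg_sharpness(u, M, D, A=0.86, gamma=1.0, rho=0.9, xi=1e-8):
    """ASSG-style per-neuron sharpness update (hypothetical sketch).

    u:    membrane-potential offsets for one layer / time step
    M, D: EMA buffers of the offset magnitude and its deviation
    A:    target vanishing degree in [0, 1]
    gamma: relaxation parameter (>= 0), rho: assumed EMA rate
    """
    # running estimates of local membrane-offset statistics (assumed EMA form)
    M = rho * M + (1 - rho) * np.abs(u)
    D = rho * D + (1 - rho) * np.abs(np.abs(u) - M)
    # alpha = 2 / (pi * [M + gamma * D]) * tan(pi * A / 2)
    alpha = 2.0 / (np.pi * (M + gamma * D + xi)) * np.tan(np.pi * A / 2.0)
    return alpha, M, D
```

The effect is that neurons whose membrane potentials sit far from the threshold (large M + gamma * D) receive a flatter surrogate, so their gradient does not over-vanish, while neurons near the threshold keep a sharp surrogate.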

5. Empirical Performance and Experimental Results

On a diverse suite including CIFAR-10, CIFAR-100, and CIFAR10-DVS datasets, under multiple adversarial training regimes (AT, RAT, AT+SR, TRADES) and SNN architectures (e.g., SEWResNet31/43) with various neuron models (LIF-2, IF, PSN, Poisson):

  • ASSG+APGD improves absolute attack success rates by 8–10 percentage points over static surrogates (e.g., from ~75% to ~84% ASR on CIFAR-10-AT) (Wang et al., 27 Dec 2025).
  • SA-PGD, when replacing APGD (i.e., ASSG+SA-PGD), provides further gains of 4–5 percentage points (up to ~88% ASR).
  • On deeper or more challenging network/neuron configurations, ASSG+SA-PGD achieves consistent and significant improvements (e.g., ~98% ASR on PSN models vs. ~78% for static surrogates).
  • Similar improvements are observed across CIFAR-100 and DVS datasets (4–6 percentage points superior).

Performance comparisons:

  Method            Dataset    Baseline ASR (%)   With ASSG+SA-PGD ASR (%)
  STBP+APGD         CIFAR-10   ~75                ~88
  Static surrogate  PSN        ~78                ~98

All values are from (Wang et al., 27 Dec 2025).

6. Practical Considerations and Trade-offs

  • ASSG requires tracking per-neuron, per-time-step sharpness parameters (\alpha_{i,t}^l), resulting in ~10–30% runtime and memory overhead, but remains tractable on modern GPU hardware.
  • SA-PGD is robust for target vanishing degrees A \in [0.82, 0.9] and requires tuning of the momentum rates \beta_m, \beta_v.
  • Large iteration budgets (≥1,000) are often necessary due to SNN adversarial optimization’s intrinsic difficulty.
  • The combined ASSG+SA-PGD framework exposes overestimation of SNN adversarial robustness when traditional attacks are used, indicating a need for improved adversarial training methods (Wang et al., 27 Dec 2025).

7. Significance and Broader Context

SA-PGD extends standard adversarial attack protocol for SNNs, overcoming the limitations of both fixed-sharpness surrogates and step-size-agnostic optimizers. Its stabilization mechanisms address fundamental sources of optimizer inefficiency in discontinuous, high-dimensional SNN loss landscapes. The resulting evaluation more reliably establishes the true adversarial vulnerability of current SNNs, which has been significantly underestimated by earlier approaches. Adoption of SA-PGD, in conjunction with adaptive surrogate gradients, provides a new standard of reliability for adversarial robustness assessment in spike-based neuromorphic architectures (Wang et al., 27 Dec 2025).
