NoiseGrad: Perturbing Weights for Robust Explanations

Updated 26 March 2026

NoiseGrad is an explainability technique for deep neural networks that perturbs model weights to generate robust, localized saliency maps.
It works by aggregating attributions from multiple perturbed models, offering improved stability compared to input-space methods like SmoothGrad.
Empirical evaluations show enhanced localization and robustness over baselines, though careful tuning of the noise scale is essential for optimal performance.

NoiseGrad is an explainability technique for deep neural networks that enhances gradient-based attribution methods by introducing stochasticity in the network's weight space. Distinct from input-space perturbation approaches such as SmoothGrad, NoiseGrad perturbs the decision boundary itself through parameter-space noise, leading to improved saliency map localization, robustness, and method-agnostic integration within model-interpretability workflows (Bykov et al., 2021).

1. Formal Definition and Theoretical Foundations

NoiseGrad operates by creating a "cloud" of perturbed models around the original trained network. For a base network $f(x; \hat{W})$ with learned weights $\hat{W} \in \mathbb{R}^{s}$, NoiseGrad generates $M$ perturbed parameter sets $W_i = \hat{W} \odot \eta_i$, where $\eta_i \sim \mathcal{N}(1, \sigma_\mathrm{NG}^2 I)$. Each $f(\cdot; W_i)$ thus defines a model with a slightly perturbed decision boundary. For any gradient-based attribution map $E(x; f)$, NoiseGrad computes the aggregated map as

$E_\mathrm{NG}(x) = \frac{1}{M} \sum_{i=1}^M E(x; f(\cdot; W_i)).$

This model-averaged attribution reduces the high-frequency artifacts associated with single-model gradients by emphasizing features whose importance is robust to weight perturbation. The underlying duality is that input-space perturbations (as in SmoothGrad) and parameter-space perturbations (as in NoiseGrad) both probe where the model is sensitive to change, either in the input or in the decision boundary defined by the parameters (Bykov et al., 2021).
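A short calculation on a linear model shows where the benefit of averaging has to come from. The linear model is a simplifying assumption for illustration, not a case analyzed in the cited work. For $f(x; W) = w^\top x$ with the vanilla-gradient explainer $E(x; f) = \nabla_x f = w$ and $W_i = \hat{w} \odot \eta_i$:

$E_\mathrm{NG}(x) = \frac{1}{M} \sum_{i=1}^M \hat{w} \odot \eta_i = \hat{w} \odot \bar{\eta}, \qquad \bar{\eta} = \frac{1}{M} \sum_{i=1}^M \eta_i,$

$\mathbb{E}\left[E_\mathrm{NG}(x)\right] = \hat{w}, \qquad \mathrm{Var}\left[E_\mathrm{NG}(x)_k\right] = \frac{\hat{w}_k^2 \, \sigma_\mathrm{NG}^2}{M}.$

For a linear model, NoiseGrad is therefore an unbiased but noisier estimate of the ordinary gradient. In a deep network the gradient depends on the weights through the nonlinear activations, and it is this dependence that lets the averaged map differ systematically from the single-model map.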

2. Algorithmic Workflow and Implementation

The NoiseGrad procedure is method-agnostic, wrapping any differentiable base Explainer $E$ (e.g., saliency, integrated gradients). The workflow for attribution generation at input $x_0$ is as follows:

Initialize an accumulator $A \leftarrow 0$.
For $i = 1$ to $M$ (typically 25–50):
- Sample $\eta_i \sim \mathcal{N}(1, \sigma_\mathrm{NG}^2 I)$.
- Compute $W_i \leftarrow \hat{W} \odot \eta_i$ (elementwise).
- Compute $a_i \leftarrow E(x_0; f(\cdot; W_i))$.
- Update $A \leftarrow A + a_i$.
Return $A/M$ as the NoiseGrad map.
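The loop above can be written out in plain Python. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-layer toy model $f(x) = \mathrm{sigmoid}(w^\top x + b)$, uses its input gradient (vanilla saliency) as the base explainer $E$, and applies one multiplicative noise factor to every parameter, including the bias. The function names and default values are illustrative.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Base explainer E: gradient of the toy model sigmoid(w . x + b) w.r.t. x."""
    s = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
    return [s * (1.0 - s) * wk for wk in w]


def noisegrad(x: Sequence[float], w_hat: Sequence[float], b_hat: float,
              sigma_ng: float = 0.2, m: int = 25, seed: int = 0) -> List[float]:
    """Average E over m models whose parameters are scaled by eta ~ N(1, sigma_ng^2)."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)  # accumulator A <- 0
    for _ in range(m):
        # W_i = W_hat (elementwise *) eta_i, one noise factor per parameter
        w_i = [wk * rng.gauss(1.0, sigma_ng) for wk in w_hat]
        b_i = b_hat * rng.gauss(1.0, sigma_ng)
        a_i = saliency(x, w_i, b_i)                    # a_i <- E(x0; f(.; W_i))
        acc = [ak + aik for ak, aik in zip(acc, a_i)]  # A <- A + a_i
    return [ak / m for ak in acc]                      # return A / M


if __name__ == "__main__":
    x0 = [1.0, -0.5, 2.0]
    w_hat, b_hat = [0.8, -1.2, 0.3], 0.1
    print("vanilla gradient:", saliency(x0, w_hat, b_hat))
    print("NoiseGrad map:   ", noisegrad(x0, w_hat, b_hat))
```

Setting `sigma_ng=0.0` makes every $\eta_i$ equal to 1, so the sketch reduces to the plain gradient, which is a convenient sanity check.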

Practical defaults are $M = 25$ and $\sigma_\mathrm{NG}$ set by a 5% accuracy-drop heuristic. The noise level $\sigma_\mathrm{NG}$ is critical: if it is too small, the perturbed models rarely cross the decision boundary; if it is too large, the attribution becomes uninformative. The recommended setting is to increase $\sigma_\mathrm{NG}$ until model accuracy on a validation set drops by approximately 5% (see Section 5). This yields effective exploration of the region near the decision boundary (Bykov et al., 2021, Bommer et al., 2023).

3. Duality With Input-Space Perturbation and FusionGrad

NoiseGrad is conceptually dual to SmoothGrad, which averages attributions over input perturbations:

$E_\mathrm{SG}(x) = \frac{1}{N} \sum_{j=1}^N E(x + \xi_j; f(\cdot; \hat{W})), \qquad \xi_j \sim \mathcal{N}(0, \sigma_\mathrm{SG}^2 I).$

While SmoothGrad explores the local neighborhood of $x$ in input space, and may thereby cross the original decision boundary, NoiseGrad perturbs the boundary itself via weight noise. This duality, $\{ x \ \mathrm{fixed},\ W\ \mathrm{noisy} \} \leftrightarrow \{ W\ \mathrm{fixed},\ x\ \mathrm{noisy} \}$, enables complementary insights: input perturbations smooth out attribution fluctuations caused by the roughness of the model's function landscape, whereas weight perturbations reveal features that remain important across nearby boundaries (Bykov et al., 2021).
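For contrast with the NoiseGrad procedure in Section 2, SmoothGrad can be sketched on a hypothetical toy model $\mathrm{sigmoid}(w^\top x + b)$ with vanilla saliency as $E$; names and defaults are illustrative. The only structural change is which argument receives the noise: here the weights stay fixed and the input receives additive noise $\xi_j$. On this particular toy model the saliency is always a positive rescaling of $w$, so SmoothGrad can change only the map's magnitude, whereas multiplicative weight noise can also change its direction.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Gradient of the toy model sigmoid(w . x + b) with respect to x."""
    s = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
    return [s * (1.0 - s) * wk for wk in w]


def smoothgrad(x: Sequence[float], w_hat: Sequence[float], b_hat: float,
               sigma_sg: float = 0.1, n: int = 25, seed: int = 0) -> List[float]:
    """E_SG: weights fixed, input perturbed with xi_j ~ N(0, sigma_sg^2 I)."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n):
        x_j = [xk + rng.gauss(0.0, sigma_sg) for xk in x]  # x + xi_j
        a_j = saliency(x_j, w_hat, b_hat)                   # f(.; W_hat) is never modified
        acc = [ak + ajk for ak, ajk in zip(acc, a_j)]
    return [ak / n for ak in acc]


if __name__ == "__main__":
    print(smoothgrad([1.0, -0.5, 2.0], [0.8, -1.2, 0.3], 0.1))
```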

The combined method, FusionGrad, introduces both forms of stochasticity:

$E_\mathrm{FG}(x) = \frac{1}{N} \sum_{j=1}^N \frac{1}{M} \sum_{i=1}^M E(x + \xi_j; f(\cdot; W_i)),$

with noise levels typically halved relative to their optimal single-perturbation values so that the aggregate accuracy drop remains around 5%. In the reported experiments, FusionGrad achieves the most focused and robust explanations.
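The FusionGrad formula can be sketched as a nested loop. This is an illustration under stated assumptions, not a reference implementation: it uses a hypothetical toy model $\mathrm{sigmoid}(w^\top x + b)$ with vanilla saliency as $E$, and the default noise levels are made up. Following the indexing in the formula, the $M$ perturbed models $W_i$ are drawn once and each is evaluated on every one of the $N$ noisy inputs, so one map costs $N \cdot M$ explainer calls.

```python
import math
import random
from typing import List, Sequence


def saliency(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Gradient of the toy model sigmoid(w . x + b) with respect to x."""
    s = 1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + b)))
    return [s * (1.0 - s) * wk for wk in w]


def fusiongrad(x: Sequence[float], w_hat: Sequence[float], b_hat: float,
               sigma_sg: float = 0.05, sigma_ng: float = 0.1,
               n: int = 10, m: int = 10, seed: int = 0) -> List[float]:
    """E_FG: mean of E(x + xi_j; f(.; W_i)) over n input draws and m weight draws."""
    rng = random.Random(seed)
    # Draw the m perturbed models W_i = W_hat * eta_i once; all input samples share them.
    models = [
        ([wk * rng.gauss(1.0, sigma_ng) for wk in w_hat], b_hat * rng.gauss(1.0, sigma_ng))
        for _ in range(m)
    ]
    acc = [0.0] * len(x)
    for _ in range(n):
        x_j = [xk + rng.gauss(0.0, sigma_sg) for xk in x]  # x + xi_j
        for w_i, b_i in models:
            acc = [ak + a for ak, a in zip(acc, saliency(x_j, w_i, b_i))]
    return [ak / (n * m) for ak in acc]


if __name__ == "__main__":
    print(fusiongrad([1.0, -0.5, 2.0], [0.8, -1.2, 0.3], 0.1))
```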

4. Empirical Evaluation and Comparative Performance

NoiseGrad has demonstrated superior or comparable performance to SmoothGrad and vanilla gradient-based attributions across multiple datasets and architectures, including CMNIST (digits on random CIFAR-10 backgrounds), PASCAL VOC 2012, and ILSVRC-15. Key evaluated metrics (for ResNet9 on CMNIST, using saliency maps) include:

Method	Localization $\uparrow$	Faithfulness $\uparrow$	Robustness $\downarrow$	Sparseness $\uparrow$
Baseline	0.7315	0.3413	0.0763	0.6272
SmoothGrad	0.8263	0.3465	0.0590	0.5310
NoiseGrad	0.8349	0.3635	0.0224	0.5794
FusionGrad	0.8435	0.3697	0.0153	0.5721

On this benchmark, NoiseGrad outperforms both the baseline and SmoothGrad on localization, faithfulness, and robustness. Its sparseness is lower than the baseline's but higher than SmoothGrad's, a modest trade-off (Bykov et al., 2021). Qualitative results indicate sharper, more semantically coherent maps, including more accurate highlighting of object parts on PASCAL VOC 2012 and improved feature visualization for global explanations.
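To make these comparisons concrete, the snippet below computes each method's change relative to a reference method, using only the values from the table above. Whether a change counts as an improvement follows the arrows in the table header (robustness is lower-is-better). The dictionary and function names are illustrative.

```python
# Values copied from the ResNet9 / CMNIST table above.
RESULTS = {
    "Baseline":   {"localization": 0.7315, "faithfulness": 0.3413, "robustness": 0.0763, "sparseness": 0.6272},
    "SmoothGrad": {"localization": 0.8263, "faithfulness": 0.3465, "robustness": 0.0590, "sparseness": 0.5310},
    "NoiseGrad":  {"localization": 0.8349, "faithfulness": 0.3635, "robustness": 0.0224, "sparseness": 0.5794},
    "FusionGrad": {"localization": 0.8435, "faithfulness": 0.3697, "robustness": 0.0153, "sparseness": 0.5721},
}
# Metric directions from the table header: True means higher is better.
HIGHER_IS_BETTER = {"localization": True, "faithfulness": True, "robustness": False, "sparseness": True}


def relative_change(method: str, metric: str, reference: str = "Baseline") -> float:
    """(method - reference) / reference for one metric."""
    ref = RESULTS[reference][metric]
    return (RESULTS[method][metric] - ref) / ref


def improves(method: str, metric: str, reference: str = "Baseline") -> bool:
    """True if `method` beats `reference` on `metric`, respecting the metric's direction."""
    delta = RESULTS[method][metric] - RESULTS[reference][metric]
    return delta > 0 if HIGHER_IS_BETTER[metric] else delta < 0


if __name__ == "__main__":
    for method in ("SmoothGrad", "NoiseGrad", "FusionGrad"):
        changes = ", ".join(f"{metric} {relative_change(method, metric):+.1%}" for metric in HIGHER_IS_BETTER)
        print(f"{method}: {changes}")
```

For example, NoiseGrad's robustness score is roughly 71% lower than the baseline's, while its sparseness is about 8% lower.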

In climate-science tasks (Bommer et al., 2023), NoiseGrad attains high skill in robustness (local Lipschitz estimate, LLE 0.91; average sensitivity, AS 0.88) and in model-parameter randomization (MPT 0.94 for MLPs). However, it shows low skill in faithfulness (correlation –0.05, i.e., worse than a random baseline) and smaller gains in complexity and localization. Its performance, especially its randomization skill, degrades for convolutional architectures relative to MLPs.

5. Hyperparameter Selection and Practical Integration

The central hyperparameter for NoiseGrad is the noise scale $\sigma_\mathrm{NG}$. A principled heuristic is used: increase $\sigma_\mathrm{NG}$ until the relative accuracy drop on a validation set reaches $5\%$, where the relative accuracy drop is

$AD(\sigma) = 1 - \frac{\mathrm{ACC}(\sigma) - \mathrm{ACC}(\infty)}{\mathrm{ACC}(0) - \mathrm{ACC}(\infty)}$

with $\mathrm{ACC}(0)$ the accuracy of the unperturbed model and $\mathrm{ACC}(\infty)$ the accuracy in the limit of infinite noise (roughly chance level). This threshold balances sufficient boundary crossing against attribution quality. For FusionGrad, the input and model noise are set such that their combined effect achieves this same accuracy-drop criterion.
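The selection rule can be sketched as a scan over a grid of candidate noise levels. This is a minimal sketch under assumptions: `acc_fn` is a hypothetical callable returning validation accuracy at a given $\sigma$ (in practice, the accuracy of weight-perturbed models averaged over several noise draws), and `acc_inf` is assumed to be chance-level accuracy. The accuracy curve in the usage example is synthetic.

```python
import math
from typing import Callable, Iterable, Optional


def relative_accuracy_drop(acc_sigma: float, acc_zero: float, acc_inf: float) -> float:
    """AD(sigma) = 1 - (ACC(sigma) - ACC(inf)) / (ACC(0) - ACC(inf))."""
    return 1.0 - (acc_sigma - acc_inf) / (acc_zero - acc_inf)


def select_sigma(acc_fn: Callable[[float], float], sigmas: Iterable[float],
                 acc_inf: float, target: float = 0.05) -> Optional[float]:
    """Return the smallest sigma in `sigmas` whose relative accuracy drop reaches `target`."""
    acc_zero = acc_fn(0.0)  # accuracy of the unperturbed model
    for sigma in sorted(sigmas):
        if relative_accuracy_drop(acc_fn(sigma), acc_zero, acc_inf) >= target:
            return sigma
    return None  # the target drop is never reached on this grid


if __name__ == "__main__":
    # Synthetic accuracy curve for a balanced binary task (chance level 0.5), for illustration only.
    def toy_accuracy(sigma: float) -> float:
        return 0.5 + 0.45 * math.exp(-sigma ** 2)

    grid = [i / 100 for i in range(1, 101)]
    print(select_sigma(toy_accuracy, grid, acc_inf=0.5))  # prints 0.23
```

Normalizing by $\mathrm{ACC}(0) - \mathrm{ACC}(\infty)$ makes the 5% threshold comparable across tasks with different chance levels.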

Integration is straightforward: any differentiable explanation method can be wrapped with parameter-space stochasticity, requiring only multiplicative Gaussian noise on the weight tensors. The computational cost grows by a factor of $M$ (typically $0.5$–$1$ s/image with $M = 25$ on GPU for typical vision models), which remains interactive for most use cases (Bykov et al., 2021).
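The method-agnostic wrapping can be illustrated with a higher-order function. This is a sketch under assumptions, not an existing library API: parameters are modeled as a hypothetical dict of flat float lists, an explainer is any callable `explainer(x, params)` returning one attribution per input feature, and the two toy explainers (saliency and gradient × input on $\mathrm{sigmoid}(w^\top x + b)$) are illustrative. The wrapper touches only the parameters and never modifies the trained values in place.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Params = Dict[str, List[float]]
Explainer = Callable[[Sequence[float], Params], List[float]]


def with_noisegrad(explainer: Explainer, sigma_ng: float = 0.2, m: int = 25, seed: int = 0) -> Explainer:
    """Wrap any explainer so it averages over m multiplicatively perturbed parameter sets."""
    def wrapped(x: Sequence[float], params: Params) -> List[float]:
        rng = random.Random(seed)
        total = [0.0] * len(x)
        for _ in range(m):
            # New lists are built, so the trained parameters stay untouched.
            noisy = {name: [p * rng.gauss(1.0, sigma_ng) for p in values] for name, values in params.items()}
            total = [t + a for t, a in zip(total, explainer(x, noisy))]
        return [t / m for t in total]
    return wrapped


def saliency(x: Sequence[float], params: Params) -> List[float]:
    """Toy explainer: input gradient of sigmoid(w . x + b)."""
    w, b = params["w"], params["b"][0]
    s = 1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + b)))
    return [s * (1.0 - s) * wk for wk in w]


def gradient_x_input(x: Sequence[float], params: Params) -> List[float]:
    """Toy explainer: gradient times input."""
    return [g * xk for g, xk in zip(saliency(x, params), x)]


if __name__ == "__main__":
    params = {"w": [0.8, -1.2, 0.3], "b": [0.1]}
    x0 = [1.0, -0.5, 2.0]
    for base in (saliency, gradient_x_input):
        print(base.__name__, with_noisegrad(base)(x0, params))
```

Because the wrapper only needs a callable, the same function works unchanged for both explainers, which mirrors the method-agnostic claim above.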

6. Strengths, Limitations, and Application Context

Strengths:

High randomization skill: NoiseGrad’s attributions change sensitively when model weights are randomized layerwise, meaning maps reliably reflect trained parameters (Bommer et al., 2023).
Robustness: Substantial improvement in map stability under small input perturbations, comparable to or better than input-perturbation methods (LLE $0.91$ for MLPs).
Method-agnostic integration: Compatible with any gradient-based explainer; straightforward to insert into existing workflows.
Local and global explanations: Effective for both, offering sharper and semantically richer saliency maps.
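The randomization property can be probed with a simplified version of the model parameter randomization test: compute the explanation for the trained parameters and for randomly re-drawn parameters, then measure how similar the two maps are. The sketch below makes several simplifying assumptions. It re-draws all parameters at once rather than layer by layer, uses Pearson correlation instead of the rank-based or structural similarity measures common in the literature, and demonstrates with a hypothetical toy explainer. Any explainer, including a NoiseGrad-wrapped one, can be passed as `explain`.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Params = Dict[str, List[float]]
Explainer = Callable[[Sequence[float], Params], List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two flattened attribution maps."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a))
    norm_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # correlation is undefined for a constant map; treated as 0 here
    return cov / (norm_a * norm_b)


def randomization_similarity(explain: Explainer, x: Sequence[float], params: Params, seed: int = 0) -> float:
    """Similarity between the trained model's map and the map after re-drawing every parameter from N(0, 1).

    A low |similarity| suggests the explanation depends on the learned parameters.
    """
    rng = random.Random(seed)
    randomized = {name: [rng.gauss(0.0, 1.0) for _ in values] for name, values in params.items()}
    return pearson(explain(x, params), explain(x, randomized))


def saliency(x: Sequence[float], params: Params) -> List[float]:
    """Hypothetical toy explainer: input gradient of sigmoid(w . x + b)."""
    w, b = params["w"], params["b"][0]
    s = 1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + b)))
    return [s * (1.0 - s) * wk for wk in w]


if __name__ == "__main__":
    params = {"w": [0.8, -1.2, 0.3, 0.05], "b": [0.1]}
    print(randomization_similarity(saliency, [1.0, -0.5, 2.0, 0.7], params))
```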

Limitations:

Faithfulness: Low skill in explaining features causally linked to network decisions (worse than random baselines by correlation).
Complexity and Localization: Modest improvements; often yields diffuse, high-entropy attributions and limited region-of-interest focus, particularly in low-SNR tasks (e.g., climate data).
Architecture dependence: Performance degrades with CNNs, especially for randomization metrics, raising caution for convolutional architectures.
Computational cost: Scales linearly with the number of perturbed models $M$.

7. Related Methods and Research Context

NoiseGrad extends the paradigm of perturbation-based attribution averaging, with SmoothGrad as the canonical input-space antecedent. The complementary nature of input and boundary perturbations motivated the development of FusionGrad, which combines both approaches for enhanced explanation sharpness.

Comparisons with other explainers, such as Integrated Gradients, Layer-wise Relevance Propagation (LRP), and DeepSHAP, indicate that while these methods excel in faithfulness, robustness, and localization for certain tasks, they may lack the randomization sensitivity provided by NoiseGrad. NoiseGrad and other sensitivity-based methods are, in turn, most valuable when attributions must be tightly bound to the exact learned parameters.

NoiseGrad and its evaluation on attribution desiderata—robustness, faithfulness, randomization, complexity, and localization—provide a nuanced basis for method selection contingent on scientific objectives and task requirements in fields ranging from image processing to climate science (Bykov et al., 2021, Bommer et al., 2023).


References

Bykov et al. (2021). NoiseGrad: Enhancing Explanations by Introducing Stochasticity to Model Weights.

Bommer et al. (2023). Finding the right XAI method -- A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science.
