Gradient-Based Saliency Maps Explained
- Gradient-based saliency maps are methods that compute the derivative of class scores with respect to input features to assess feature relevance.
- They leverage techniques such as SmoothGrad, Integrated Gradients, and GradCAM to reduce noise and improve the interpretability of deep network decisions.
- Limitations like gradient saturation and input bias require rigorous evaluation to ensure robust, faithful explanations.
Gradient-based saliency maps provide feature-wise attributions for the decisions of deep neural networks by leveraging model gradients with respect to high-dimensional inputs, typically images. For a fixed network and class score, the saliency value at each input pixel denotes the sensitivity of the class score to infinitesimal changes at that pixel, often interpreted as a measure of input-feature relevance. These methods encompass diverse architectural and algorithmic forms—ranging from the simple input gradient, through regularized or aggregated variants, to adversarial or competitive attribution schemes—and underpin modern network interpretability science.
1. Mathematical Foundations and Standard Schemes
Gradient-based saliency maps compute, for a trained model $f$ and target class $c$, the derivative
$$S_c(x) = \frac{\partial f_c(x)}{\partial x}$$
for input $x$ (Simonyan et al., 2013; Adebayo et al., 2018). This raw gradient map is typically collapsed across color channels (e.g., by a channel-wise maximum, norm, or average). To mitigate gradient saturation and magnify influence, the element-wise product $x \odot \partial f_c(x)/\partial x$ ("Gradient $\times$ Input") is often used (Adebayo et al., 2018), and for improved global attribution, Integrated Gradients (IG) integrates the gradient along a straight path between a baseline input $x'$ and $x$,
$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial f_c\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha$$
(Gupta et al., 2019).
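A minimal PyTorch sketch of these three attributions follows; the ResNet-18 backbone, the zero baseline, and the 50-step Riemann approximation of the path integral are illustrative assumptions, not prescriptions from the cited works.

```python
import torch
import torchvision.models as models

# Sketch: vanilla gradient, Gradient*Input, and Integrated Gradients for a
# torchvision classifier. Model choice and the zero baseline are assumptions.
model = models.resnet18(weights=None).eval()

def class_gradient(x, target_class):
    """d f_c(x) / dx for a single input x of shape (1, 3, H, W)."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad.detach()

def gradient_times_input(x, target_class):
    return x * class_gradient(x, target_class)

def integrated_gradients(x, target_class, baseline=None, steps=50):
    """Riemann approximation of IG along the straight path baseline -> x."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        total += class_gradient(baseline + alpha * (x - baseline), target_class)
    return (x - baseline) * total / steps

x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed image
sal = integrated_gradients(x, target_class=0)
heatmap = sal.abs().sum(dim=1)            # collapse color channels for display
```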
Smoothing mechanisms such as Gaussian averaging over input perturbations ("SmoothGrad") regularize high-frequency noise,
$$\hat{S}_c(x) = \frac{1}{n} \sum_{i=1}^{n} S_c(x + \epsilon_i), \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I)$$
(Adebayo et al., 2018; Ye et al., 2024). For reinforcement learning agents, the same pipeline applies to action-value functions or policy log-probabilities (Rosynski et al., 2020).
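A corresponding SmoothGrad sketch, again with an illustrative backbone, noise scale, and sample count, averages the per-sample gradients over Gaussian perturbations:

```python
import torch
import torchvision.models as models

# Minimal SmoothGrad sketch: average the class-score gradient over Gaussian
# perturbations of the input. Model, sigma, and n_samples are assumptions.
model = models.resnet18(weights=None).eval()

def smoothgrad(x, target_class, sigma=0.15, n_samples=25):
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy)[0, target_class].backward()
        total += noisy.grad
    return total / n_samples

x = torch.randn(1, 3, 224, 224)
smooth_map = smoothgrad(x, target_class=0)
```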
Gradient-based saliency can be computed at input or hidden layers, and generalized to composite loss functions or arbitrary outputs.
2. Aggregation, Propagation, and Class-Selective Methods
Saliency propagation and aggregation across network layers vary markedly (Khakzar et al., 2020). Positive Aggregation is the post-hoc summing or rectification (ReLU, absolute value) of gradient signals at feature or output layers:
- GradCAM: aggregates gradients across spatial locations of a convolutional feature map $A^k$; the saliency is $\mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$, where $\alpha_k^c = \frac{1}{Z}\sum_{i,j} \partial f_c / \partial A^k_{ij}$ are global average-pooled gradients (see the sketch after this list).
- GradCAM++/FullGrad: aggregate ReLU-rectified or summed positive gradients, often ignoring signs (Khakzar et al., 2020).
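The following sketch implements the GradCAM recipe above; hooking `layer4` of a ResNet-18 is an assumption specific to that backbone, and any final convolutional block could be substituted.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal Grad-CAM sketch: global-average-pool the gradients of the class score
# w.r.t. the last convolutional feature map, use them as channel weights, and
# apply ReLU to the weighted sum. Hooking `layer4` assumes a ResNet backbone.
model = models.resnet18(weights=None).eval()
features = {}

def fwd_hook(module, inputs, output):
    features["act"] = output                               # stash feature maps A^k

model.layer4.register_forward_hook(fwd_hook)

def grad_cam(x, target_class):
    score = model(x)[0, target_class]
    A = features["act"]                                    # (1, K, h, w)
    grads = torch.autograd.grad(score, A)[0]               # d f_c / d A^k
    alpha = grads.mean(dim=(2, 3), keepdim=True)           # GAP -> channel weights
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))     # weighted sum + ReLU
    return F.interpolate(cam, size=x.shape[-2:],
                         mode="bilinear", align_corners=False)

x = torch.randn(1, 3, 224, 224)
cam = grad_cam(x, target_class=0)
```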
Positive Propagation restricts the backward pass: Guided Backpropagation only passes positive gradients through ReLU gates, and Rectified Gradient (RectGrad) applies an additional importance threshold during propagation,
$$R^{(l)}_i = \mathbb{1}\!\left[a^{(l)}_i R^{(l+1)}_i > \tau\right] R^{(l+1)}_i$$
where $a^{(l)}$ denotes the layer activations and $\tau$ a percentile threshold (Kim et al., 2019). The input-feature attribution under RectGrad is the input multiplied element-wise by the propagated signal, which can introduce input bias (Brocki et al., 2020).
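A minimal sketch of positive propagation via Guided Backpropagation is shown below; the VGG-16 backbone is illustrative, and RectGrad's additional activation-gradient threshold is deliberately omitted.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Minimal Guided Backpropagation sketch: during the backward pass, only
# positive gradients are allowed through each ReLU gate.
model = models.vgg16(weights=None).eval()

def guided_relu_hook(module, grad_input, grad_output):
    # grad_input is already masked by the forward ReLU; additionally discard
    # negative gradients so only positive signals propagate backwards.
    return (torch.clamp(grad_input[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False                   # in-place ops interfere with hooks
        m.register_full_backward_hook(guided_relu_hook)

def guided_backprop(x, target_class):
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad.detach()

x = torch.randn(1, 3, 224, 224)
gb_map = guided_backprop(x, target_class=0)
```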
Class-selectivity can be achieved by competitive aggregation, e.g., CGI ("Competition for Pixels"), which retains a pixel's Gradient $\times$ Input attribution for the target class only when it wins the competition against the attributions the pixel receives for the other classes, and zeroes it otherwise (Gupta et al., 2019).
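The sketch below illustrates one simplified reading of such pixel-wise competition (keep the target-class Gradient × Input value only where it beats every other class considered); it is not the exact CGI rule of Gupta et al. (2019), and the backbone is illustrative.

```python
import torch
import torchvision.models as models

# Hedged sketch of class-wise competition: each pixel keeps its Gradient*Input
# attribution for the target class only if it beats the attributions computed
# for the competing classes. Simplified reading, not the exact CGI rule.
model = models.resnet18(weights=None).eval()

def competitive_gradient_input(x, target_class, num_classes):
    maps = []
    for c in range(num_classes):
        xc = x.clone().requires_grad_(True)
        model(xc)[0, c].backward()
        maps.append((xc * xc.grad).detach())
    maps = torch.stack(maps)                       # (num_classes, 1, C, H, W)
    winner = maps.argmax(dim=0)                    # class winning each pixel
    target_map = maps[target_class]
    return torch.where(winner == target_class,
                       target_map, torch.zeros_like(target_map))

# Restricting the competition to a small subset of classes keeps the sketch cheap.
cgi_map = competitive_gradient_input(torch.randn(1, 3, 224, 224),
                                     target_class=0, num_classes=10)
```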
Backpropagation-based methods often employ target-selective rectification (e.g., TSGB), which adaptively enhances negative weights or propagates via forward activations for fine-grained maps (Cheng et al., 2021).
3. Regularization, Noise Suppression, and Structure-Promoting Techniques
Saliency maps can exhibit high-frequency noise due to gradient discontinuities, downsampling, or propagation through irrelevant features (Kim et al., 2019). Layer-wise thresholding and smoothing are standard remedies. Smooth Deep Saliency proposes backward-hook and bilinear-surrogate operations, defined in terms of spatially shifted copies of the feature map, that suppress the checkerboard artifacts induced by stride-2 convolutions and yield smoother, more interpretable hidden-layer maps (Herdt et al., 2024).
Norm-regularized adversarial training can be deployed to yield sparse or group-sparse saliency structures, with the resulting attribution expressed via the Fenchel conjugate of the chosen perturbation norm (Gong et al., 2024). Group-sparsity is induced via block norms, and elastic-net variants harmonize smoothness and sparsity.
Empirical evaluation shows such structured saliency maps achieve improved interpretability and robustness with minimal fidelity loss.
4. Black-Box Gradient Estimation and Robustness
For closed-source or black-box models (e.g., GPT-Vision APIs), gradient estimation is achieved via Likelihood Ratio (LR) methods, which approximate the gradient from forward queries alone,
$$\nabla_x f_c(x) \approx \frac{1}{\sigma^2}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[f_c(x + \epsilon)\, \epsilon\big].$$
Blockwise variance reduction injects noise at subsets of pixels, leading to substantial estimation-accuracy gains (Zhang et al., 2024). These black-box estimates can be used interchangeably with standard saliency pipelines.
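A minimal query-only sketch of the likelihood-ratio (score-function) estimator follows; the Gaussian noise scale and sample budget are illustrative, and blockwise variance reduction is not shown.

```python
import numpy as np

# Likelihood-ratio gradient estimate for a black-box scorer f(x) -> scalar,
# using forward queries only. sigma and n_samples are illustrative assumptions.
def lr_saliency(f, x, sigma=0.1, n_samples=256):
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        eps = np.random.normal(scale=sigma, size=x.shape)
        grad += f(x + eps) * eps             # score-function term f(x+eps)*eps
    return grad / (n_samples * sigma ** 2)   # ~ gradient of the smoothed score

# Toy quadratic "black box"; any query-only API fits the same slot.
x0 = np.ones(16)
estimate = lr_saliency(lambda z: float((z ** 2).sum()), x0)
```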
Empirical benchmarks demonstrate that LR-based and blockwise-LR saliency maps achieve competitive or superior insertion/deletion scores and adversarial attack transferability compared to classical white-box gradient methods.
5. Algorithmic Stability, Fidelity, and Interpretability Metrics
Saliency maps' sensitivity to randomness in network weights or training data is evaluated via algorithmic stability frameworks (Ye et al., 2024). Gaussian smoothing (SmoothGrad) reduces stability error but increases fidelity error: as the smoothing scale $\sigma$ grows, stability improves while fidelity degrades.
Empirical findings confirm the stability–fidelity trade-off on standard datasets and architectures.
Sanity checks (parameter-randomization, label-randomization) (Adebayo et al., 2018), pointing game accuracy, and insertion/deletion AUC are standard quantitative measures for interpretability (Khakzar et al., 2020; Khorram et al., 2020).
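As one concrete example of these metrics, the sketch below computes a deletion-style AUC by masking pixels in order of decreasing saliency; the zero masking value, the step count, and the assumption that the input and saliency map share a shape are all illustrative.

```python
import numpy as np

# Deletion-style fidelity metric: remove pixels in order of decreasing
# saliency, re-score the degraded input, and report the area under the score
# curve (lower is better for a faithful map). Assumes x and saliency share a shape.
def deletion_auc(f, x, saliency, n_steps=50):
    order = np.argsort(saliency.ravel())[::-1]        # most salient first
    flat = x.astype(float).ravel().copy()
    scores = [f(flat.reshape(x.shape))]
    block = max(1, len(order) // n_steps)
    for i in range(0, len(order), block):
        flat[order[i:i + block]] = 0.0                # delete this pixel block
        scores.append(f(flat.reshape(x.shape)))
    return np.trapz(scores, dx=1.0 / (len(scores) - 1))
```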
6. Limitations, Biases, and Best-Practice Recommendations
Several issues confound faithful explanation:
- Positive-only aggregation (ReLU or absolute-value filtering) yields maps that lack class- and weight-sensitivity, reconstructing input features rather than model-deciding regions (Khakzar et al., 2020).
- Input bias arises when final attributions multiply the input feature by the backprop signal (e.g., RectGrad, LRP), underreporting relevance in low-intensity (dark or mid-gray) regions (Brocki et al., 2020).
- Gradient saturation and feature interaction are addressed by decoy-based DANCE aggregation, which explores in-distribution perturbations and aggregates feature-wise saliency via the empirical range (Lu et al., 2020).
Practitioners should avoid unprincipled absolute-value filters, always retain gradient signs, run class- and model-randomization sanity checks, and choose smoothing parameters with the stability–fidelity trade-off in mind. Adversarial regularization and instance-specific guidance (e.g., global guidance maps (Fahim et al., 2022)) further refine spatial and semantic alignment.
Gradient-based saliency maps remain the core instrument for probing neural network decision mechanisms, offering flexibility, efficiency, and extensibility across domains. However, rigorous evaluation, careful aggregation of gradient signals, and recognition of method-induced biases are essential for generating truly interpretable, faithful, and robust explanations.