Integrated Gradients is a path-based feature attribution method that computes individual feature contributions by integrating gradients along a straight-line path from a baseline to the input.
It satisfies key axioms like completeness, sensitivity, and implementation invariance, ensuring reliable and interpretable explanations for differentiable models.
Variants such as Integrated Decision Gradients and Guided IG extend its practical application by addressing challenges like saturation and baseline sensitivity across diverse data modalities.
Integrated Gradients (IG) is a path-based feature attribution method for neural networks and other differentiable models, introduced to rigorously decompose the output difference between an input and a user-defined "baseline" into per-feature contributions. IG has become a canonical technique in explainable artificial intelligence (XAI) due to its rigorous axiomatic foundations and practical flexibility. The method integrates gradients of the model output with respect to the input along a straight path from the baseline to the input, yielding attributions that satisfy completeness, sensitivity, and implementation invariance. IG has inspired a broad family of path-integral attribution techniques, several formal uniqueness results, and a range of variants adapted to different data modalities, model pathologies, and robustness requirements.
1. Theoretical Foundation and Core Axioms
Integrated Gradients is formulated for a differentiable function $F: \mathbb{R}^n \to \mathbb{R}$, a target input $x$, and a reference baseline $x'$. The canonical IG attribution for feature $i$ is

$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i} \, d\alpha.$

These attributions satisfy:

Completeness: Attributions sum to $F(x) - F(x')$.
Implementation Invariance: If two models compute the same function, their IG attributions match.
Linearity: For $aF + bG$, attributions are $a\,\mathrm{IG}[F] + b\,\mathrm{IG}[G]$.

Sensitivity/Dummy: Features that do not affect the output across the path receive zero attribution.

For analytic or piecewise-analytic functions, Integrated Gradients is characterized as the unique single-path method that satisfies completeness, symmetry, linearity, affine scale invariance, and proportionality among monotone straight-line paths (Lundstrom et al., 2023). This formal grounding distinguishes IG from heuristic or partial approaches such as input × gradient, DeepLIFT, or Layer-wise Relevance Propagation, which may violate key axioms.

2. Methodology, Computation, and Implementation

In practice, $F$ is a deep network for which the path integral above lacks a closed form. IG is computed by discretizing the interval $\alpha \in [0, 1]$ (typically into 20–300 steps), evaluating the gradient at each interpolation point, and averaging:

$\mathrm{IG}_i(x) \approx (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F(x' + \frac{k}{m}(x - x'))}{\partial x_i}.$

Computational cost scales linearly with the number of steps, as each step requires a forward and a backward pass. The completeness property can be checked numerically to monitor integration and discretization error.

Empirical usage requires careful baseline selection, as $x'$ must correspond to genuine absence of signal (e.g., a black image or a zero embedding). Inappropriate baselines can introduce bias or spurious attributions, though the method retains its formal guarantees relative to any fixed baseline (Sundararajan et al., 2017).
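The Riemann-sum computation and the completeness check can be sketched as follows. The toy model `F` and its analytic gradient are illustrative assumptions; in practice a deep-learning framework supplies the gradients.

```python
import numpy as np

W = np.array([1.0, -2.0, 0.5])

def F(x):
    # toy differentiable "model" (assumption for illustration): F(x) = tanh(W.x)
    return np.tanh(W @ x)

def grad_F(x):
    # analytic gradient of F; an autodiff framework would supply this in practice
    return (1.0 - np.tanh(W @ x) ** 2) * W

def integrated_gradients(x, baseline, m=300):
    # right-endpoint Riemann sum over the straight-line path from baseline to x
    total = np.zeros_like(x)
    for k in range(1, m + 1):
        total += grad_F(baseline + (k / m) * (x - baseline))
    return (x - baseline) * total / m

x = np.array([0.8, 0.3, -0.5])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
# completeness check: attributions should sum to F(x) - F(baseline)
residual = abs(attr.sum() - (F(x) - F(baseline)))
```

The residual shrinks as the number of steps $m$ grows, which is why completeness serves as a practical convergence monitor.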
3. Limitations and Pathologies: Saturation, Baseline Sensitivity, and Discrete Domains
Despite its axiomatic guarantees, IG faces practical challenges:

Saturation effect: For many deep networks, outputs may saturate (i.e., plateau) along the input path before reaching the target; gradients in these regions are near zero, yet IG weights all path segments equally. This can lead to "noisy" or incomplete attributions dominated by uninformative regions (Miglani et al., 2020, Walker et al., 2023).

Baseline dependence: The choice of $x'$ can drastically affect which features are attributed; standard choices (zero, mean, blurred) are not universally "neutral" (Liu et al., 2023, Simpson et al., 11 Mar 2025).
Discrete or off-manifold interpolation: For data such as word embeddings or graphs, the straight path from $x'$ to $x$ may traverse non-data-manifold, out-of-distribution regions. Gradients computed there do not reflect semantically meaningful changes (Sanyal et al., 2021, Roy et al., 2024, Simpson et al., 9 Sep 2025).

Several approaches address these issues, including alternative path constructions, multiple baselines, and discretization strategies. Notably, adaptive sampling and non-uniform Riemann sums can reduce integration error by concentrating steps in information-rich path segments (Swain et al., 2024, Walker et al., 2023).

4. Variants and Methodological Extensions

A diverse array of IG extensions adapt the method to domain-specific requirements, address noise/saturation, or generalize the attribution framework:

Integrated Decision Gradients (IDG): Weights each pathwise gradient by the derivative of the output logit, focusing on decision regions and eliminating saturated-region contributions. IDG combines an importance factor $\frac{\partial F}{\partial \alpha}$ with adaptive sampling along the path, empirically yielding sharper, more informative attributions (Walker et al., 2023).

Path-Weighted IG (PWIG): Generalizes the IG integral by assigning a user-defined weight $w(\alpha)$ to emphasize or de-emphasize specific path regions. PWIG enables focus on early, late, or intermediate path segments, at the expense of completeness except when $w \equiv 1$ (Kamalov et al., 22 Sep 2025).
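To make the path-weighting idea concrete, here is a schematic PWIG-style sum. The toy sigmoid model and the specific weight functions are illustrative assumptions, not the published algorithm:

```python
import numpy as np

W = np.array([2.0, -1.0])

def F(x):
    # toy sigmoid model (assumption); saturates for large |W.x|
    return 1.0 / (1.0 + np.exp(-(W @ x)))

def grad_F(x):
    s = F(x)
    return s * (1.0 - s) * W

def path_weighted_ig(x, baseline, weight_fn, m=500):
    # PWIG-style attribution: each path gradient is scaled by weight_fn(alpha).
    # With weight_fn == 1 this reduces to the standard IG Riemann sum.
    alphas = np.arange(1, m + 1) / m
    total = np.zeros_like(x)
    for a in alphas:
        total += weight_fn(a) * grad_F(baseline + a * (x - baseline))
    return (x - baseline) * total / m

x = np.array([2.0, -1.0])
baseline = np.zeros(2)
plain = path_weighted_ig(x, baseline, lambda a: 1.0)     # uniform weight: plain IG
late = path_weighted_ig(x, baseline, lambda a: 2.0 * a)  # emphasizes late path segments
```

With the uniform weight, the attributions sum (up to discretization error) to $F(x) - F(x')$; with the non-uniform weight they generally do not, illustrating the completeness trade-off noted above.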
Guided IG (GIG): Employs adaptive paths that avoid high-noise, off-object regions by greedily moving along low-sensitivity features and optionally anchoring steps to the straight-line path, significantly reducing noise artifacts in vision models (Kapishnikov et al., 2021).
Counterfactual and Shapley-inspired Baselines: Multiple or data-driven baselines (e.g., Shapley IG) reduce baseline-dependence and align IG with Shapley value theory, achieving improved faithfulness to true feature contributions (Liu et al., 2023).
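The baseline-ensemble idea can be sketched with a linear toy model, where IG has the closed form $w_i(x_i - x'_i)$; the simple averaging scheme here is a simplification of the Shapley-aligned constructions, not their full algorithm:

```python
import numpy as np

W = np.array([1.0, -2.0, 3.0])
model = lambda v: float(W @ v)  # linear toy model (assumption)

def ig_linear(x, baseline):
    # for a linear model, IG has the closed form W_i * (x_i - x'_i)
    return W * (x - baseline)

def multi_baseline_ig(x, baselines):
    # average single-baseline attributions over a baseline ensemble (sketch,
    # in the spirit of expected-gradients / Shapley-style baseline averaging)
    return np.mean([ig_linear(x, b) for b in baselines], axis=0)

x = np.array([1.0, 0.5, -0.5])
baselines = [np.zeros(3), np.full(3, 0.1), np.array([0.2, -0.1, 0.0])]
attr = multi_baseline_ig(x, baselines)
```

For a linear model, the averaged attributions still satisfy a completeness-like identity: they sum to $F(x)$ minus the mean baseline output, which reduces sensitivity to any single baseline choice.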
Manifold and Geodesic IG: On Riemannian manifolds (images, embeddings), IG along geodesics (e.g. as in GIG (Salek et al., 17 Feb 2025) or Manifold IG (Zaher et al., 2024)) produces attributions that align with intrinsic data geometry, reducing spurious attributions and increasing robustness to adversarial perturbations.
Graph and Discretized IG: On non-Euclidean domains, IG is adapted to sum over meaningful discrete paths (e.g., all shortest paths in a graph (Simpson et al., 9 Sep 2025)) or snap interpolants to actual vocabulary embeddings (e.g., DIG, UDIG (Sanyal et al., 2021, Roy et al., 2024)) to ensure path points are data-valid.
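The discretization idea can be sketched as snapping each interpolation point to its nearest real embedding so that gradients are evaluated only at data-valid points. This nearest-neighbor snapping is an illustrative assumption; the published DIG/UDIG algorithms use more refined, monotonicity-aware anchor selection:

```python
import numpy as np

def snap_path_to_vocab(x, baseline, vocab, m=5):
    # Replace each off-manifold interpolant on the straight-line path with the
    # closest row of the embedding table, so every path point is a real datum.
    path = []
    for k in range(1, m):
        point = baseline + (k / m) * (x - baseline)
        nearest = vocab[np.argmin(np.linalg.norm(vocab - point, axis=1))]
        path.append(nearest)  # gradients would be evaluated at these points
    return np.array(path)

# tiny hypothetical 2-D "vocabulary" of embeddings
vocab = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [1.0, 0.0]])
path = snap_path_to_vocab(np.array([1.0, 1.0]), np.zeros(2), vocab)
```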
A summary of selected variants and domain adaptations is given below:

| Variant | Key modification | Primary issue addressed |
|---|---|---|
| IDG | Output-derivative weighting with adaptive path sampling | Saturation |
| PWIG | User-defined path weight $w(\alpha)$ | Path-region emphasis (at cost of completeness) |
| Guided IG | Adaptive, low-sensitivity paths | Gradient noise |
| Shapley IG | Multiple / data-driven baselines | Baseline dependence |
| Geodesic / Manifold IG | Integration along data-manifold geodesics | Off-manifold interpolation |
| DIG, UDIG, graph IG | Discrete, data-valid path points | Discrete domains |
5. Empirical Evaluation and Benchmarks

Standard IG and its extensions have been evaluated across vision, NLP, medical imaging, and GNNs, using datasets such as ImageNet, SST-2, OASIS-1, and ShapeGGen. Evaluation metrics include:
Insertion/Deletion score (RISE AUC): Measures how attributions correspond to true impact on model output.
Softmax and Accuracy Information Curve AUCs: Evaluate attributions via partial image/text reveals.
Sharpness, stability, human alignment: Both qualitative and quantitative (variance under input noise, agreement with human annotators).
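An insertion-style evaluation can be sketched as follows, using a toy linear model (an assumption for illustration) for which IG has a closed form. Features are revealed in decreasing-attribution order; faithful attributions drive the output up early, yielding a larger area under the insertion curve:

```python
import numpy as np

def insertion_auc(model, x, baseline, attributions):
    # Insertion curve in the spirit of RISE-style evaluation (sketch): reveal
    # features from the baseline in decreasing-attribution order and record
    # the model output after each reveal.
    order = np.argsort(-attributions)
    current = baseline.astype(float).copy()
    outputs = [model(current)]
    for i in order:
        current[i] = x[i]
        outputs.append(model(current))
    dx = 1.0 / len(order)
    # trapezoidal area under the insertion curve
    return sum((outputs[i] + outputs[i + 1]) / 2.0
               for i in range(len(outputs) - 1)) * dx

W = np.array([0.5, 1.5, -1.0, 2.0])
model = lambda v: float(W @ v)      # toy linear model (assumption)
x, baseline = np.ones(4), np.zeros(4)
ig_attrs = W * (x - baseline)       # closed-form IG for a linear model
auc_good = insertion_auc(model, x, baseline, ig_attrs)
auc_bad = insertion_auc(model, x, baseline, -ig_attrs)  # deliberately reversed order
```

Here the faithful ordering produces a strictly larger AUC than the reversed one, which is the property the metric is designed to detect.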
IDG demonstrated consistently higher RISE and SIC/AIC insertion AUCs, sharper heatmaps, and lower spurious activations compared to IG, Left-IG, GIG, and Adversarial GI, with up to 5–15% improvement (Walker et al., 2023). Variants such as GIG, DIG, and Manifold IG have outperformed vanilla IG by similar margins across relevant domain-specific metrics (Kapishnikov et al., 2021, Zaher et al., 2024, Roy et al., 2024).
6. Interpretability, Robustness, and Limitations
Integrated Gradients and its variants provide interpretable decompositions rooted in clear methodology, but their faithfulness depends on path, baseline, and domain alignment. While the original IG method is vulnerable to saturation, path-length cancellation, baseline ambiguity, and non-manifold interpolation, state-of-the-art approaches (IDG, GIG, PWIG, Manifold IG) offer targeted mitigation strategies. However, these often increase computational cost or weaken completeness, and introduce hyperparameter or model structure dependencies. The approximation of the path integral via Riemann sums remains a source of bias unless sample placement is adaptively optimized (Swain et al., 2024).
A plausible implication is that "hybrid" schemes—combining model-based path adaptations, manifold-aware integration, and Shapley-style baseline ensembles—will be essential for trustworthy attributions in high-dimensional, multimodal, or safety-critical settings.
Open research directions include:

Algorithmic complexity reduction through optimized Riemann sum scheduling, batchwise integration, and instance-conditional path choice (Swain et al., 2024).
Extending completeness and axiomatic properties to non-uniform and non-monotonic path methods.
Cross-domain and structured-data adaptations for GNNs, time series, and multimodal data (Simpson et al., 9 Sep 2025).
Benchmarking via causal and human-centered metrics, to calibrate attribution faithfulness in the absence of ground truth.
Integrated Gradients remains the canonical reference point for axiomatic, path-based attributions in XAI, with its methodological descendants advancing both theoretical and practical rigor across application areas (Sundararajan et al., 2017, Lundstrom et al., 2023, Walker et al., 2023).