Integrated Gradients for Model Explanations

Updated 1 January 2026
  • Integrated Gradients is a feature attribution method that integrates local gradients along the straight-line path from a baseline to the input, ensuring axiomatic properties like completeness and sensitivity.
  • Recent advances optimize the integration process using non-uniform grids and adaptive path strategies to reduce computational overhead and discretization noise.
  • Practical applications span vision, NLP, and graphs, with enhancements like Guided IG and Path-Weighted IG improving attribution fidelity and robustness in diverse domains.

Integrated Gradients (IG) is a rigorously axiomatized feature attribution method for explaining predictions of differentiable models, especially deep neural networks. IG attributes to each input feature the portion of the model output that accrues along the path from a chosen baseline input to the actual input, summing local gradients in feature space. Its mathematical, algorithmic, and axiomatic properties, together with a variety of enhancements addressing computational and statistical fidelity, have established IG as a central method in Explainable AI (XAI).

1. Mathematical Formulation and Axiomatic Foundations

IG assigns to each input feature $i$ an attribution based on the integral of the gradient of the model output along the straight-line path from a reference baseline $x' \in \mathbb{R}^d$ to the input $x \in \mathbb{R}^d$:

$$\mathrm{IG}_i(x; x', F) = (x_i - x'_i)\int_{0}^{1} \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i} \, d\alpha$$

In discrete form, using $m$ evenly spaced steps $\alpha_k = k/m$ over the interval $[0,1]$:

$$\mathrm{IG}_i(x; x', F) \approx (x_i - x'_i)\,\frac{1}{m} \sum_{k=1}^{m} \frac{\partial F\!\left(x' + \tfrac{k}{m}(x - x')\right)}{\partial x_i}$$
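
A minimal sketch of this Riemann-sum approximation, assuming a differentiable PyTorch model wrapped in a function `f` that maps a batch of inputs to one scalar score per example (e.g., the target-class logit); the function name and arguments are illustrative, not taken from a specific library:

```python
import torch

def integrated_gradients(f, x, baseline, n_steps=200):
    """Uniform-grid IG: (x_i - x'_i) times the mean of dF/dx_i at x' + (k/m)(x - x')."""
    # Right Riemann sum over alpha_k = k/m for k = 1..m.
    alphas = torch.linspace(1.0 / n_steps, 1.0, n_steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)

    # One batched forward/backward pass over all interpolation points.
    grads = torch.autograd.grad(f(path).sum(), path)[0]   # dF/dx at each path point

    avg_grad = grads.mean(dim=0)                          # (1/m) * sum_k dF/dx_i
    return (x - baseline) * avg_grad                      # (x_i - x'_i) * average gradient
```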

IG satisfies several key axioms under mild regularity conditions:

  • Implementation Invariance: Attributions depend only on the input–output functional mapping.
  • Sensitivity: Features with no effect on the output receive zero attribution.
  • Completeness: $\sum_i \mathrm{IG}_i(x; x', F) = F(x) - F(x')$ (checked numerically in the snippet after this list).
  • Linearity: Attributions decompose linearly for ensembles of models.
  • Symmetry-Preservation: Attributions are equal for features that are symmetric in both the function and inputs.
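
As a quick numerical sanity check of the completeness axiom, the snippet below reuses the `integrated_gradients` sketch above on an illustrative toy network; the attributions should sum, up to discretization error, to $F(x) - F(x')$:

```python
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
f = lambda z: net(z).squeeze(-1)                 # scalar score per example

x, baseline = torch.randn(8), torch.zeros(8)
attr = integrated_gradients(f, x, baseline, n_steps=1000)

lhs = attr.sum().item()                                        # sum_i IG_i(x; x', F)
rhs = (f(x.unsqueeze(0)) - f(baseline.unsqueeze(0))).item()    # F(x) - F(x')
print(lhs, rhs)                                  # nearly equal for smooth F and large m
```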

Advanced characterization theorems show IG is the unique path-based attribution method that is symmetry-preserving, scale-invariant, affinely invariant, and proportionally splits credit among features for monomial functions, provided one restricts to straight-line paths and, in the most general function classes, adds a non-decreasing positivity axiom (Lundstrom et al., 2023, Lerma et al., 2021, Lundstrom et al., 2022).

2. Discrete Approximation and Optimized Integration Grids

Since the IG path integral is usually intractable, numerical integration via Riemann sums is standard. Uniformly spaced steps are commonly used, but two key concerns arise:

  • Computational Overhead: Typical implementations require $m \sim 200$–$1\,000$ forward and backward passes.
  • Discretization Noise: Uniform grids waste steps in flat regions, contributing minimal value to the integral while increasing noise, and may converge slowly in regions where the model output rapidly changes (Swain et al., 2024).

Recent methods optimize the placement of interpolation points to accelerate convergence:

| Method | Core Idea | Sample Allocation | Speedup |
| --- | --- | --- | --- |
| Uniform IG | Riemann sum with equal $\alpha$-step intervals | $m$ uniform steps | Baseline |
| Non-uniform IG (Bhat et al., 2023) | Allocate more steps to intervals with large output change | $m_j \propto \sqrt{\lvert\Delta f_j\rvert}$ across $n_{\mathrm{int}}$ intervals | $2.6$–$3.6\times$ |
| RiemannOpt (Swain et al., 2024) | Optimize Riemann grid by minimizing error upper bound | Solve $\min_{\{\alpha_j\}} \sum_j \lvert g'(\alpha_j)\rvert(\Delta\alpha_j)^2$ | $2$–$4\times$ |

These approaches permit empirical speedups of $2.6$–$3.6\times$ without loss of attribution accuracy, with negligible pre-processing cost ($\leq 3.2\%$ overhead) (Bhat et al., 2023). Error bounds depend on the smoothness of the gradient along the path; adaptive methods concentrate steps where $|\partial F/\partial\alpha|$ or its derivative is large, thereby reducing discretization error (Swain et al., 2024).
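
The following sketch illustrates the non-uniform allocation idea from the table above, assigning interpolation steps per interval in proportion to $\sqrt{|\Delta f_j|}$. It is a loose illustration under assumed parameter names (`n_intervals`, `total_steps`), not the reference implementation of Bhat et al. (2023) or RiemannOpt:

```python
import torch

def nonuniform_alphas(f, x, baseline, n_intervals=16, total_steps=128):
    """Return a non-uniform alpha grid that spends more steps where F changes fast."""
    # Coarse pass: measure the output change |delta_f_j| over each alpha interval.
    edges = torch.linspace(0.0, 1.0, n_intervals + 1)
    with torch.no_grad():
        points = baseline + edges.view(-1, *([1] * x.dim())) * (x - baseline)
        outs = f(points)
    delta_f = (outs[1:] - outs[:-1]).abs()

    # Allocate steps m_j proportional to sqrt(|delta_f_j|), at least one per interval.
    weights = delta_f.sqrt()
    weights = weights / weights.sum().clamp_min(1e-12)
    steps = (weights * total_steps).round().long().clamp_min(1)

    # Subdivide each interval; when forming the Riemann sum, weight each gradient
    # by its local delta-alpha (interval width / steps in that interval), not by 1/m.
    alphas = torch.cat([
        torch.linspace(float(edges[j]), float(edges[j + 1]), int(steps[j]) + 1)[1:]
        for j in range(n_intervals)
    ])
    return alphas
```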

3. Path Selection, Model-Specific and Data-Manifold Paths

The standard IG uses a fixed, straight-line path from $x'$ to $x$. However, several phenomena motivate data- or model-adaptive path selection:

  • Off-manifold and regionally flat paths: Straight-line paths may traverse irrelevant or low-density regions, accumulating spurious gradients, especially in vision models with hard data manifolds or in NLP models with inherently discrete input spaces (Sanyal et al., 2021, Zaher et al., 2024, Salek et al., 17 Feb 2025).
  • Noise accumulation: Uniform paths may accumulate noise where the gradient norm is high but orthogonal to the decision boundary, yielding attributions that highlight distractor or background regions (Kapishnikov et al., 2021, Yang et al., 2023).

Proposed remedies include the following; a generic sketch of path-based attribution over an arbitrary interpolation path appears after the list:

  • Guided IG: Selects adaptive, model-dependent paths that move features with minimal sensitivity, thereby avoiding noisy or off-manifold regions, and preserving IG completeness and symmetry properties (Kapishnikov et al., 2021).
  • Geodesic Paths/GIG/MIG: Integrate along geodesics with respect to a Riemannian metric induced either by the model's Jacobian (Salek et al., 17 Feb 2025) or by a data manifold learned by generative models (Zaher et al., 2024). These methods minimize the total "energy" or path-length with respect to the model's geometry, yielding attributions that better respect the statistical support of the input distribution and, in some settings, satisfying "strong completeness" (sum of absolute attributions equals the output difference).
  • Discretized IG for NLP: In discrete spaces, such as word embeddings, non-linear or monotone paths traversing valid words/anchors improve faithfulness by reducing out-of-distribution errors relative to naive linear interpolation (Sanyal et al., 2021).
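
All of these variants replace the straight-line interpolation with a different path. Below is a hedged, generic sketch of path-based attribution for an arbitrary user-supplied `path_fn`, with `path_fn(0)` equal to the baseline and `path_fn(1)` equal to the input; a straight-line `path_fn` recovers standard IG, while Guided IG, geodesic, or discretized-anchor paths would plug in their own path constructions:

```python
import torch

def path_attributions(f, path_fn, n_steps=128):
    """Approximate the path integral of (dF/dx_i)(gamma(alpha)) d(gamma_i) for an arbitrary path gamma."""
    alphas = torch.linspace(0.0, 1.0, n_steps + 1)
    points = torch.stack([path_fn(a) for a in alphas])     # gamma(alpha_k), shape (n_steps + 1, ...)
    deltas = points[1:] - points[:-1]                       # segment increments along the path

    # Gradients at segment midpoints (midpoint rule along the path).
    mids = (0.5 * (points[1:] + points[:-1])).detach().requires_grad_(True)
    grads = torch.autograd.grad(f(mids).sum(), mids)[0]

    # Attribution_i is approximately sum_k (dF/dx_i)(gamma_k) * delta_k_i,
    # which totals roughly F(x) - F(baseline) by the gradient theorem.
    return (grads * deltas).sum(dim=0)

# A straight-line path recovers standard IG:
# path_fn = lambda a: baseline + a * (x - baseline)
```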

4. Baseline Selection and Shapley-Value Connections

Choice of baseline $x'$ is critical for producing unbiased and interpretable attributions:

  • Single baseline: Commonly, black images or all-zero vectors are used, but these may lie far from the data manifold and inject artifacts.
  • Multiple baselines and Shapley IG: By connecting IG to the Aumann–Shapley value from cooperative game theory, a distribution over baselines (sampling from coalitions) better approximates Shapley values and yields more robust and semantically focused explanations. The Shapley Integrated Gradients (SIG) method samples baselines according to coalition size and averages the resulting IG attributions (Liu et al., 2023).

Completeness holds in expectation over the baseline distribution.
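
A hedged sketch of this expectation idea, averaging IG over several baselines drawn from a reference set, is shown below; it illustrates averaging over a baseline distribution rather than the exact coalition-size sampling used by SIG, and names such as `references` and `n_baselines` are assumptions:

```python
import torch

def multi_baseline_ig(f, x, references, n_baselines=16, n_steps=64):
    """Average the earlier `integrated_gradients` sketch over sampled baselines."""
    idx = torch.randint(0, references.shape[0], (n_baselines,))
    attrs = torch.stack([
        integrated_gradients(f, x, references[i], n_steps=n_steps) for i in idx
    ])
    # Completeness then holds in expectation: the mean attribution sums to
    # F(x) minus the average F(x') over the sampled baselines.
    return attrs.mean(dim=0)
```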

5. Noise Mitigation and Enhanced Attribution Fidelity

Several recent extensions address both the statistical and visual fidelity of IG:

  • Important-Direction Integration (IDGI): Allocates more integration effort to path segments where directional sensitivity is highest, reducing discretization noise and empirically improving metrics such as insertion/deletion AUC and infidelity (Yang et al., 2023).
  • Pattern-Guided IG (PGIG): Multiplies input gradients by feature-wise pattern weights encoding input–target covariances (from PatternAttribution), suppressing noise dimensions while retaining path-based sensitivity (Schwarzenberg et al., 2020).
  • Path-Weighted IG (PWIG): Incorporates a nonnegative weight function along the interpolation parameter to selectively emphasize critical portions of the path, which can enhance sparsity, stability, or focus, at the cost of strict completeness unless normalization is carefully maintained (Kamalov et al., 22 Sep 2025). A simple weighted-sum sketch follows this list.
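
The sketch below illustrates the path-weighting idea with a straight-line path and a user-supplied weight function. The weight function and its global normalization are illustrative assumptions, strict completeness is generally lost for non-constant weights as noted above, and this is not the PWIG reference implementation:

```python
import torch

def weighted_ig(f, x, baseline, weight_fn, n_steps=200):
    """Straight-line IG with gradients reweighted by a nonnegative w(alpha)."""
    alphas = torch.linspace(1.0 / n_steps, 1.0, n_steps)
    w = weight_fn(alphas)
    w = w / w.sum()                          # keep the overall scale comparable to plain IG

    shape = (-1, *([1] * x.dim()))
    path = (baseline + alphas.view(*shape) * (x - baseline)).detach().requires_grad_(True)
    grads = torch.autograd.grad(f(path).sum(), path)[0]

    weighted_avg = (w.view(*shape) * grads).sum(dim=0)
    return (x - baseline) * weighted_avg

# Example weight emphasizing the end of the path near the input:
# attr = weighted_ig(f, x, baseline, weight_fn=lambda a: a ** 2)
```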

6. Specialized and Domain-Adapted Applications

IG has been adapted for a wide spectrum of tasks and domains:

  • Vision: Low-latency hardware-aware IG with non-uniform interpolation for real-time explanations (Bhat et al., 2023). Attributions in image classification and medical imaging (e.g., MRI diagnosis) (Kamalov et al., 22 Sep 2025).
  • Natural Language Processing: Word-level and phrase-level attributions for sentiment, agency, and other sociopsychological markers, with task-driven postprocessing for interpretability (Aghababaei et al., 6 Mar 2025); deliberately overfit small-label classifiers have also been used to bootstrap dictionaries of marker vocabulary.
  • Graphs: Graph-based IG (GB-IG) averages discrete pathwise gradients over shortest-paths in node-feature space, circumventing the absence of a unique straight-line interpolation on discrete structures (Simpson et al., 9 Sep 2025).
  • Photonic Inverse Design: Pixel-wise IG attributions for CNN surrogates yield physically meaningful design insights in nanophotonics (Park et al., 25 Oct 2025).
  • Knowledge Distillation: Precomputed IG maps from teacher models serve as an effective data-augmentation for model compression, improving student–teacher alignment and empirical test accuracy (Hernandez et al., 17 Jun 2025).
  • Adversarial Robustness: IG and its variants are deployed to generate or defend against transferable adversarial examples. Path diversity, monotonicity, and the use of multiple baselines, as in MuMoDIG, sharply enhance transferability in attacks (Ren et al., 2024).

7. Practical Recommendations and Limitations

  • Step count $m$: Use adaptive or optimized step allocation (e.g., non-uniform, RiemannOpt) to attain high-fidelity attributions with reduced computational burden (Bhat et al., 2023, Swain et al., 2024).
  • Path choice: In vision and text settings, model- or data-aware paths yield more reliable attributions, especially when data distribution or functional geometry is highly non-linear (Salek et al., 17 Feb 2025, Zaher et al., 2024, Kapishnikov et al., 2021).
  • Baseline selection: Sophisticated baseline strategies (e.g., sampling from meaningful coalitions or the data manifold) align attributions with human expectations and reduce bias (Liu et al., 2023).
  • Noise handling: Employ direction-aware, pattern-guided, or path-weighted extensions to mitigate discretization or gradient noise (Schwarzenberg et al., 2020, Yang et al., 2023, Kamalov et al., 22 Sep 2025).
  • Domain adaptation: For graph data, discrete path-based averaging is necessary; for NLP, maintain on-manifold paths to legitimate word embeddings (Sanyal et al., 2021, Simpson et al., 9 Sep 2025).
  • Limitations: IG attributions' faithfulness is contingent on path, baseline, and step-count choices. Attribution completeness can be affected in weighted or non-canonical variants unless normalization is enforced. Noise reduction and path adaptation introduce hyperparameters (e.g., number or proportion of directions, weight function $w(\alpha)$) requiring tuning.

8. Conclusion

IG and its extensions constitute a mature, theoretically anchored approach to model attribution. Advanced path, baseline, and integration grid adaptations are essential to deploy IG as a robust XAI methodology in diverse and demanding application settings.
