Integrated Gradient Attribution
- Integrated Gradient Attribution is a path-based feature attribution method that computes feature importance by integrating gradients from a baseline to the actual input.
- It satisfies key axioms such as sensitivity, completeness, and symmetry, ensuring that attributions are consistent and mathematically grounded.
- Practical implementations involve discretization, careful baseline selection, and advanced variants to address challenges like saturation and off-manifold integration.
Integrated Gradient Attribution (IG) is a foundational path-based feature attribution method for deep networks, designed to assign a quantitative importance score to each input feature in support of a model's prediction. By integrating gradients along a path from a baseline input (representing absence of signal) to the observed input, IG reveals which features drive model behavior and satisfies a suite of formal axioms that distinguish it from earlier approaches. IG has spawned multiple theoretical extensions and practical variants that address key limitations, and it has seen wide adoption in high-stakes domains across vision, language, medical, and scientific applications.
1. Formal Definition and Theoretical Foundations
Let $F:\mathbb{R}^n \to \mathbb{R}$ be a (potentially deep) model and $x \in \mathbb{R}^n$ an input. The practitioner selects a baseline input $x' \in \mathbb{R}^n$—often the all-zero vector or another contextually "uninformative" value. The IG attribution for feature $i$ is defined as:

$$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F\!\left(x' + \alpha(x - x')\right)}{\partial x_i}\, d\alpha$$

This formula computes, for each feature, the path integral of the model's partial derivative with respect to that feature as the input moves from baseline to input along the straight line $\gamma(\alpha) = x' + \alpha(x - x')$ for $\alpha \in [0,1]$ (Sundararajan et al., 2017). The output is an attribution vector $\mathrm{IG}(x) \in \mathbb{R}^n$ whose sum $\sum_i \mathrm{IG}_i(x)$ closely matches $F(x) - F(x')$.
IG is typically approximated discretely with $m$ uniformly spaced steps:

$$\mathrm{IG}_i(x) \approx (x_i - x'_i)\,\frac{1}{m}\sum_{k=1}^{m} \frac{\partial F\!\left(x' + \tfrac{k}{m}(x - x')\right)}{\partial x_i}$$

Automatic differentiation frameworks enable efficient computation for arbitrary models. Generally, $m$ in the range 20–300 offers a balance between accuracy and compute cost (Sundararajan et al., 2017).
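A minimal sketch of this discrete approximation in PyTorch, assuming `model` maps a batch of inputs to per-class scores and `target` indexes the class being attributed (both names are placeholder assumptions, not part of the original formulation):

```python
import torch

def integrated_gradients(model, x, baseline, target, m=100):
    """Riemann-sum approximation of IG with m uniformly spaced steps.

    model: callable mapping a batch (m, *x.shape) to class scores (m, C)
    x, baseline: tensors of identical shape (no batch dimension)
    target: index of the class score F to attribute
    """
    # Interpolation points x' + (k/m)(x - x'), k = 1..m.
    alphas = torch.linspace(1.0 / m, 1.0, m).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).requires_grad_(True)

    scores = model(path)[:, target]                     # F at each path point
    grads = torch.autograd.grad(scores.sum(), path)[0]  # dF/dx at each point

    return (x - baseline) * grads.mean(dim=0)           # IG_i(x)
```

By the completeness property discussed under Axiomatic Guarantees below, `ig.sum()` should closely track `F(x) - F(baseline)` for the target class, which doubles as a built-in correctness check.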
Axiomatic Guarantees
IG was motivated by two formally stated axioms:
- Sensitivity: If $x$ and $x'$ differ only in feature $i$ and $F(x) \neq F(x')$, then $\mathrm{IG}_i(x) \neq 0$.
- Implementation Invariance: Functionally equivalent networks yield identical attributions.
Further analysis has established that IG also satisfies:
- Completeness: $\sum_i \mathrm{IG}_i(x) = F(x) - F(x')$ (fundamental theorem of calculus for path integrals).
- Symmetry Preservation: If features $i$ and $j$ are symmetric for $F$, and $x_i = x_j$, $x'_i = x'_j$, then $\mathrm{IG}_i(x) = \mathrm{IG}_j(x)$ (Lerma et al., 2021, Lundstrom et al., 2023).
Extensions have axiomatized IG as the unique path-based method satisfying various bundles of completeness, linearity, dummy, symmetry, affine scale invariance, proportionality, and non-decreasing positivity (NDP) (Lundstrom et al., 2023, Lundstrom et al., 2022).
2. Baseline Selection and Practical Workflow
Key to IG's interpretability is the choice of baseline $x'$, representing the reference of “no signal.” In image models, this is often the black image (all zeros); for text, the zero embedding or a [MASK]/[PAD] embedding; for molecular graphs, a null atom or an empty graph (Sundararajan et al., 2017). Practitioners should confirm that $F(x')$ is near zero, ensuring attributions sum closely to $F(x)$. The baseline significantly affects the interpretation, and different variants (e.g., BlurIG, Manifold IG) seek to remove or mitigate baseline dependence (Xu et al., 2020, Zaher et al., 2024).
Riemann sum approximation with $m$ steps yields the attributions; the convergence of $\sum_i \mathrm{IG}_i(x)$ to $F(x) - F(x')$ serves as a diagnostic for sufficient discretization. Attribution quality should be assessed visually (e.g., heatmaps), quantitatively (e.g., AUC for relevance maps), or through downstream human-in-the-loop and perturbation analyses (Sundararajan et al., 2017, Miglani et al., 2020).
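A sketch of this convergence diagnostic, reusing the `integrated_gradients` helper sketched above (`model`, `x`, `baseline`, and `target` remain placeholders):

```python
import torch

def completeness_gap(model, x, baseline, target, m):
    """Absolute gap between the attribution sum and F(x) - F(x')."""
    ig = integrated_gradients(model, x, baseline, target, m=m)
    with torch.no_grad():
        fx = model(x.unsqueeze(0))[0, target]
        fb = model(baseline.unsqueeze(0))[0, target]
    return (ig.sum() - (fx - fb)).abs().item()

# Increase m until the gap stabilizes near zero.
for m in (20, 50, 100, 300):
    print(f"m={m:4d}  gap={completeness_gap(model, x, baseline, target, m):.5f}")
```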
3. Limitations and Algorithmic Variants
Saturation and Faithfulness
Regions of the path where $F$ is flat—saturated regions—often contribute disproportionately to the final IG attributions, even though the output hardly changes there (Miglani et al., 2020). This effect degrades faithfulness, as most of the IG vector may come from such regions. Splitting the integral to focus on unsaturated regions (“Left-IG”) produces more faithful and stable attributions. Adaptive weighting (IDG) or non-uniform Riemann sampling can further prioritize path segments where the model’s decision rapidly transitions (Walker et al., 2023, Miglani et al., 2020).
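A hedged sketch of the splitting idea (a simplification, not the exact Split-IG/IDG procedure): estimate the saturation onset $\alpha^*$ by scanning the output along the path, then integrate only the unsaturated left segment. The threshold `frac` is an assumed tuning knob.

```python
import torch

def left_ig(model, x, baseline, target, m=100, frac=0.9):
    """Left-IG sketch: integrate only up to alpha* where the output has
    completed `frac` of its total change (onset of saturation)."""
    with torch.no_grad():
        alphas = torch.linspace(0.0, 1.0, m + 1).view(-1, *([1] * x.dim()))
        path = baseline + alphas * (x - baseline)
        scores = model(path)[:, target]
        progress = (scores - scores[0]) / (scores[-1] - scores[0] + 1e-12)
        k_star = int((progress >= frac).nonzero()[0])   # first saturated step
        alpha_star = k_star / m

    # Integrating from the baseline to the unsaturated endpoint x* equals
    # (x - x') * integral of the gradient over [0, alpha*].
    x_star = baseline + alpha_star * (x - baseline)
    return integrated_gradients(model, x_star, baseline, target, m=m)
```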
Off-Manifold Integration
Linear paths between baseline and input may leave the data manifold, resulting in attributions on input directions never encountered during training. Manifold-based extensions compute IG along geodesics on a learned latent manifold (MIG, GIG), or adapt paths using gradient geometry, thereby conforming more closely to data distributions and avoiding off-manifold artifacts (Zaher et al., 2024, Salek et al., 17 Feb 2025, Bordt et al., 2022).
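These variants share a common template: replace the straight line with an arbitrary path $\gamma(\alpha)$ and accumulate the gradient's inner product with the path increments. A minimal sketch of that template (constructing the geodesic or manifold-conforming path itself is the hard part and is out of scope here):

```python
import torch

def path_integrated_gradients(model, gamma, target, m=100):
    """IG along an arbitrary path gamma: [0, 1] -> input space.

    gamma maps an (m,) tensor of alphas to (m, *input_shape) path points;
    straight-line IG is recovered with
    gamma = lambda a: baseline + a.view(-1, 1, 1, 1) * (x - baseline).
    """
    alphas = torch.linspace(0.0, 1.0, m)
    path = gamma(alphas).detach().requires_grad_(True)

    scores = model(path)[:, target]
    grads = torch.autograd.grad(scores.sum(), path)[0]

    # Discrete line integral: sum over segments of grad . d(gamma).
    segments = (path[1:] - path[:-1]).detach()
    return (grads[:-1] * segments).sum(dim=0)
```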
Noise and Spurious Attributions
Saliency maps from standard IG tend to be noisy, with significant attribution mass on irrelevant or background features, particularly in vision models. Adaptive Path Methods, such as Guided IG, select paths that avoid regions of high, uninformative gradient magnitude, substantially reducing background noise and improving alignment with model decisions (Kapishnikov et al., 2021). SmoothTaylor and SmoothGrad apply noise-averaging strategies to reduce pixel-level granularity (Goh et al., 2020).
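A sketch of the noise-averaging idea applied to IG, in the spirit of SmoothGrad (the noise scale `sigma` and sample count are assumed tuning parameters, not values from the cited papers):

```python
import torch

def smooth_ig(model, x, baseline, target, n_samples=20, sigma=0.1, m=50):
    """Average IG over Gaussian-perturbed copies of the input to
    suppress pixel-level attribution noise (SmoothGrad-style)."""
    attributions = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy_x = x + sigma * torch.randn_like(x)
        attributions += integrated_gradients(model, noisy_x, baseline, target, m=m)
    return attributions / n_samples
```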
Baseline and Discrete Feature Spaces
For discrete input spaces (e.g., LLMs), linear interpolation traverses non-existent or semantically meaningless intermediate points. Uniform Discretized Integrated Gradients (UDIG) and similar approaches snap each step to actual word embeddings, ensuring gradients are computed at semantically valid points (Roy et al., 2024).
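A hedged sketch of the snapping idea (a simplification, not the exact UDIG algorithm): interpolate in embedding space, then replace each interpolant with its nearest real vocabulary embedding before taking gradients. `vocab_emb` is an assumed $(V, d)$ embedding matrix, and the model is assumed to consume embeddings directly.

```python
import torch

def snap_to_vocab(points, vocab_emb):
    """Replace each interpolated embedding with its nearest vocabulary
    embedding so gradients are taken at semantically valid points."""
    d = points.shape[-1]
    dists = torch.cdist(points.reshape(-1, d), vocab_emb)  # (m*seq, V)
    return vocab_emb[dists.argmin(dim=-1)].reshape(points.shape)

def discretized_ig(model, x_emb, baseline_emb, target, vocab_emb, m=30):
    """IG whose path interpolants are snapped onto real word embeddings."""
    alphas = torch.linspace(1.0 / m, 1.0, m).view(-1, 1, 1)
    raw_path = baseline_emb + alphas * (x_emb - baseline_emb)
    path = snap_to_vocab(raw_path, vocab_emb).requires_grad_(True)

    scores = model(path)[:, target]  # model consumes embeddings directly
    grads = torch.autograd.grad(scores.sum(), path)[0]
    return (x_emb - baseline_emb) * grads.mean(dim=0)
```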
Advanced Extensions
- Geodesic IG (GIG): Integrates on the path of least resistance under a network-induced Riemannian metric, satisfying a “strong completeness” axiom, and avoiding high-gradient, off-manifold shortcuts (Salek et al., 17 Feb 2025).
- Amplitude-based IG for Quantum Models: Employs quantum-native gradient estimation (Hadamard tests) for circuits using amplitude encoding (DiBrita et al., 2 Oct 2025).
- Pattern-Guided IG: Combines data-driven noise suppression with path integration, enhancing robustness to distractor features (Schwarzenberg et al., 2020).
- Dataset-Level Attribution: IG Correlation (IGC) summarizes feature attributions across an entire dataset, revealing model strategies beyond single-input inspection (Lelièvre et al., 2024).
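A coarse sketch of dataset-level aggregation in this spirit (the IGC statistic itself is correlation-based per Lelièvre et al., 2024; here we simply average per-feature attribution magnitudes across a dataset, reusing the `integrated_gradients` helper above):

```python
import torch

def dataset_attribution_profile(model, dataset, baseline, target, m=50):
    """Mean |IG| per feature across a dataset: a coarse global summary
    of which features the model relies on, beyond single inputs."""
    total = torch.zeros_like(baseline)
    for x in dataset:
        total += integrated_gradients(model, x, baseline, target, m=m).abs()
    return total / len(dataset)
```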
4. Empirical Results and Applications
IG is widely validated on object recognition (ImageNet classifiers such as GoogLeNet/Inception and ResNet), diabetic retinopathy grading, question answering, and molecular design. In vision, IG identifies regions (pixels, patches) on objects driving the target class prediction; in medical imaging, it pinpoints fine lesion or anomaly boundaries; in text models, it isolates trigger words or spans; in chemistry, it localizes atom or bond contributions to predicted activity (Sundararajan et al., 2017).
Summarized implementation and performance findings:
| Variant | Problem Addressed | Notable Empirical Finding |
|---|---|---|
| IG | Base method (straight-line path) | Saliency matches human and model logic on images, text, molecules |
| Split IG | Path saturation | Improves faithfulness, stability; less sensitive to noisy gradients |
| Guided IG | Off-manifold noise | Sharper, background-suppressed maps; higher localization AUC |
| GIG/MIG | Data manifold | Lower infidelity, higher robustness to adversarial attribution attacks |
| PGIG | Noise suppression | Avoids saturation, filters distractors, wins image-degradation tests |
| UDIG | Discrete embeddings | Stronger faithfulness for LLM attributions (SST-2, IMDb, QA tasks) |
Completeness and alignment diagnostics, information curve metrics (AIC, SIC), pointing game/localization AUCs, and perturbation-based evaluation (insertion/deletion, ABPC) are commonly used for quantitative benchmarking (Miglani et al., 2020, Walker et al., 2023, Kapishnikov et al., 2021).
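As an illustration, a sketch of one perturbation-based benchmark, the deletion curve: zero out features in decreasing attribution order and track how quickly the target score collapses (the step count and the zero fill value are assumptions):

```python
import torch

def deletion_curve(model, x, attributions, target, n_steps=20):
    """Zero out features in order of decreasing attribution and record
    the model score; faster decay indicates a more faithful map."""
    order = attributions.flatten().argsort(descending=True)
    x_flat = x.flatten().clone()
    scores = []
    chunk = max(1, len(order) // n_steps)
    with torch.no_grad():
        for k in range(0, len(order), chunk):
            x_flat[order[k:k + chunk]] = 0.0  # "delete" top-ranked features
            scores.append(model(x_flat.view_as(x).unsqueeze(0))[0, target].item())
    return scores  # the area under this curve (lower = better) is the metric
```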
5. Mathematical Properties and Axiomatic Uniqueness
IG satisfies completeness, linearity, dummy, implementation invariance, proportionality, affine scale invariance, and symmetry-preserving axioms (Lundstrom et al., 2023, Lundstrom et al., 2022). Characterizations show that—given these axioms (with non-decreasing positivity where necessary)—IG is the unique single-path attribution scheme, up to the choice of straight-line between baseline and input (Lundstrom et al., 2023).
However, the uniqueness claim holds most strongly when the domain is restricted to monotonic, real-analytic functions and when additional conditions (e.g., non-decreasing positivity) are imposed. Otherwise, as shown by Lerma & Lucas (Lerma et al., 2021), other monotonic path methods with symmetry properties may exist, though the straight-line path remains canonical by parsimony. Importantly, strong completeness (an exact, rather than approximate, match between the summed attributions and $F(x) - F(x')$) is satisfied only by geodesic-path IG under the appropriate Riemannian metric (Salek et al., 17 Feb 2025).
6. Visualization, Use Cases, and Limitations
Visualization tools convert IG attributions into intensity maps (vision), color-coded tokens (text), or heat-mapped nodes/edges (graphs). In practice:
- Sharp, well-localized IG heatmaps correlate strongly with model confidence.
- Misattribution—e.g., highlighting unrelated regions—may indicate dataset biases or model shortcuts.
- Averaged (dataset-level) IG can reveal global strategies or facilitate feature selection and pruning (Sundararajan et al., 2017, Lelièvre et al., 2024).
Key limitations include:
- Path selection and baseline choice: attributions are highly sensitive to these choices, which often require domain-specific adaptation (e.g., BlurIG, manifold paths).
- Computational cost: high-step variants can be demanding in large models or with complex paths (e.g., geodesics, adaptive paths).
- Off-manifold attributions: present in standard IG and require data-driven paths (MIG/GIG) or smoothing/adaptive methods for mitigation.
- Lack of strict causal interpretation: IG quantifies association with the model's prediction, not intervention-based causal effect.
Emerging best practices combine IG with manifold learning, adaptive path methods, or noise reduction (e.g., SmoothTaylor, SmoothGrad) to sharpen attributions and more closely align explanations with domain semantics and user needs.
7. Conclusion and Ongoing Developments
Integrated Gradients and its variants constitute a mature, theoretically grounded framework for feature attribution in deep networks. Driven by a precise axiomatic basis and practical implementation efficiency, IG serves as a cornerstone in interpretable machine learning research and application. Extensions continue to address data-modality-specific path selection, saturation, noise, manifold-conformance, and baseline challenges. The trend is toward hybridization—combining multiple sources of signal (counterfactuals, patterns, manifolds, gradients)—to produce attributions with higher fidelity, robustness, and explanatory power in increasingly complex, high-dimensional, and multimodal systems (Sundararajan et al., 2017, Lundstrom et al., 2023, Zaher et al., 2024, Salek et al., 17 Feb 2025).