
Uniform Discretized Integrated Gradients

Updated 16 February 2026
  • UDIG is a gradient-based feature attribution method that leverages uniform, monotonic, and discretized interpolation to enhance LLM explanation quality.
  • By snapping interpolation points to valid word embeddings, UDIG ensures robust, in-vocabulary gradient approximations and reduces semantic drift.
  • Empirical evaluations on sentiment and QA tasks show UDIG improves attribution fidelity and reduces completeness error compared to IG and DIG.

Uniform Discretized Integrated Gradients (UDIG) is a gradient-based feature attribution method developed to address the limitations of path-based explainability approaches for LLMs when operating in the inherently discrete space of word embeddings. UDIG refines the standard Integrated Gradients (IG) framework by introducing a uniform, monotonic, and discretized interpolation methodology that retains theoretical guarantees while enhancing attribution faithfulness for text and LLM prediction tasks (Roy et al., 2024, Sanyal et al., 2021).

1. Foundations in Integrated Gradients and Discretization

Integrated Gradients (IG) measures feature attributions by evaluating the path integral of the gradient of a model's output $F$ with respect to its input embeddings $x \in \mathbb{R}^{m \times n}$, interpolated linearly from a baseline $x'$ to $x$. Letting $x' + \alpha(x - x')$ denote the interpolation, IG is formally defined as

$$\mathrm{IG}_{i,j}(x) = (x_{i,j} - x'_{i,j}) \int_{0}^{1} \frac{\partial F(x' + \alpha(x - x'))}{\partial x_{i,j}}\, d\alpha$$

with per-token attribution $\mathrm{IG}_i(x) = \left\| \sum_{j=1}^{n} \mathrm{IG}_{i,j}(x) \right\|$. In practice, this integral is approximated with a Riemann sum over $K$ equally spaced points.
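As context, the Riemann-sum approximation can be sketched as follows, with a toy differentiable function standing in for the model output (the quadratic function, step count, and variable names here are illustrative, not from the papers):

```python
import numpy as np

def integrated_gradients(grad_F, x, x_base, K=50):
    """Midpoint Riemann-sum approximation of IG along the straight line
    from baseline x_base to input x, using K steps."""
    alphas = (np.arange(K) + 0.5) / K              # midpoints of the K intervals
    avg_grad = np.zeros_like(x)
    for a in alphas:
        avg_grad += grad_F(x_base + a * (x - x_base))
    return (x - x_base) * avg_grad / K             # elementwise (x - x') * mean gradient

# Toy stand-in for a model head: F(x) = sum(x**2), so grad F = 2x.
F = lambda x: np.sum(x ** 2)
grad_F = lambda x: 2 * x

x = np.array([[0.3, -0.8], [1.2, 0.5]])           # 2 "tokens" x 2 embedding dims
x_base = np.zeros_like(x)                          # zero-vector baseline
attr = integrated_gradients(grad_F, x, x_base)

# Completeness check: attributions should sum to F(x) - F(x').
print(abs(attr.sum() - (F(x) - F(x_base))))        # tiny approximation error
```

For this quadratic toy model the midpoint rule integrates the (linear) gradient exactly, so the completeness residual is at floating-point level; for a real network the residual shrinks as $K$ grows.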

Two desirable axioms (Sensitivity and Implementation Invariance) and the Completeness property ($\sum_i \mathrm{IG}_i(x) = F(x) - F(x')$) are satisfied by IG, rendering it attractive for explainability. However, the straight-line path constructed in embedding space rarely lies on the word manifold; it typically passes through out-of-distribution points, yielding poorly calibrated gradients and diminished explanation quality.

The Discretized Integrated Gradients (DIG) extension addresses this by mapping interpolated points back to word embeddings using nearest-neighbor searches, but the path induced by DIG may be non-uniform, non-monotonic, and subject to drift, increasing approximation error and violating monotonicity (Sanyal et al., 2021).

2. The Uniform Discretized Integrated Gradients Methodology

UDIG introduces a uniform discretization strategy that forges a middle ground between the theoretical rigor of IG and the semantic fidelity of DIG. The core procedure is as follows:

  • Uniform Interpolation: Generate $K$ raw interpolation points along the straight line from the baseline $w' \in \mathbb{R}^n$ to the token embedding $w$:

$$\tilde z_k = w' + \frac{k}{K}(w - w') \quad \text{for } k = 0, \ldots, K$$

  • Discretization via Monotonic Nearest-Neighbor Anchoring: Each $\tilde z_k$ is snapped to the closest vocabulary embedding $z_k$ using a nearest-neighbor search. The constructed path must be monotonic (each anchor must lie "between" $w'$ and $w$) and is selected using either Greedy or Max-Count monotonicity enforcement.
  • Riemann Approximation and Attribution: The UDIG attribution for input $x$ and baseline $x'$ is:

$$\mathrm{UDIG}_{i,j}(x) \approx \sum_{k=1}^{K} (z_{k,j} - z_{k-1,j}) \cdot \left. \frac{\partial F}{\partial z_{k,j}} \right|_{z_k}$$

The per-token score is $\mathrm{UDIG}_i(x) = \left\| \sum_{j=1}^{n} \mathrm{UDIG}_{i,j}(x) \right\|$.

By construction, all anchors are in the vocabulary, ensuring “in-distribution” gradients, and the path is regularized to be both uniform and monotonic in embedding space (Roy et al., 2024, Sanyal et al., 2021).
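A minimal sketch of this path construction, assuming a brute-force nearest-neighbor search in place of a FAISS/Annoy index; the greedy acceptance test used here (projection onto the path direction must advance) is an illustrative stand-in for the paper's monotonicity enforcement, not its exact rule:

```python
import numpy as np

def udig_path(w, w_base, vocab, K=10):
    """Build a uniform, monotonic, discretized path from w_base to w.
    vocab: (V, n) matrix of word embeddings; brute-force distances stand
    in for an approximate nearest-neighbor index such as FAISS."""
    direction = w - w_base
    path = [w_base]
    for k in range(1, K):
        z_tilde = w_base + (k / K) * direction            # uniform interpolation point
        dists = np.linalg.norm(vocab - z_tilde, axis=1)   # distance to every vocab embedding
        for idx in np.argsort(dists):                     # try nearest candidates first
            z = vocab[idx]
            # Greedy monotonicity test (illustrative): the candidate's projection
            # onto the path direction must advance past the previous anchor.
            if (z - path[-1]) @ direction > 0:
                path.append(z)
                break
    path.append(w)                                        # end exactly at the input embedding
    return np.stack(path)

rng = np.random.default_rng(0)
vocab = rng.normal(size=(100, 4))                         # toy 100-word vocabulary
w_base, w = np.zeros(4), np.ones(4)
path = udig_path(w, w_base, vocab)
```

By construction every interior anchor of `path` is a row of `vocab` and advances monotonically from $w'$ toward $w$; if no candidate passes the test for some step, that step is simply skipped.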

3. Algorithmic Specification and Implementation Considerations

The UDIG algorithm proceeds in three primary stages:

  1. Interpolation: Compute $K$ evenly spaced interpolation points between $w'$ and $w$.
  2. Discretization: Snap each interpolation point to the nearest monotonic word embedding using approximate nearest-neighbor search (e.g., FAISS or Annoy), constrained to the top-$L$ candidates ($L \approx 50$).
  3. Attribution Computation: For each segment, accumulate $(z_k - z_{k-1}) \odot \nabla_z F(z_k)$, with the final token score given by the vector norm.
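The accumulation in stage 3 can be sketched as follows, with an analytic gradient oracle standing in for backpropagation through a real model and a straight, unsnapped path for verifiability (both are assumptions for illustration):

```python
import numpy as np

def udig_token_score(grad_F, path):
    """Accumulate (z_k - z_{k-1}) * grad F(z_k) over consecutive path anchors
    (a right-endpoint rule), then reduce to the per-token score
    |sum_j UDIG_j| (the norm of a scalar is its absolute value)."""
    attr = np.zeros_like(path[0])
    for z_prev, z_k in zip(path[:-1], path[1:]):
        attr += (z_k - z_prev) * grad_F(z_k)   # elementwise segment * gradient
    return abs(attr.sum())

# Toy oracle: F(z) = sum(z**2), so grad F = 2z. On a straight path from
# w' = 0 to w = 1 the score should approach F(w) - F(w') = 3 as the path refines.
grad_F = lambda z: 2 * z
path = np.linspace(np.zeros(3), np.ones(3), 101)   # 100 segments
score = udig_token_score(grad_F, path)             # ~3.03, slightly above 3
```

The small overshoot relative to $F(w) - F(w') = 3$ is the expected right-endpoint discretization error, which shrinks as the number of segments grows.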

The computational complexity of UDIG is approximately $6\times$ that of IG due to the additional nearest-neighbor lookups and monotonic adjustment procedures. The baseline embedding can be the "MASK" token, the "PAD" token, or the zero vector; empirical evidence suggests "MASK" provides neutral outputs and preferable completeness properties. Typically, $K = 30$–$50$ yields a good balance between completeness error and runtime (Roy et al., 2024, Sanyal et al., 2021).

4. Empirical Evaluation and Metrics

Evaluation of UDIG employs both sentiment classification (SST-2, IMDb, Rotten Tomatoes) and question answering (SQuAD) tasks, using BERT, DistilBERT, and fine-tuned BERT QA models. Metrics applied include:

  • Log Odds Change (LO): Decrease in the predicted class's log-odds after masking the top 20% (sentiment) or 50% (QA) of attribution tokens; more negative is better.
  • Comprehensiveness (Comp): Output drop after removing the top-attribution tokens; higher is better.
  • Sufficiency (Suff): Output difference when retaining only the top-attribution tokens; lower is better.
  • Delta % Error: Median completeness error, i.e., the discrepancy between the sum of attributions and the actual change in model output.
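These metrics reduce to simple probability arithmetic. A sketch under the standard definitions (the function names and the example probabilities are mine, for illustration only):

```python
import math

def log_odds_change(p_full, p_masked):
    """LO: log-odds of the predicted class after masking top tokens, minus
    the original log-odds; more negative means more influential attributions."""
    logit = lambda p: math.log(p / (1 - p))
    return logit(p_masked) - logit(p_full)

def comprehensiveness(p_full, p_without_top):
    """Comp: prediction drop after removing top-attribution tokens (higher is better)."""
    return p_full - p_without_top

def sufficiency(p_full, p_only_top):
    """Suff: prediction drop when keeping only top-attribution tokens (lower is better)."""
    return p_full - p_only_top

# Example: masking the top tokens drops the class probability from 0.9 to 0.3.
lo = log_odds_change(0.9, 0.3)      # negative: the attributions found influential tokens
comp = comprehensiveness(0.9, 0.3)  # ~0.6: removing top tokens hurts the prediction
suff = sufficiency(0.9, 0.85)       # ~0.05: the top tokens alone nearly suffice
```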

Performance table for sentiment tasks (SST-2, DistilBERT shown):

Method   LO ↓     Comp ↑   Suff ↓
IG       –0.950   0.248    0.275
DIG      –1.229   0.301    0.238
UDIG     –1.653   0.389    0.165

On SQuAD (BERT QA, start logits):

Metric   IG      DIG     UDIG
Suff ↓   0.472   0.523   0.446
Comp ↑   0.732   0.724   0.742

Delta % Error was consistently lower for UDIG across datasets (e.g., SST-2 DistilBERT: DIG 33.04%, UDIG 25.22%). Under a matched runtime budget, UDIG outperformed both IG and DIG on all main metrics (Roy et al., 2024).

UDIG further reduces the word-approximation error (WAE) to zero, since all interpolation points are snapped to valid token embeddings (Sanyal et al., 2021).

5. Theoretical Properties and Limitations

UDIG retains the axiomatic guarantees of IG—Sensitivity, Implementation Invariance, and Completeness—due to its path-construction and summation method. The uniform path and in-vocabulary discretization reduce integral-approximation and discretization drift error compared to non-monotonic/dispersed paths in DIG.

The principal limitations of UDIG are computational: nearest-neighbor search and monotonic adjustment increase runtime and resource requirements approximately six-fold over standard IG. UDIG’s fidelity depends on vocabulary size, coverage, and quality of the embedding space. The method is dependent on approximate nearest neighbor infrastructure (FAISS, Annoy), especially for large vocabularies (Roy et al., 2024).

6. Comparative Context and Extensions

Relative to IG, UDIG regularizes the integration path and ensures all gradient evaluations lie on the manifold of actual word embeddings, producing more faithful and robust attributions for LLMs. Compared to the data-driven non-linear paths of DIG, UDIG trails slightly in human evaluation rankings but matches or exceeds DIG on automatic metrics, with the additional benefits of uniformity and zero word-approximation error from its explicit path construction (Sanyal et al., 2021, Roy et al., 2024).

Potential extensions include custom path construction (e.g., geodesic clustering, dynamic step scheduling), adapting the method for modalities beyond language (e.g., code tokens, multilingual embeddings), and deploying in generative LLMs without native MASK tokens, potentially substituting with PAD or zero vectors (Roy et al., 2024).

7. Key Insights and Practical Recommendations

The critical insight of UDIG is that uniform sampling along the interpolation line, combined with monotonic projection onto the vocabulary, yields highly interpretable, accurate, and theoretically grounded attributions for models operating on discrete text representations. Empirical results show that UDIG consistently outperforms IG and DIG across multiple tasks and metrics, particularly when using the “MASK” embedding as a neutral baseline. For deployment at scale, pre-built FAISS indices are recommended for efficient nearest-neighbor search.

For researchers seeking implementation, the public DIG repository offers a foundation; UDIG can be realized by modifying the interpolation schema to use uniform linear fractions and nearest-neighbor snapping in existing attribution pipelines (Sanyal et al., 2021).

In conclusion, UDIG provides a principled, performant, and transparent attribution framework for explaining LLM decisions in NLP applications, combining the axiomatic strengths of IG with discretization methods tailored for word manifold structures (Roy et al., 2024, Sanyal et al., 2021).
