EAP-IG: Adaptive Integrated Gradients
- EAP-IG is a generalization of integrated gradients that uses adaptive, non-uniform weighting along the integration path to capture local data geometry.
- It preserves the core axioms of standard IG while enabling flexible, data-driven path-sampling methods such as RiemannOpt and tangential alignment.
- EAP-IG improves model interpretability in tasks such as manifold alignment and circuit discovery, though it introduces extra hyperparameters and computational overhead.
EAP-IG (alternatively, "Expected Adaptive Path IG"; Editor's term) generalizes the classical integrated gradients (IG) attribution framework by replacing the uniform (Riemann) integration measure along the straight-line path with a non-uniform or adaptive weighting. This construction expands the expressive capacity of path-based attributions, allowing more flexible, data- or geometry-adaptive integration schemes. EAP-IG maintains the axiomatic core of IG under mild conditions, but diverges from standard uniqueness characterizations. Several recent directions explore its theoretical and practical impact, including manifold alignment (Simpson et al., 11 Mar 2025), faithfulness in circuit discovery (Hanna et al., 2024; Méloux et al., 1 Oct 2025), and weighted-Riemann sampling via RiemannOpt (Swain et al., 2024; Lundstrom et al., 2023).
1. Mathematical Definition of EAP-IG
Consider a differentiable function $F: \mathbb{R}^n \to \mathbb{R}$, a sample $x$ to be explained, and a baseline $x'$. Standard IG is defined by

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F}{\partial x_i}\big(x' + \alpha (x - x')\big)\, d\alpha.$$

EAP-IG generalizes this by introducing a non-uniform weight function $w(\alpha) \ge 0$ with $\int_0^1 w(\alpha)\, d\alpha = 1$, yielding

$$\mathrm{EAP\text{-}IG}_i(x) = (x_i - x'_i) \int_0^1 w(\alpha)\, \frac{\partial F}{\partial x_i}\big(x' + \alpha (x - x')\big)\, d\alpha,$$

where $w(\alpha)$ encodes the preference for sampling at particular locations along the straight-line path $\gamma(\alpha) = x' + \alpha(x - x')$.
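In practice the integral above is approximated by a weighted Riemann sum over path gradients. A minimal NumPy sketch (the names `eap_ig` and `f_grad` are illustrative, not from the cited papers; `f_grad` stands in for any autodiff gradient of the scalar model output):

```python
import numpy as np

def eap_ig(f_grad, x, baseline, weights=None, steps=64):
    """Weighted Riemann-sum approximation of the EAP-IG integral.

    f_grad(z) must return the gradient of the scalar model output at z.
    weights, if given, is a callable w(alpha); weights=None reduces to
    standard IG (uniform w(alpha) = 1).
    """
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule on [0, 1]
    w = np.ones(steps) if weights is None else np.array([weights(a) for a in alphas])
    w = w / w.sum()                             # discrete analogue of the constraint ∫ w dα = 1
    grads = np.stack([f_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * (w[:, None] * grads).sum(axis=0)
```

For a linear model the gradient is constant along the path, so the attribution is exact under any normalized weighting; for nonlinear models the choice of `weights` shifts emphasis along the path.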
2. Axiomatic Properties and Uniqueness
EAP-IG preserves the completeness, linearity, implementation invariance, positivity, symmetry-preservation, and affine scale invariance axioms inherited from classical IG, so long as $w(\alpha) \ge 0$ and $\int_0^1 w(\alpha)\, d\alpha = 1$ (Lundstrom et al., 2023). Explicitly:
- Completeness: $\sum_i \mathrm{EAP\text{-}IG}_i(x) = F(x) - F(x')$, since

$$\sum_i (x_i - x'_i) \int_0^1 w(\alpha)\, \frac{\partial F}{\partial x_i}(\gamma(\alpha))\, d\alpha = \int_0^1 w(\alpha)\, \frac{d}{d\alpha} F(\gamma(\alpha))\, d\alpha$$

for any straight-line path $\gamma(\alpha) = x' + \alpha(x - x')$; the right-hand side equals $F(x) - F(x')$ exactly for $w \equiv 1$, and for general normalized $w$ only up to a correction that vanishes when $F$ varies slowly along the path.
- Positivity: If $F$ is non-decreasing along the path, i.e. $\partial F/\partial x_i(\gamma(\alpha)) \ge 0$ for all $\alpha$, positivity holds coordinatewise because $w(\alpha) \ge 0$.
- Symmetry-Preserving: If $F$ is invariant under swapping coordinates $x_i$ and $x_j$, and $x$ and $x'$ respect this symmetry ($x_i = x_j$, $x'_i = x'_j$), then $\mathrm{EAP\text{-}IG}_i(x) = \mathrm{EAP\text{-}IG}_j(x)$ for $i \neq j$.
However, uniqueness results that single out the uniform weight ($w \equiv 1$) depend on an additional reparameterization-invariance axiom. Any non-constant $w$ (yielding EAP-IG) preserves the classical componentwise axioms, but violates reparameterization invariance (Lundstrom et al., 2023).
3. Algorithmic Variants and Optimization of Integration Paths
The EAP-IG variant enables explicit optimization over the path integral weighting or support. This is operationalized via:
- Weighted Riemann sampling: One chooses the breakpoints $\{\alpha_k\}_{k=1}^{m}$ not uniformly, but to minimize integral discretization error via a data-driven criterion, as in RiemannOpt (Swain et al., 2024). The optimal breakpoints can be found by minimizing an error bound of the form

$$\sum_{k=1}^{m} \hat{g}(\alpha_k)\,(\alpha_{k+1} - \alpha_k)^2,$$

where $\hat{g}(\alpha_k)$ estimates the absolute derivative of the integrand at $\alpha_k$.
- Tangentially Aligned Integrated Gradients: The baseline $x'$ can be optimized to maximize tangential alignment relative to the data manifold, yielding attributions lying in the manifold tangent space (Simpson et al., 11 Mar 2025).
- Adaptive path selection in circuit mechanisms: In EAP-IG for circuit interpretability (Hanna et al., 2024, Méloux et al., 1 Oct 2025), weights or sampling locations are tailored to reflect intervention relevance or causal saliency.
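As a concrete illustration of the first variant, the sketch below places breakpoints with density proportional to $\sqrt{|\hat{g}|}$, which approximately minimizes a first-order error bound of the form $\sum_k \hat{g}(\alpha_k)(\alpha_{k+1}-\alpha_k)^2$ for a fixed point budget. This is a hypothetical stand-in for the RiemannOpt criterion, not its published implementation:

```python
import numpy as np

def adaptive_breakpoints(g, m, probe=256, eps=1e-8):
    """Choose m breakpoints on [0, 1] with density ~ sqrt(|g'|),
    concentrating samples where the integrand g changes fastest.
    g is the scalar integrand, probed on a fine uniform grid to
    estimate its derivative."""
    grid = np.linspace(0.0, 1.0, probe)
    vals = np.array([g(a) for a in grid])
    dens = np.sqrt(np.abs(np.gradient(vals, grid))) + eps  # ~ sqrt(|g'|)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    # invert the empirical CDF at m equally spaced quantiles
    return np.interp((np.arange(m) + 0.5) / m, cdf, grid)
```

For an integrand that changes fastest near $\alpha = 1$ (e.g. $g(\alpha) = \alpha^4$), the returned breakpoints cluster toward the right end of the interval.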
| EAP-IG instantiation | Weighting scheme / path | Primary function |
|---|---|---|
| Uniform (standard IG) | $w(\alpha) \equiv 1$ | Canonical path |
| Data-driven RiemannOpt | $\{\alpha_k\}$ via optimization of integrand variance | Noise/error reduction |
| Tangential alignment | baseline $x'$ induced by manifold geometry | Human-aligned support |
| Mechanism/circuit focus | $w(\alpha)$ reflects intervention range | Causal faithfulness |
4. Faithfulness, Variance, and Interpretability in Mechanistic Discovery
EAP-IG models are central to recent progress in finding faithful circuit representations in large neural transformers:
- Faithfulness is defined as the preservation of task-specific performance after ablating all edges outside the discovered subgraph (Hanna et al., 2024). EAP-IG yields circuits significantly more faithful (i.e., closer to clean model behavior) than vanilla EAP methods, especially at small circuit sizes. This is due to EAP-IG's avoidance of zero-gradient pathologies.
- Recent work has framed EAP-IG circuits as statistical estimators, assessing structural and performance variance under multiple perturbations (Méloux et al., 1 Oct 2025). High variance and hyperparameter sensitivity, such as to the number of interpolation steps or intervention schemes, have been empirically demonstrated, necessitating routine reporting of stability metrics.
Key stability metrics include:
- Circuit error (mean classification divergence)
- Jaccard index (structural edge overlap variance)
- Response under prompt paraphrasing, data resampling, and controlled random ablation
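The structural-overlap metric above can be computed directly. A small sketch, assuming circuits are represented as sets of edge identifiers (the helper names are illustrative):

```python
from itertools import combinations

def edge_jaccard(circuit_a, circuit_b):
    """Jaccard index between two circuits, each a set of edges
    (e.g. ('a0.h3', 'mlp.5') tuples)."""
    a, b = set(circuit_a), set(circuit_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def jaccard_stability(circuits):
    """Mean pairwise Jaccard overlap across circuits discovered under
    resampled data or perturbed hyperparameters."""
    pairs = list(combinations(circuits, 2))
    return sum(edge_jaccard(a, b) for a, b in pairs) / len(pairs)
```

Reporting the distribution of pairwise overlaps, rather than a single point estimate, makes the structural variance of a discovery pipeline visible.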
Best practices now recommend:
- Reporting mean/variance of faithfulness and structure under resampling
- Explicit justification and sensitivity sweeps of EAP-IG settings (aggregation, intervention choice)
- Noise injection stress-testing to reveal instability modes
5. EAP-IG in Manifold-Constrained and Tangent-Space-Optimized Attribution
EAP-IG supports geometric regularization of the baseline and the integration path:
- In Tangentially Aligned Integrated Gradients (TA-IG), the baseline $x'$ is selected so that the resultant attribution vector is maximally aligned with the tangent space $T_x\mathcal{M}$ of an empirical data manifold $\mathcal{M}$ (Simpson et al., 11 Mar 2025).
- The tangential-alignment score $\rho(x')$ formalizes this principle: the optimal baseline solver seeks $x'^{*} = \arg\max_{x'} \rho(x')$.
- Empirically, TA-IG yields attributions much more concentrated in perceptually meaningful, manifold-supported directions than standard baselines across several image datasets.
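One natural instantiation of the tangential-alignment score, assuming a tangent basis estimated at $x$ (e.g., from a local PCA or an autoencoder Jacobian), is the fraction of the attribution norm captured by orthogonal projection onto that basis:

```python
import numpy as np

def tangential_alignment(attr, tangent_basis):
    """Fraction of an attribution vector's norm lying in the tangent
    space spanned by the columns of tangent_basis (shape d x k).
    Returns a score in [0, 1]; 1.0 means fully tangential."""
    q, _ = np.linalg.qr(tangent_basis)   # orthonormalize the basis
    proj = q @ (q.T @ attr)              # orthogonal projection onto its span
    return np.linalg.norm(proj) / np.linalg.norm(attr)
```

This score can serve both as an optimization objective for the baseline and as a post-hoc diagnostic of how manifold-supported a given attribution is.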
6. Limitations and Open Issues
EAP-IG introduces new classes of hyperparameters and potential sources of instability:
- The optimality of a weight $w(\alpha)$ is context- and task-dependent. Faithfulness or interpretability gains may be offset by sensitivity to the choice of $w$, the method for baseline selection, or the geometry of the manifold encoder.
- Convergence of implicit optimization (e.g., for tangent alignment) is not guaranteed in nonconvex regimes; local minima can yield only approximately tangential attributions.
- Computational cost increases linearly with the number of integration steps (for most implementations), and for manifold optimization, further overhead arises from tangent estimation.
- The explanatory utility of EAP-IG variants is bounded by the quality of the generative/discriminative manifold model and the faithfulness of surrogates in physical-design tasks.
Rigid axiomatic uniqueness is only preserved for the uniform weight $w \equiv 1$; deviations require careful justification in each context.
7. Practical Guidelines
For effective use of EAP-IG:
- For circuit discovery in transformers, use a small, explicitly reported number of interpolation steps and greedily expand the subgraph until the target edge coverage or normalized faithfulness is achieved (Hanna et al., 2024).
- When optimizing Riemann weights for noise minimization, precompute breakpoints on a representative validation subset and reuse for bulk attribution (Swain et al., 2024).
- For tangentially aligned IG, set latent dimensionality of the autoencoder according to observed data manifold rank, and apply regular projection to keep optimized baselines on manifold (Simpson et al., 11 Mar 2025).
- Always report circuit faithfulness, Jaccard overlap, and performance variance under data and hyperparameter perturbations, and perform robustness checks with noise injection (Méloux et al., 1 Oct 2025).
- In image-based tasks, use high-percentile clipping of saliency maps and threshold overlays to reveal semantically meaningful attributions; in manifold-constrained settings, validate alignment by measuring the tangentiality score $\rho$.
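The high-percentile clipping step can be sketched as follows (a simple NumPy recipe, not a prescribed implementation):

```python
import numpy as np

def clip_saliency(sal, pct=99):
    """Clip a saliency map's magnitudes at a high percentile and rescale
    to [0, 1], so a few extreme pixels do not wash out the overlay."""
    hi = np.percentile(np.abs(sal), pct)
    return np.clip(np.abs(sal), 0.0, hi) / (hi + 1e-12)
```

The resulting map can be thresholded and alpha-blended over the input image to highlight the attribution support.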
EAP-IG’s generalization capacity enables tailored attribution design, whether for geometric priors, causal analysis, or reduced noise and improved faithfulness, and can be further specialized through data- or task-adaptive weighting of the IG integral. This flexibility makes EAP-IG foundational to current and emerging explainability methodologies in high-dimensional, structured domains.