Temporal Integrated Gradients (TIG)
- Temporal Integrated Gradients (TIG) is a principled attribution method that generalizes Integrated Gradients to time-series data by quantifying feature contributions at each time step.
- It incorporates temporality-aware integration paths and segment-wise masking to mitigate out-of-distribution artifacts and preserve local temporal dependencies.
- Empirical evaluations using metrics like CPD and CPP demonstrate TIG’s enhanced fidelity and robustness over traditional explainability methods for sequential models.
Temporal Integrated Gradients (TIG) is an axiomatic attribution method for interpreting neural sequence models, generalizing the well-established Integrated Gradients (IG) technique to time-series inputs. TIG quantifies the contribution of each input feature at every time step by accumulating gradients along a path from a user-chosen baseline to the observed temporal instance. Recent advancements address the unique challenges of temporal data—including out-of-distribution (OOD) interpolation artifacts, loss of context-sensitive temporal dependencies, and inadequate evaluation metrics—by introducing temporality-aware integration paths, segment-wise masking, and domain-specific aggregation. State-of-the-art frameworks such as TIMING (Jang et al., 5 Jun 2025) and IGBO (Fouladi et al., 2 Jan 2026) preserve key IG axioms while offering enhanced fidelity and robustness for sequential model explainability.
1. Mathematical Formulation and Core Principles
Let $x \in \mathbb{R}^{T \times D}$ be a multivariate time series, with $T$ time steps and $D$ features. A baseline $x' \in \mathbb{R}^{T \times D}$ anchors the input path. Define a sequence model $f : \mathbb{R}^{T \times D} \to \mathbb{R}$. Standard TIG follows the straight-line path $\gamma(\alpha) = x' + \alpha (x - x')$ ($\alpha \in [0, 1]$). The TIG attribution for location $(t, d)$ is:

$$\mathrm{TIG}_{t,d}(x) = (x_{t,d} - x'_{t,d}) \int_0^1 \frac{\partial f\big(x' + \alpha (x - x')\big)}{\partial x_{t,d}} \, d\alpha$$
This line-integral formulation computes the accumulation of local gradients with respect to each feature at every time point as the input transitions from baseline to its observed value. TIG generalizes IG from static inputs to time-indexed, high-dimensional input spaces.
Generalizations to arbitrary domains (e.g., frequency, ICA) are possible via an invertible, differentiable transform $\Phi$:

$$\mathrm{TIG}^{\Phi}_{k}(x) = (z_k - z'_k) \int_0^1 \frac{\partial (f \circ \Phi^{-1})\big(z' + \alpha (z - z')\big)}{\partial z_k} \, d\alpha,$$

where $z = \Phi(x)$ and $z' = \Phi(x')$ (Kechris et al., 19 May 2025).
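A minimal sketch of this domain-generalized computation, assuming a user-supplied invertible pair `transform`/`inverse` (e.g., an ICA unmixing/mixing pair; this interface is illustrative, not an API from the cited work):

```python
import torch

def domain_tig(model, x, baseline, transform, inverse, n_steps=50):
    # Integrated Gradients computed in the transformed domain z = transform(x).
    # `transform` and `inverse` are assumed invertible and differentiable.
    z, z0 = transform(x), transform(baseline)
    grads = torch.zeros_like(z)
    for alpha in torch.linspace(0.0, 1.0, n_steps):
        z_alpha = (z0 + alpha * (z - z0)).detach().requires_grad_(True)
        model(inverse(z_alpha).unsqueeze(0)).sum().backward()
        grads += z_alpha.grad
    return (z - z0) * grads / n_steps  # one attribution per transformed coefficient
```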
2. Temporality-Aware Path and Masking
Conventional IG's linear interpolation can traverse OOD regions never encountered during model training, especially in temporally dependent, nonlinear time series. The TIMING framework remedies this by introducing temporality-respecting masks:
- Let $m \in \{0, 1\}^{T \times D}$ denote a binary mask selecting contiguous segments of arbitrary length $s \le T$.
- The masked baseline is $\tilde{x}' = m \odot x' + (1 - m) \odot x$.
- The path is $\gamma_m(\alpha) = \tilde{x}' + \alpha (x - \tilde{x}')$ ($\alpha \in [0, 1]$).
- MaskingIG for $(t, d)$ with $m_{t,d} = 1$ is:

$$\mathrm{IG}^{m}_{t,d}(x) = (x_{t,d} - x'_{t,d}) \int_0^1 \frac{\partial f\big(\gamma_m(\alpha)\big)}{\partial x_{t,d}} \, d\alpha$$

- TIMING computes the conditional expectation over masks:

$$\mathrm{TIMING}_{t,d}(x) = \mathbb{E}_{m}\big[\mathrm{IG}^{m}_{t,d}(x) \,\big|\, m_{t,d} = 1\big]$$
This approach enforces in-distribution sampling and preserves temporally local relationships, outperforming pointwise masking (as in RandIG).
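A minimal Monte Carlo sketch of this segment-masking scheme (the segment-sampling scheme, mask counts, and batch handling are illustrative assumptions, not the reference implementation):

```python
import torch

def timing_attr(model, x, baseline, n_masks=32, n_steps=20, max_seg=10):
    # Monte Carlo estimate of segment-masked IG, conditioned on each point
    # being selected by at least one mask. Assumes max_seg <= T.
    T, D = x.shape
    attr_sum = torch.zeros_like(x)
    hit_count = torch.zeros_like(x)
    for _ in range(n_masks):
        m = torch.zeros_like(x)
        for d in range(D):  # one random contiguous segment per feature
            s = int(torch.randint(1, max_seg + 1, (1,)))
            t0 = int(torch.randint(0, T - s + 1, (1,)))
            m[t0:t0 + s, d] = 1.0
        x_tilde = m * baseline + (1 - m) * x  # masked baseline: only segments move
        grads = torch.zeros_like(x)
        for alpha in torch.linspace(0.0, 1.0, n_steps):
            x_alpha = (x_tilde + alpha * (x - x_tilde)).detach().requires_grad_(True)
            model(x_alpha.unsqueeze(0)).sum().backward()
            grads += x_alpha.grad
        attr_sum += m * (x - baseline) * grads / n_steps
        hit_count += m
    return attr_sum / hit_count.clamp(min=1.0)  # conditional mean over covering masks
```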
3. Axiomatic Guarantees and Limitations
TIG preserves IG’s canonical axioms given certain constraints:
- Sensitivity: If $x$ and $x'$ differ only at $(t, d)$ and $f(x) \neq f(x')$, then $\mathrm{TIG}_{t,d}(x) \neq 0$ (with the all-ones mask $m = \mathbf{1}$, TIG reduces to IG at that coordinate).
- Implementation Invariance: Attributions depend only on the model's functional gradient along the chosen path; thus, functionally equivalent models yield identical TIG values.
- Completeness: $\sum_{t,d} \mathrm{TIG}_{t,d}(x) = f(x) - f(x')$ holds for the straight-line path but not for aggregation over multiple masking contexts (a conscious trade-off in TIMING).
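For the straight-line path, completeness follows from the chain rule and the fundamental theorem of calculus, since $\frac{d\gamma_{t,d}(\alpha)}{d\alpha} = x_{t,d} - x'_{t,d}$:

$$\sum_{t,d} \mathrm{TIG}_{t,d}(x) = \int_0^1 \sum_{t,d} (x_{t,d} - x'_{t,d}) \frac{\partial f(\gamma(\alpha))}{\partial x_{t,d}} \, d\alpha = \int_0^1 \frac{d}{d\alpha} f(\gamma(\alpha)) \, d\alpha = f(x) - f(x')$$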
Nonlinear, adaptive, or data-manifold-aware paths have been proposed to further remedy OOD artifacts (Fouladi et al., 2 Jan 2026). The IGBO framework uses a learnable Oracle that produces anchor sequences, constructing piecewise-linear, validity-constrained paths to maintain gradient stability. Completeness and sensitivity persist under these generalizations, while practical implementations trade strict completeness for richer baseline diversity.
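A hedged sketch of integration along such a piecewise-linear anchor path (the Oracle producing the anchors is abstracted away as a given list of waypoints):

```python
import torch

def piecewise_linear_ig(model, x, anchors, n_steps=20):
    # Line integral accumulated over consecutive segments of a piecewise-linear
    # path; `anchors` = [baseline, a_1, ..., x] is assumed to come from an Oracle.
    total = torch.zeros_like(x)
    for start, end in zip(anchors[:-1], anchors[1:]):
        grads = torch.zeros_like(x)
        for alpha in torch.linspace(0.0, 1.0, n_steps):
            p = (start + alpha * (end - start)).detach().requires_grad_(True)
            model(p.unsqueeze(0)).sum().backward()
            grads += p.grad
        total += (end - start) * grads / n_steps  # segment contribution
    return total
```

Because the integral telescopes across segments, the sum of segment contributions still approximates $f(x) - f(x')$ along the full path.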
4. Evaluation Metrics: CPD and CPP
Traditional simultaneous removal benchmarks are confounded by sign cancellation, where positive and negative attributions mask each other's global impact. TIMING introduces cumulative metrics to disambiguate feature ordering:
- Cumulative Prediction Difference (CPD):

$$\mathrm{CPD} = \sum_{k=1}^{K} \big| f(x^{(k-1)}) - f(x^{(k)}) \big|, \qquad x^{(0)} = x,$$

where $x^{(k)}$ denotes the input after masking the $k$ most highly attributed points. CPD accumulates the absolute prediction change as highly ranked locations are removed.
- Cumulative Prediction Preservation (CPP):

$$\mathrm{CPP} = \sum_{k=1}^{K} \big| f(\bar{x}^{(k-1)}) - f(\bar{x}^{(k)}) \big|, \qquad \bar{x}^{(0)} = x,$$

where $\bar{x}^{(k)}$ masks the $k$ least important points. Small CPP indicates robust retention of prediction accuracy under removal of low-ranked features.
These metrics systematically reward attribution methods that correctly rank both positively and negatively impactful locations and discriminate methods with sign or mask biases (Jang et al., 5 Jun 2025).
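Under the reconstruction above, CPD can be computed by cumulatively masking top-ranked points and summing successive prediction changes; this sketch assumes a classifier returning class probabilities and treats the exact masking-order details as implementation choices:

```python
import torch

@torch.no_grad()
def cpd(model, x, baseline, attributions, K=50):
    # Cumulatively mask the K most highly attributed points and sum the
    # absolute successive changes in the model's output probabilities.
    order = attributions.flatten().argsort(descending=True)[:K]
    x_masked = x.clone()
    prev = model(x_masked.unsqueeze(0))
    score = 0.0
    for idx in order:
        t, d = divmod(int(idx), x.shape[1])  # row-major (T, D) indexing
        x_masked[t, d] = baseline[t, d]
        cur = model(x_masked.unsqueeze(0))
        score += (cur - prev).abs().sum().item()
        prev = cur
    return score  # CPP: same loop with ascending order (least important first)
```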
5. Implementation Details and Practical Optimization
A minimal TIG implementation, abstracted from TIMING and IGBO (a PyTorch-style sketch; the model is assumed to accept a batch of $(T, D)$ sequences):

```python
import torch

def tig(model, x, baseline, n_steps=50):
    # Riemann approximation of the straight-line path integral.
    grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, n_steps):
        x_alpha = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        model(x_alpha.unsqueeze(0)).sum().backward()
        grads += x_alpha.grad
    return (x - baseline) * grads / n_steps
```
TIMING and IGBO further optimize by:
- Sampling masks or anchor paths in Monte Carlo fashion.
- Batching segment masks.
- Parallelizing integration over the interpolation coefficient $\alpha$ (see the batched sketch at the end of this section).
- Employing GAN-discriminators and RNN oracles to ensure generated anchors remain on the data manifold (Fouladi et al., 2 Jan 2026).
The practical guidance is to select the number of anchor points for path fidelity, the number of integration steps for numerical resolution, and the mask parameters for temporal segment granularity. TIMING computes attributions in approximately $0.04$ seconds per sample, comparable to vanilla IG and faster than retraining-based XAI methods (Jang et al., 5 Jun 2025).
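Batching and parallelizing over $\alpha$, as listed above, can be realized by evaluating every interpolation step in one batched forward/backward pass (a standard IG optimization; the shapes assume a model that consumes a batch of $(T, D)$ sequences):

```python
import torch

def tig_batched(model, x, baseline, n_steps=50):
    # All interpolation steps evaluated in a single batched pass.
    alphas = torch.linspace(0.0, 1.0, n_steps).view(-1, 1, 1)  # (N, 1, 1)
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)
    model(path).sum().backward()  # gradients for every step at once
    return (x - baseline) * path.grad.mean(dim=0)
```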
6. Empirical Results and Domain Applications
Comprehensive benchmarking compares TIG and TIMING against FO, AFO, IG, GradSHAP, DeepLIFT, LIME, FIT, WinIT, Dynamask, Extrmask, ContraLSP, TimeX, TimeX++ on real-world datasets (e.g., MIMIC-III mortality, PAM, Boiler, Epilepsy, Wafer, Freezer):
| Dataset | CPD (IG) | CPD (TIMING) | Relative Gain |
|---|---|---|---|
| MIMIC-III | 0.342 | 0.366 | +7% |
| PAM | --- | --- | +5% |
| Boiler | --- | --- | +110% |
| Epilepsy | --- | --- | +11% |
| Wafer | --- | --- | +35% |
| Freezer | --- | --- | +1% |
TIMING outperforms IG and leading baselines under CPD, with robust CPP behavior. On synthetic Switch-Feature and State datasets, TIMING matches IG on true-map metrics (AUP, AUR), but achieves superior CPD (Jang et al., 5 Jun 2025). Qualitative case studies (MIMIC-III) demonstrate medically coherent signed attributions for risk and protective factors.
IGBO is evaluated on DAG Satisfaction Rate (DSR), accuracy trade-offs, OOD consistency, and variance reduction. TIG with the Oracle achieves 80% DSR and a minimal 5% accuracy loss, with significant variance reduction compared to linear baseline paths (Fouladi et al., 2 Jan 2026).
7. Limitations, Open Issues, and Prospective Directions
TIG suffers from several notable limitations:
- Completeness may be lost under aggregation of masks or nonlinear baselines.
- OOD sensitivity in high-dimensional time series can destabilize gradient estimates; manifold-aware path construction mitigates but does not eliminate this concern.
- CPD and CPP use a distance on class probabilities; extension to structured outputs or alternative distance measures remains unexplored.
- Mask parameter tuning and baseline selection are nontrivial and domain-dependent.
- Adaptive, data-driven segment generators, higher-dimensional baselines, and hybrid completeness-restoring strategies are suggested avenues for further investigation (Jang et al., 5 Jun 2025, Fouladi et al., 2 Jan 2026).
Cross-domain TIG (e.g., in Fourier or ICA domains) generalizes interpretability across semantically meaningful transforms; it requires an invertible, differentiable transform $\Phi$ and inherits all classical IG caveats, such as gradient saturation and integration discretization error (Kechris et al., 19 May 2025).
In summary, TIG and its temporality-aware extensions constitute a principled, differentiable attribution framework for time-series neural networks, with demonstrated empirical, theoretical, and computational advantages over prior post-hoc explainability methods.