Amplitude-Based Input Attribution
- Amplitude-based input attribution scores decompose a model's prediction by multiplying the difference between input values and a baseline by the model's local gradient.
- They extend to higher-order derivatives, capturing both independent and interactive feature effects for robust explanatory power.
- Empirical validations show that adhering to principles like minimal approximation error and unbiased baseline selection significantly improves attribution fidelity.
Amplitude-based input attribution scores are a class of feature explanation methods that quantify how much the magnitude (“amplitude”) of a difference between input and a reference point, when multiplied by a derivative of the model output, contributes to the overall prediction. These scores are central in explaining deep neural networks and other machine learning models by decomposing the model’s prediction into additive contributions from each input variable, with the core principle rooted in the Taylor expansion of the predictive function. Amplitude-based approaches are characterized by their explicit dependence on the numerical difference between an input feature and a baseline, thereby giving a quantitative measure of each feature’s importance grounded in the model’s local or path-wise sensitivity.
1. Theoretical Foundations: Taylor Attribution Framework
Amplitude-based input attribution scores are systematically unified under the Taylor attribution framework (Deng et al., 2020), which models the output difference between an input $x$ and a baseline $\tilde{x}$ by a finite Taylor series expansion:

$$f(x) - f(\tilde{x}) = \sum_{1 \le |\kappa| \le K} \frac{1}{\kappa!}\, \partial^{\kappa} f(\tilde{x})\, (\Delta x)^{\kappa} + \varepsilon_K,$$

where $\Delta x = x - \tilde{x}$ and each term (indexed by the multi-index $\kappa = (\kappa_1, \dots, \kappa_n)$) is a product of higher-order derivatives and amplitude powers, with $\varepsilon_K$ the truncation error.
For the first-order (linear) case:

$$f(x) - f(\tilde{x}) \approx \sum_{i} \frac{\partial f(\tilde{x})}{\partial x_i}\, (x_i - \tilde{x}_i).$$

The amplitude-based attribution for feature $i$ is then the product of its amplitude difference and the output's sensitivity to that feature:

$$a_i = (x_i - \tilde{x}_i)\, \frac{\partial f(\tilde{x})}{\partial x_i}.$$
This decomposition naturally extends to higher-order expansions, introducing second-order “independent” and “interactive” effects between features. The amplitude-based interpretation holds across all such terms, allocating the output change proportionally to each feature according to its contribution’s magnitude.
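The first-order attribution above can be sketched in a few lines. The quadratic model `f` and its analytic gradient `grad_f` below are hypothetical examples, not from the cited work; the sketch shows the amplitude term and its sensitivity factor, and how the omitted interaction term shows up as unexplained output change.

```python
import numpy as np

# First-order amplitude-based attribution: a_i = (x_i - baseline_i) * df/dx_i(baseline).
# f is a toy quadratic model (an assumption for illustration) with an
# analytic gradient, so no autodiff library is needed.

def f(x):
    # Hypothetical model: f(x) = 3*x0 + 2*x1 + x0*x1
    return 3 * x[0] + 2 * x[1] + x[0] * x[1]

def grad_f(x):
    # Analytic gradient of f at x.
    return np.array([3 + x[1], 2 + x[0]])

def first_order_attribution(x, baseline):
    # Amplitude (input difference) times local sensitivity at the baseline.
    return (x - baseline) * grad_f(baseline)

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = first_order_attribution(x, baseline)
# attr = [3.0, 4.0]; attributions sum to 7, while f(x) - f(baseline) = 9.
# The gap of 2 is exactly the omitted interaction term x0*x1.
```

The mismatch between the attribution sum and the true output change is the approximation error that higher-order terms are meant to close.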
2. Reformulation of Mainstream Attribution Methods
Seven mainstream attribution algorithms can be rewritten in the Taylor framework as specific choices of terms and baselines (Deng et al., 2020). The table below summarizes the reformulations (DeepLIFT and ε-LRP share one row):
| Method | Taylor Terms Captured | Baseline Selection |
|---|---|---|
| Gradient × Input | First-order term only | Zero baseline ($\tilde{x} = 0$) |
| Occlusion-1 | First-order and feature-specific higher-order diagonal term | Feature zeroed |
| Occlusion-patch | First-order, within-patch higher-order, within-patch interactions | Patch zeroed |
| DeepLIFT/ε-LRP | First-order and higher-order terms, baseline-propagated | Layer-wise, user-specified |
| Integrated Gradients | First-order, higher-order, split interactions | Path-integrated baseline |
| Expected Gradients | Averaged Taylor decompositions (over baseline distribution) | Baseline distribution |
All of these methods multiply an amplitude by one or more derivatives, but differ in order, assignment, and baseline handling. Methods that average over multiple baselines (e.g., Expected Gradients) control for baseline selection bias and often achieve higher-fidelity attributions by capturing more of the function’s variability along different paths (Deng et al., 2020).
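As a concrete instance of a path-integrated baseline, Integrated Gradients can be sketched by averaging the gradient along the straight line from baseline to input and multiplying by the amplitude. The model gradient `grad_f` is the same hypothetical quadratic as before, an assumption for illustration:

```python
import numpy as np

# Sketch of Integrated Gradients: average the gradient along the straight
# path from baseline to input, then multiply by the amplitude (x - baseline).

def grad_f(x):
    # Analytic gradient of the hypothetical model f(x) = 3*x0 + 2*x1 + x0*x1.
    return np.array([3 + x[1], 2 + x[0]])

def integrated_gradients(x, baseline, steps=100):
    # Midpoint-rule approximation of the path integral of the gradient.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
attr = integrated_gradients(x, np.zeros(2))
# For this quadratic model the path integral is exact: attr = [4.0, 5.0],
# which sums to f(x) - f(0) = 9, unlike the first-order-only score.
```

Because the path averaging picks up half of the interaction term for each feature, the attributions recover the full output change here, illustrating why path-integrated baselines cover more of the function's nonlinearity.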
3. Principles for Reliable Amplitude-Based Attribution
The Taylor attribution framework motivates three principles for high-quality attribution (Deng et al., 2020):
- Low Approximation Error: The chosen Taylor expansion should closely approximate the true output difference $f(x) - f(\tilde{x})$. Failing to do so omits significant effects, especially in highly nonlinear regions.
- Correct Contribution Assignment: Each Taylor term (independent or interaction) must be allocated to the corresponding feature(s) without leakage.
- Unbiased Baseline Selection: The choice of baseline should not introduce artificial bias; a poor baseline misrepresents the true amplitude and distorts importance measures.
Empirical comparison reveals a strong positive correlation between the number of these principles satisfied and observed fidelity/localization metrics across benchmarks such as MNIST and ImageNet.
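The first principle has a direct numerical proxy: compare the sum of first-order attributions with the actual output change. The `tanh`-based model below is a hypothetical stand-in chosen because its nonlinearity grows away from zero:

```python
import numpy as np

# Proxy for the "low approximation error" principle: the gap between the
# attribution sum and the true output change f(x) - f(baseline).

def f(x):
    return np.tanh(x).sum()           # hypothetical nonlinear model

def grad_f(x):
    return 1.0 - np.tanh(x) ** 2      # analytic gradient of sum(tanh)

def approximation_error(x, baseline):
    attr = (x - baseline) * grad_f(baseline)
    return abs((f(x) - f(baseline)) - attr.sum())

baseline = np.zeros(2)
err_small = approximation_error(np.array([0.1, 0.2]), baseline)   # near-linear region
err_large = approximation_error(np.array([1.0, 2.0]), baseline)   # strongly nonlinear region
# err_small is tiny; err_large is on the order of 1: first-order attributions
# degrade as the input moves away from the baseline.
```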
4. Amplitude, Higher-Order Terms, and Baseline Effects
In the amplitude-based paradigm, the essential element is the scale of the input difference, $\Delta x_i = x_i - \tilde{x}_i$.
Higher-order terms account for curvature and inter-feature effects:
- Independent second-order: $\frac{1}{2}\,\frac{\partial^2 f(\tilde{x})}{\partial x_i^2}\,(\Delta x_i)^2$, assigned to feature $i$
- Interactive second-order: $\frac{\partial^2 f(\tilde{x})}{\partial x_i \partial x_j}\,\Delta x_i \Delta x_j$, assigned (e.g., split in half) between features $i$ and $j$
Correct partitioning of these interactions is critical for the completeness of attributions. Amplitude-based scores ignoring higher-order or interaction terms may fail to “explain” output changes in complex models, particularly those with strong feature interactions (Deng et al., 2023).
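A second-order decomposition with halved interaction terms can be sketched as follows. The model, its gradient `grad_f`, and its Hessian `hess_f` are hypothetical analytic examples; the halving of off-diagonal terms is one of the assignment choices mentioned above:

```python
import numpy as np

# Second-order Taylor attribution sketch: independent (diagonal) terms go to
# one feature; interaction (off-diagonal) terms are split in half between the
# two features involved. Toy model: f(x) = x0*x1 + x0**2.

def grad_f(x):
    return np.array([x[1] + 2 * x[0], x[0]])

def hess_f(x):
    # Constant Hessian of the quadratic toy model.
    return np.array([[2.0, 1.0], [1.0, 0.0]])

def second_order_attribution(x, baseline):
    dx = x - baseline                        # amplitudes
    H = hess_f(baseline)
    attr = dx * grad_f(baseline)             # first-order terms
    attr += 0.5 * np.diag(H) * dx ** 2       # independent second-order terms
    inter = H * np.outer(dx, dx)             # H_ij * dx_i * dx_j
    np.fill_diagonal(inter, 0.0)             # keep only interactions
    attr += 0.5 * inter.sum(axis=1)          # half of each interaction per feature
    return attr

x = np.array([1.0, 2.0])
attr = second_order_attribution(x, np.zeros(2))
# attr = [2.0, 1.0], summing exactly to f(x) - f(0) = 3 for this quadratic model.
```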
Moreover, baseline selection heavily influences amplitude values. For example, setting $\tilde{x} = 0$ can result in inflated attributions for features with inherently large magnitudes, regardless of their actual causal importance, underlining the necessity of establishing a meaningful baseline (Deng et al., 2020).
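The baseline bias can be made concrete with two hand-picked (hypothetical) features: one with a large raw magnitude but tiny sensitivity, one small but highly sensitive:

```python
import numpy as np

# Illustration of baseline bias in amplitude-based scores. All values are
# hypothetical and chosen to exaggerate the effect.

grad = np.array([0.01, 1.0])             # local sensitivities at the baseline
x = np.array([500.0, 2.0])               # raw feature values

zero_baseline = np.zeros(2)
mean_baseline = np.array([498.0, 0.0])   # e.g., a training-set mean (assumed)

attr_zero = (x - zero_baseline) * grad   # [5.0, 2.0]: large-magnitude feature dominates
attr_mean = (x - mean_baseline) * grad   # [0.02, 2.0]: sensitive feature dominates
```

With the zero baseline, the barely-sensitive feature receives the larger score purely because of its raw magnitude; a baseline anchored at a typical input reverses the ranking.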
5. Empirical Validation and Application
Empirical validation via benchmark datasets shows that Taylor-reformulated amplitude-based attributions closely track those produced by their heuristic originals, with near-zero average percentage change when the Taylor order matches model complexity (e.g., MLPs on MNIST) (Deng et al., 2020). Large discrepancies emerge only in regions where the Taylor approximation is insufficiently accurate, highlighting the need for method-model alignment.
Performance metrics for attribution fidelity, such as “infidelity” (how much perturbing high-attribution features changes the output) and object localization accuracy, demonstrate pronounced improvement when amplitude-based attributions adhere to all three principles. For instance, both Integrated Gradients and Expected Gradients (averaged over multiple baselines) consistently outperform single-baseline, purely first-order methods, due to greater coverage of the function’s nonlinearities and more robust feature assignment (Deng et al., 2020).
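A perturbation-style fidelity check in this spirit can be sketched as follows: occlude the highest-attribution features and see whether the resulting output drop matches what the attributions predict. The linear model and values are hypothetical, and this is a simplified proxy rather than the benchmark metric itself:

```python
import numpy as np

# Simplified perturbation-based fidelity check: for a faithful attribution,
# zeroing the top-attributed features should change the output by roughly
# the sum of their attribution scores. Toy linear model, assumed values.

def f(x):
    return 3 * x[0] + 2 * x[1] + 0.1 * x[2]

x = np.array([1.0, 1.0, 1.0])
attr = np.array([3.0, 2.0, 0.1])      # gradient * input at a zero baseline

top = np.argsort(attr)[::-1][:2]      # the two highest-attribution features
x_pert = x.copy()
x_pert[top] = 0.0                     # occlude them
drop = f(x) - f(x_pert)               # drop is about 5.0, matching attr[top].sum()
```

For this linear model the match is exact up to floating point; for nonlinear models, the residual mismatch is precisely what infidelity-style metrics quantify.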
6. Limitations, Scope, and Theoretical Significance
Amplitude-based scores, while theoretically principled within the Taylor framework, are not universally optimal. Their reliance on the first-order (or locally linear) component will under-explain regions of high nonlinearity or strong cross-feature interaction unless higher-order terms are included. Assigning interaction terms and selecting unbiased baselines remain nuanced, application-dependent tasks.
Nevertheless, the Taylor-based amplitude paradigm unifies the rationale behind a broad spectrum of widely deployed attribution methods, supplies a rigorous ground for evaluating their rationality and shortcomings, and provides a systematized view of how local, interpretable, and faithful input attributions can be constructed in deep learning and beyond. The strength of amplitude-based input attribution lies in the clarity of its mathematical structure—combining measured magnitude (amplitude) with directional sensitivity (gradient or higher-order derivative)—offering a direct and quantitative account of each feature’s role in the predictive mechanism as grounded in the model itself (Deng et al., 2020).