Deep Taylor Decomposition
- Deep Taylor Decomposition is a framework that uses recursive Taylor expansions to break down a model's output into input feature relevance scores while ensuring relevance conservation.
- It applies specific propagation rules like z⁺, ε, and αβ to handle layer nonlinearities, enabling clear and auditable feature attributions across diverse architectures.
- The method has been applied in image classification, NLP with transformers, and anomaly detection, providing visual heatmaps and insights for expert model evaluation.
Deep Taylor Decomposition (DTD) is a framework for explaining complex nonlinear machine learning models by decomposing model outputs into per-input feature relevance scores. DTD is built on local Taylor expansions performed recursively throughout the layers of a network and is designed so that the sum of input relevances matches the model’s output (“relevance conservation”). While initially motivated by the need for intelligible explanations in image classification, DTD has since been generalized to a wide range of architectures and tasks, including convolutional, transformer-based, and one-class anomaly detection models. DTD provides explanations that can be visualized, audited, and assessed by domain experts, but its theoretical basis and practical applicability have also given rise to important critiques and empirical challenges.
1. Theoretical Framework and Motivation
DTD formalizes the attribution of the output decisions of deep nonlinear models, including DNNs (feed-forward networks, CNNs, Transformers), to their input dimensions. The main goal is to express the scalar output $f(x)$ as a sum of input-wise relevances $R_i$ so that $f(x) = \sum_i R_i$. The method proceeds recursively by applying first-order Taylor expansions at each hidden layer, beginning at the output and using appropriate root points $\tilde{x}$ where the local relevance function vanishes.
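As a minimal illustration of a single Taylor step, consider one ReLU unit $f(x) = \max(0, w^\top x + b)$. The sketch below assumes the unit is active at $x$ and picks the root $\tilde{x}$ as the nearest point on the hyperplane $w^\top \tilde{x} + b = 0$ (one possible choice among many). Because the unit is linear on its active side, the first-order expansion is then exact and the relevances sum to $f(x)$. The function names are illustrative, not taken from any library.

```python
import numpy as np

def relu_unit(x, w, b):
    """Single ReLU unit f(x) = max(0, w.x + b)."""
    return max(0.0, float(w @ x + b))

def nearest_root(x, w, b):
    """Project x onto the hyperplane w.x + b = 0, where f vanishes."""
    t = (w @ x + b) / (w @ w)
    return x - t * w

def taylor_relevance(x, w, b):
    """First-order Taylor relevance R_i = df/dx_i * (x_i - root_i).

    Assumes the unit is active at x: on that side of the ReLU the gradient
    is simply w, and the root lies on the boundary of the same linear region.
    """
    root = nearest_root(x, w, b)
    return w * (x - root)

w = np.array([1.0, 2.0])
b = -1.0
x = np.array([1.0, 1.0])
R = taylor_relevance(x, w, b)
print(R, R.sum(), relu_unit(x, w, b))  # R ≈ [0.4, 1.6]; both totals ≈ 2.0
```

Choosing a different root on the same hyperplane would change how the total is split between the two inputs, but not the total itself.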
The motivation for DTD arises from the black-box nature of modern deep networks, which lack inherent interpretability. Post-hoc feature-attribution approaches such as DTD respond to this limitation by assigning per-feature relevance in a manner that propagates explanatory responsibility through all layers and nonlinearities of the network. Unlike naive gradient-based saliency, DTD respects network nonlinearities and layer interactions via “relevance conservation,” delivering relevance assignment rules (notably the $z^+$, $\varepsilon$, and $\alpha\beta$ rules) that are layer- and domain-appropriate (Montavon et al., 2015).
Moreover, DTD has been extended to handle network-specific complexities, including attention mechanisms, skip connections, and normalization layers in modern architectures such as Transformers (Brandl et al., 2022).
2. Mathematical Formulation and Propagation Rules
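The backbone of DTD is the recursive, layer-wise Taylor expansion. For a scalar output $f(x)$ and a chosen root point $\tilde{x}$ such that $f(\tilde{x}) = 0$, the first-order Taylor expansion gives:
$$f(x) = f(\tilde{x}) + \sum_i \left.\frac{\partial f}{\partial x_i}\right|_{x=\tilde{x}} (x_i - \tilde{x}_i) + \eta = \sum_i R_i(x) + \eta$$
with the input-wise relevance
$$R_i(x) = \left.\frac{\partial f}{\partial x_i}\right|_{x=\tilde{x}} (x_i - \tilde{x}_i)$$
and $\eta$ collecting the second- and higher-order terms. At each intermediate layer, the relevance $R_j$ of a neuron $j$ in the higher layer is redistributed to the neurons $i$ of the lower layer according to the rule
$$R_i = \sum_j \frac{q_{ij}}{\sum_{i'} q_{i'j}} R_j,$$
where $q_{ij}$ encodes the local Taylor weight specific to the propagation rule. With lower-layer activations $x_i$, weights $w_{ij}$, contributions $z_{ij} = x_i w_{ij}$, $w^+ = \max(0, w)$, and $w^- = \min(0, w)$, the common rules are:
- $z^+$-rule: For ReLU layers, $q_{ij} = x_i w_{ij}^+$, distributing relevance in proportion to positive contributions (Montavon et al., 2015, Brandl et al., 2022).
- $\varepsilon$-rule: $R_i = \sum_j \frac{z_{ij}}{\sum_{i'} z_{i'j} + \varepsilon} R_j$, i.e., $q_{ij} = z_{ij}$ with a small stabilizer $\varepsilon$ added to the denominator to avoid division by zero (Brandl et al., 2022, Hiley et al., 2019).
- $\alpha\beta$-rule: $R_i = \sum_j \left(\alpha \frac{z_{ij}^+}{\sum_{i'} z_{i'j}^+} - \beta \frac{z_{ij}^-}{\sum_{i'} z_{i'j}^-}\right) R_j$, partitioning relevance between positive and negative contributions with user-chosen $\alpha, \beta$ ($\alpha - \beta = 1$) (Hiley et al., 2019).
- $z^{\mathcal{B}}$-rule: For bounded inputs $l_i \le x_i \le h_i$, $q_{ij} = x_i w_{ij} - l_i w_{ij}^+ - h_i w_{ij}^-$, subtracting boundary-weighted terms (Montavon et al., 2015, Hiley et al., 2019).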
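The four rules above can be sketched for a single dense layer in NumPy. This is an illustrative implementation under stated assumptions, not code from the cited papers: activations `a` have shape `(n_in,)`, weights `W` have shape `(n_in, n_out)`, the upper-layer relevance `R_out` has shape `(n_out,)`, and a tiny stabilizer that keeps the sign of each denominator (a common implementation choice) guards against division by zero. All helper names are hypothetical.

```python
import numpy as np

def _stab(d, eps=1e-12):
    """Push denominators away from zero while keeping their sign."""
    return d + eps * np.where(d >= 0, 1.0, -1.0)

def zplus_rule(a, W, R_out):
    """z+ rule: q_ij = a_i * w_ij^+ (assumes a_i >= 0, e.g. ReLU outputs)."""
    z = a[:, None] * np.maximum(W, 0.0)
    return (z / _stab(z.sum(axis=0)) * R_out).sum(axis=1)

def epsilon_rule(a, W, R_out, eps=0.01):
    """epsilon rule: q_ij = a_i * w_ij with a signed stabilizer in the denominator."""
    z = a[:, None] * W
    return (z / _stab(z.sum(axis=0), eps) * R_out).sum(axis=1)

def alphabeta_rule(a, W, R_out, alpha=2.0, beta=1.0):
    """alpha-beta rule: positive and negative parts weighted separately (alpha - beta = 1)."""
    z = a[:, None] * W
    zp, zn = np.maximum(z, 0.0), np.minimum(z, 0.0)
    ratio = alpha * zp / _stab(zp.sum(axis=0)) - beta * zn / _stab(zn.sum(axis=0))
    return (ratio * R_out).sum(axis=1)

def zbox_rule(x, W, R_out, low, high):
    """z^B rule for inputs bounded by low <= x <= high (e.g. pixel ranges)."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    z = x[:, None] * W - low[:, None] * Wp - high[:, None] * Wn
    return (z / _stab(z.sum(axis=0)) * R_out).sum(axis=1)

rng = np.random.default_rng(0)
a = rng.uniform(0.1, 0.9, size=6)   # positive activations, also inside [0, 1]
W = rng.normal(size=(6, 4))
W[0] = np.abs(W[0])                 # guarantee a positive weight into every output
W[1] = -np.abs(W[1])                # guarantee a negative weight into every output
R_out = rng.uniform(size=4)

for name, R_in in [
    ("z+", zplus_rule(a, W, R_out)),
    ("epsilon", epsilon_rule(a, W, R_out)),
    ("alpha-beta", alphabeta_rule(a, W, R_out)),
    ("z^B", zbox_rule(a, W, R_out, np.zeros(6), np.ones(6))),
]:
    print(f"{name:>10}: sum R_in = {R_in.sum():.4f}  (sum R_out = {R_out.sum():.4f})")
```

In this sketch the $z^+$, $\alpha\beta$ (with $\alpha - \beta = 1$), and $z^{\mathcal{B}}$ rules conserve relevance whenever every denominator is nonzero, whereas the $\varepsilon$-rule deliberately lets the stabilizer absorb part of the relevance.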
In Transformers, DTD incorporates rules for multi-head attention, layer normalization (treated as an affine map), and skip (residual) connections, adapting the local Taylor expansions to match these functional forms (Brandl et al., 2022).
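Two of these adaptations can be sketched in a few lines, assuming a common formulation rather than the exact rules of the cited work. Relevance arriving at a residual sum $z = x + F(x)$ is split element-wise in proportion to the two summands, and layer normalization is linearized by treating its standard deviation as a constant, after which an $\varepsilon$-rule for linear maps applies. The names `split_residual`, `layernorm_as_linear`, and `eps_rule_linear` are hypothetical, and the LayerNorm bias is omitted.

```python
import numpy as np

def split_residual(x, fx, R_out, eps=1e-6):
    """Split relevance of z = x + F(x) between skip path and branch (element-wise)."""
    z = x + fx
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)   # signed stabilizer
    return x / denom * R_out, fx / denom * R_out

def layernorm_as_linear(x, gamma, eps=1e-5):
    """Matrix A with LayerNorm(x) = A @ x when the std is frozen (bias omitted)."""
    d = x.shape[0]
    sigma = np.sqrt(x.var() + eps)                   # treated as a constant
    centering = np.eye(d) - np.ones((d, d)) / d      # x -> x - mean(x)
    return (gamma[:, None] / sigma) * centering

def eps_rule_linear(A, x, R_out, eps=1e-6):
    """epsilon rule for y = A @ x: R_i = sum_j A_ji x_i / (y_j + eps) * R_j."""
    z = A * x[None, :]                               # z[j, i] = A[j, i] * x[i]
    y = z.sum(axis=1)
    denom = y + eps * np.where(y >= 0, 1.0, -1.0)
    return (z / denom[:, None] * R_out[:, None]).sum(axis=0)

rng = np.random.default_rng(1)
x = rng.normal(size=8)                               # one token representation
gamma = rng.uniform(0.5, 1.5, size=8)
A = layernorm_as_linear(x, gamma)
R_ln_out = rng.uniform(size=8)                       # relevance above the LayerNorm
R_ln_in = eps_rule_linear(A, x, R_ln_out)

fx = rng.normal(size=8)                              # output of an attention or MLP branch
R_skip, R_branch = split_residual(x, fx, R_ln_out)
```

Freezing the standard deviation is what makes the layer linear in $x$; without it, the normalization would itself be a nonlinearity requiring its own expansion.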
3. Application Domains and Implementation Variants
Deep Neural Networks (DNNs)
DTD was originally developed for feedforward and convolutional networks in vision tasks. It is applicable to any architecture where a local Taylor expansion can be constructed, including sum-pooling and detection layers. Empirical results on MNIST and ILSVRC demonstrated heatmaps that are more interpretable and predominantly positive compared with sensitivity-based methods, with precise coverage of object regions (Montavon et al., 2015).
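A minimal sketch of the recursive procedure on a network of this kind: a ReLU detection layer followed by sum-pooling. It assumes nonnegative inputs and nonpositive biases (so that conservation is exact), takes the relevance of each pooled unit to be its activation (the Taylor expansion of a sum at the root 0), and then applies the $z^+$-rule down to the input. Function names are illustrative.

```python
import numpy as np

def forward(x, W, b):
    h = np.maximum(0.0, x @ W + b)       # detection layer (ReLU)
    return h, h.sum()                    # sum-pooling output

def dtd_sumpool_relu(x, W, b):
    h, out = forward(x, W, b)
    R_h = h.copy()                       # sum-pooling expanded at root 0: R_j = h_j
    z = x[:, None] * np.maximum(W, 0.0)  # z+ contributions x_i * w_ij^+
    denom = z.sum(axis=0)
    # Units with no positive contribution are inactive here and get no relevance.
    scale = np.divide(R_h, denom, out=np.zeros_like(R_h), where=denom > 0)
    R_x = (z * scale).sum(axis=1)
    return R_x, out

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=10)       # e.g. nonnegative pixel intensities
W = rng.normal(size=(10, 5))
b = -rng.uniform(0.0, 0.5, size=5)       # nonpositive biases
R_x, out = dtd_sumpool_relu(x, W, b)
print(R_x.sum(), out)                    # equal up to floating-point error
```

Because every relevance score is a nonnegative share of a positive contribution, the resulting heatmap is nonnegative by construction, which matches the "predominantly positive" character of DTD heatmaps noted above.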
Transformers and NLP
DTD has been extended to transformer architectures, including RoBERTa-Large, for feature attribution at the token level. The implementation involves fine-tuning the model, forward-passing to obtain logits, and then performing detailed backward relevance propagation through fully connected, attention, and normalization blocks, ultimately yielding a normalized scalar relevance for each input token (Brandl et al., 2022).
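The final aggregation step can be sketched as follows, assuming that relevance has already been propagated down to the embedding layer as a `(num_tokens, hidden_dim)` array. Summing over the hidden dimension and dividing by the total absolute relevance is one plausible normalization; the cited work may normalize differently, and `token_relevance` is a hypothetical helper.

```python
import numpy as np

def token_relevance(R_embed):
    """Collapse embedding-level relevance to one signed score per token.

    Assumption: scores are normalized so that their absolute values sum to 1.
    """
    r = R_embed.sum(axis=1)              # one scalar per token
    total = np.abs(r).sum()
    return r / total if total > 0 else r

tokens = ["the", "claim", "is", "false"]
R_embed = np.array([[0.01, -0.02], [0.30, 0.25], [0.00, 0.01], [0.40, 0.10]])
for tok, score in zip(tokens, token_relevance(R_embed)):
    print(f"{tok:>6}: {score:+.3f}")
```

Keeping the sign lets highlights distinguish tokens that support the prediction from tokens that speak against it.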
Video and Spatio-Temporal Models
In activity recognition with 3D CNNs, vanilla DTD assigns joint spatio-temporal relevance but does not directly distinguish between spatial and temporal drivers of relevance. A discriminative extension computes the spatial-only relevance map by evaluating DTD on freeze-frame inputs, then extracts motion relevance as the difference between the original and spatial DTD maps (Hiley et al., 2019).
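The decomposition can be sketched independently of the network, assuming an `explain` function that maps a clip of shape `(T, H, W)` to a relevance map of the same shape. Here a "freeze-frame input" is assumed to be a single frame tiled along time, and a linear gradient×input scorer stands in for a real DTD backward pass through a 3D CNN; both are illustrative assumptions.

```python
import numpy as np

def freeze_frame(video, frame_index=0):
    """Repeat one frame over the whole clip, removing all motion (assumed definition)."""
    T = video.shape[0]
    return np.repeat(video[frame_index:frame_index + 1], T, axis=0)

def discriminative_dtd(video, explain, frame_index=0):
    """Split joint relevance into a spatial part and a motion (difference) part."""
    R_joint = explain(video)
    R_spatial = explain(freeze_frame(video, frame_index))
    R_motion = R_joint - R_spatial
    return R_spatial, R_motion

rng = np.random.default_rng(3)
T, height, width = 4, 5, 5
weights = rng.normal(size=(T, height, width))

def toy_explain(clip):
    """Stand-in explainer: gradient x input of a linear clip scorer."""
    return weights * clip

moving = rng.uniform(size=(T, height, width))
static = np.repeat(moving[:1], T, axis=0)          # a clip with no motion at all
R_spatial, R_motion = discriminative_dtd(moving, toy_explain)
```

A useful property of this construction is that a clip without motion yields an all-zero motion map, since the clip and its freeze-frame version coincide.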
One-Class Models and Anomaly Detection
DTD has been adapted to kernel-based one-class SVMs by recasting the discriminant as a two-layer neural net. Attributions are derived by Taylor-propagating through the anomaly measure’s layers and then via integrated gradients to input features. This method yields anomaly “heatmaps” which localize contributions more faithfully than sensitivity analysis or heuristic feature removal (Kauffmann et al., 2018).
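The reformulation can be sketched as follows. Under the assumptions here, the outlier score is a monotone transform of the RBF one-class SVM discriminant with support vectors $u_j$, nonnegative weights $\alpha_j$ summing to 1, and kernel width $\gamma$, written as a soft-min over squared distances: $o(x) = -\gamma^{-1}\log\sum_j \alpha_j \exp(-\gamma\,\lVert x - u_j\rVert^2)$. The score is split across support vectors by the soft-min weights (an assumed redistribution rule), and each distance is attributed to input dimensions by $(x_i - u_{ji})^2$, which is exactly what integrated gradients returns for $\lVert x - u_j\rVert^2$ with $u_j$ as the baseline. Names and the redistribution rule are illustrative, not the cited paper's exact procedure.

```python
import numpy as np

def outlier_score(x, U, alpha, gamma):
    """Two-layer view: squared distances (layer 1), then soft-min pooling (layer 2)."""
    d = ((x[None, :] - U) ** 2).sum(axis=1)          # d_j = ||x - u_j||^2
    a = np.log(alpha) - gamma * d                    # log(alpha_j * k(x, u_j))
    lse = a.max() + np.log(np.exp(a - a.max()).sum())
    score = -lse / gamma                             # soft-min of the distances
    weights = np.exp(a - lse)                        # soft-min weights, sum to 1
    return score, d, weights

def explain_outlier(x, U, alpha, gamma):
    score, d, weights = outlier_score(x, U, alpha, gamma)
    R_d = weights * score                            # assumed split over support vectors
    sq = (x[None, :] - U) ** 2                       # integrated gradients of d_j from u_j
    share = sq / np.where(d > 0, d, 1.0)[:, None]    # fraction of d_j per input dimension
    R_x = (share * R_d[:, None]).sum(axis=0)
    return R_x, score

rng = np.random.default_rng(4)
U = rng.normal(size=(5, 3))                          # support vectors
alpha = np.full(5, 1 / 5)                            # weights summing to 1
x = np.array([3.0, 0.0, 0.0])                        # a point far out along the first axis
R_x, score = explain_outlier(x, U, alpha, gamma=0.5)
print(R_x, score)
```

With the weights summing to 1 and $x$ distinct from every support vector, the score is positive and the input relevances sum to it exactly.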
4. Empirical Evaluations and Human-in-the-Loop Assessment
Evaluations of DTD have included both controlled experiments and real-world deployments:
- In-the-wild reliability assessment: DTD feature attributions were shown to professional journalists performing news reliability judgments. Compared with text-only or confidence-only baselines, DTD highlights on tokens led to improved accuracy (error rate reduced from 0.31 to 0.22), with the largest gains for journalists with less than three years’ experience. Model-derived rationales sometimes sped up decision-making, and qualitative feedback was mixed, highlighting concerns about perceived randomness or bias (Brandl et al., 2022).
- Visualization and interpretability: In image, video, and anomaly detection settings, DTD-based explanations provided more focused and class-consistent heatmaps than gradient or perturbation methods. Spatio-temporal video decompositions using the discriminative DTD variant revealed diagnostic motion cues unavailable in vanilla DTD approaches (Montavon et al., 2015, Hiley et al., 2019, Kauffmann et al., 2018).
5. Theoretical Critique, Limitations, and Best Practices
Recent theoretical investigation has exposed fundamental limitations of DTD:
- Root point ambiguity: The root point at each layer is not prescribed by theory. With fixed (locally constant) roots, DTD reduces to $\nabla f(x) \odot x$ up to a constant, i.e., the “gradient×input” method. If input-dependent (nonconstant) roots are permitted, the explanations become under-constrained: one can generate arbitrary attributions by varying the root’s Jacobian (Sixt et al., 2022).
- Violation of Taylor expansion assumptions: Empirical investigations show that in standard DTD implementations, root points frequently fall outside the input's linear region, invalidating the first-order Taylor expansion. As a result, the conservation guarantee often fails in practice, and relevance scores can no longer be interpreted as Taylor terms (Sixt et al., 2022).
- Practical recommendations: Practitioners are advised to sanity-check attribution stability (e.g., under target-class randomization), verify root-region consistency, and combine DTD with complementary explanation techniques possessing stronger axiomatic foundations (e.g., integrated gradients, concept activation vectors, counterfactuals) (Sixt et al., 2022).
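Two of the points above can be probed numerically on a toy network. The first half illustrates the fixed-root reduction in a special case, assuming a bias-free two-layer ReLU network and the basic z-rule (LRP-0), which can be read as expanding each layer at the fixed root 0; under these assumptions the layer-wise recursion coincides exactly with gradient×input. The second half sketches a target-class sanity check by computing $z^+$ explanations for two classes and measuring their Pearson correlation. The network, the correlation measure, and all helper names are assumptions for illustration, not a reproduction of the cited analysis.

```python
import numpy as np

def forward(x, W1, W2):
    """Bias-free two-layer ReLU network: x -> h = relu(x W1) -> logits = h W2."""
    z1 = x @ W1
    h = np.maximum(z1, 0.0)
    return z1, h, h @ W2

def lrp0(x, W1, W2, target):
    """Basic z-rule (LRP-0) for one output, i.e. expansion at the fixed root 0."""
    z1, h, _ = forward(x, W1, W2)
    R_h = h * W2[:, target]                       # output -> hidden
    safe = np.where(z1 != 0, z1, 1.0)             # inactive units carry no relevance
    return (x[:, None] * W1 / safe * R_h).sum(axis=1)

def gradient_times_input(x, W1, W2, target):
    z1, _, _ = forward(x, W1, W2)
    grad = W1 @ ((z1 > 0) * W2[:, target])        # d logit / d x
    return grad * x

def zplus(a, W, R_out):
    """z+ rule for one dense layer."""
    z = a[:, None] * np.maximum(W, 0.0)
    den = z.sum(axis=0)
    scale = np.divide(R_out, den, out=np.zeros_like(R_out), where=den > 0)
    return (z * scale).sum(axis=1)

def zplus_explain(x, W1, W2, target):
    _, h, logits = forward(x, W1, W2)
    R_out = np.zeros_like(logits)
    R_out[target] = logits[target]
    return zplus(x, W1, zplus(h, W2, R_out))

def class_sensitivity(R_a, R_b):
    """Pearson correlation between two attribution maps (1.0 = identical pattern)."""
    return float(np.corrcoef(R_a.ravel(), R_b.ravel())[0, 1])

rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, size=12)
W1 = rng.normal(size=(12, 16))
W2 = rng.normal(size=(16, 3))

R_lrp0 = lrp0(x, W1, W2, target=0)
R_gxi = gradient_times_input(x, W1, W2, target=0)
print(np.allclose(R_lrp0, R_gxi))                 # True: the recursion collapses to gradient x input

R_c0 = zplus_explain(x, W1, W2, target=0)
R_c1 = zplus_explain(x, W1, W2, target=1)
print(class_sensitivity(R_c0, R_c1))              # values near 1.0 would flag class-insensitive maps
```

In practice the same check would be run over many inputs and class pairs rather than a single example, since one correlation value says little on its own.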
6. Impact, Practical Utility, and Open Challenges
DTD has advanced the field of model interpretability by providing a general, flexible, and conservation-based approach to feature attribution across diverse domains. It is especially notable for its influence in vision, NLP, and anomaly detection applications, and for introducing principled, rules-driven explanation mechanisms that are accessible to both model developers and domain experts.
However, open challenges remain:
- The utility of DTD explanations is context-dependent, varying with domain expertise—novices may benefit, whereas experts may be hindered by spurious rationales (Brandl et al., 2022).
- Alignment between user trust and explanation accuracy is not guaranteed.
- For modern architectures (e.g., Transformers), DTD is computationally more intensive than simple gradients, making real-time integration nontrivial.
- Theoretical risk arises from under-constrained or ill-posed root selections, and from frequent empirical violations of the Taylor expansion conditions required for DTD’s guarantee.
A plausible implication is that while DTD remains a valued element of the explainability toolkit, its deployment should be accompanied by rigorous validation and supplemented with complementary interpretability methods to ensure trustworthiness and practical utility.