
Not Just a Black Box: Learning Important Features Through Propagating Activation Differences (1605.01713v3)

Published 5 May 2016 in cs.LG, cs.CV, and cs.NE

Abstract: Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.

Citations (745)

Summary

  • The paper introduces DeepLIFT, a novel approach that computes contribution scores by comparing actual and reference activations to reveal feature importance.
  • It utilizes a backpropagation method with adaptable multipliers to overcome issues with zero or vanishing gradients in ReLU, sigmoid, and tanh activations.
  • Results on Tiny ImageNet and genomic sequence data validate DeepLIFT’s ability to reliably highlight critical features in complex neural network models.

Learning Important Features Through Propagating Activation Differences

The paper introduces DeepLIFT (Learning Important FeaTures), a method aimed at making neural networks interpretable by clarifying how much each feature contributes to the network's output. Unlike gradient-based approaches, which frequently break down in ReLU's zero-gradient regions and in the near-saturated regimes of sigmoid and tanh, DeepLIFT compares each neuron's activation to a corresponding reference activation.
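
To make this failure mode concrete, here is a minimal sketch contrasting a gradient-based score with a difference-from-reference score at a saturated sigmoid. The values and the zero reference are hypothetical illustrations, not code from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A saturated sigmoid neuron: the local gradient is nearly zero, so a
# gradient-based importance score assigns almost nothing to the input,
# even though the input drove the activation from ~0.5 up to ~1.0.
x, x_ref = 10.0, 0.0                        # actual input and reference
grad = sigmoid(x) * (1.0 - sigmoid(x))      # ~4.5e-5: vanishing gradient
gradient_input = grad * x                   # ~4.5e-4: near-zero score

# DeepLIFT-style score: compare the activation to its reference activation.
delta_x = x - x_ref                         # difference-from-reference, input
delta_y = sigmoid(x) - sigmoid(x_ref)       # difference-from-reference, output
contribution = (delta_y / delta_x) * delta_x    # exactly delta_y, ~0.5

print(gradient_input, contribution)         # ~0.00045 vs ~0.49995
```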

Methodology

DeepLIFT computes a contribution score for each neuron by evaluating the difference between its actual activation and a predefined reference activation. This difference, denoted $\delta$, forms the basis of the contribution scores $C_{xy}$. The method adheres to two critical properties:

  1. Summation to $\delta$: For any neuron $y$, the sum of the contributions from a set of minimally sufficient input neurons $S$ equals $\delta_y$ (see the sketch after this list).
  2. Linear Composition: Contributions can be passed back through layers akin to gradient backpropagation but facilitated by multipliers, which enhance numerical stability.
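
As a concrete illustration of the summation-to-$\delta$ property, here is a minimal sketch for a single affine neuron; the weights, input, and all-zeros reference are hypothetical choices for illustration:

```python
import numpy as np

# Affine neuron y = w.x + b, with a reference input x_ref.
w = np.array([1.0, -2.0, 0.5])
b = 0.3
x = np.array([2.0, 1.0, -1.0])
x_ref = np.zeros(3)

delta_y = (w @ x + b) - (w @ x_ref + b)    # y's difference-from-reference
contributions = w * (x - x_ref)            # C_{x_i y} for an affine unit

# Summation-to-delta: per-input contributions add up exactly to delta_y.
assert np.isclose(contributions.sum(), delta_y)
```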

To backpropagate contributions, DeepLIFT employs rules that accommodate affine functions, max operations, and maxout units, amongst others. These rules define adaptable multipliers $m_{xy}$ whose product with $\delta_x$ yields the contribution $C_{xy}$.
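
The sketch below chains such multipliers through a hypothetical two-layer network. The $\delta_y / \delta_x$ ratio used as the ReLU multiplier is one natural reading of the nonlinearity rule, not a verbatim transcription of the paper's definitions:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# Toy network: h = relu(W1 @ x), y = w2 . h (weights are hypothetical).
W1 = np.array([[1.0, -1.0],
               [0.5,  2.0]])
w2 = np.array([1.0, -0.5])
x, x_ref = np.array([1.0, 2.0]), np.zeros(2)

pre, pre_ref = W1 @ x, W1 @ x_ref           # pre-activations
h, h_ref = relu(pre), relu(pre_ref)
delta_h, delta_pre = h - h_ref, pre - pre_ref

# Multiplier for the ReLU: ratio of output to input differences,
# finite even in the zero-gradient regime of a non-firing unit.
m_relu = np.divide(delta_h, delta_pre,
                   out=np.zeros_like(delta_h), where=delta_pre != 0)

# Chain multipliers layer by layer, analogous to backpropagation:
# m_{x,y} = W1^T (m_relu * w2), and C_{x,y} = m_{x,y} * delta_x.
m_xy = W1.T @ (m_relu * w2)
contributions = m_xy * (x - x_ref)

delta_y = w2 @ h - w2 @ h_ref
assert np.isclose(contributions.sum(), delta_y)   # summation-to-delta holds
```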

Results

The efficacy of DeepLIFT is demonstrated through two primary applications: the Tiny ImageNet dataset and genomic sequence data analysis.

Tiny ImageNet

For Tiny ImageNet, a VGG16 architecture was trained on images scaled to $64 \times 64$ pixels across 200 classes. Comparative visualizations showed DeepLIFT's superior feature attribution, especially against gradient-based methods, which often fail to highlight relevant features due to zero gradients in non-firing ReLUs.

Genomics

Applying DeepLIFT to genomic sequence analysis further demonstrated its effectiveness. The method successfully identified critical DNA motifs in sequences designed to contain specific binding patterns, and the resulting DeepLIFT scores were notably more reliable than gradient*input scores, which failed to highlight the essential motifs consistently.

Discussion

The method stands out by addressing flaws inherent in gradient-based feature importance methods. By comparing activations against reference activations, DeepLIFT circumvents the pitfalls of zero gradients in ReLUs and vanishing gradients in sigmoid or tanh activations. While earlier methods such as Layer-wise Relevance Propagation (LRP) also attribute feature importance, DeepLIFT's approach proves more robust, particularly when biases are non-zero, and its multipliers maintain numerical stability where LRP is susceptible to instability.
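
To illustrate the bias point, the sketch below contrasts an LRP-style z-rule (one common formulation, not necessarily the exact variant the paper compares against) with DeepLIFT's difference-from-reference contributions on a single affine neuron; all values are hypothetical:

```python
import numpy as np

# Affine neuron y = w.x + b with a non-zero bias.
w, b = np.array([2.0, -1.0]), 1.5
x = np.array([1.0, 0.5])
y = w @ x + b                               # 3.0

# LRP-style z-rule: relevance proportional to x_i * w_i, with the bias
# folded into the denominator. With b != 0 the relevances no longer sum
# to y, and a near-zero denominator makes the rule numerically unstable.
z = x * w
lrp = z / (z.sum() + b) * y                 # sums to 1.5, not 3.0

# DeepLIFT: contributions are defined on differences from a reference,
# so the constant bias cancels and summation-to-delta holds regardless.
x_ref = np.zeros_like(x)
delta_y = y - (w @ x_ref + b)               # 1.5
deeplift = w * (x - x_ref)
assert np.isclose(deeplift.sum(), delta_y)
```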

Implications and Future Work

DeepLIFT's introduction carries significant implications for the interpretability of neural networks, particularly in domains requiring high transparency, such as genomics and medical diagnostics. By providing more accurate feature attributions, the method enhances trust and affords novel insights into network behavior. Future developments may extend DeepLIFT to more complex neural architectures, including recurrent neural networks and more sophisticated convolutional networks, tailored to diverse input types and high-stakes applications in AI. Adapting this robust interpretative framework could lead to broader adoption of neural networks in areas previously hindered by their "black box" nature.

In conclusion, DeepLIFT emerges as a critical advancement toward interpretable AI, offering a meaningful improvement over traditional gradient-based approaches in elucidating the internal mechanics of neural networks. The method’s incorporation of reference activation comparison sets a new direction for feature importance attribution in deep learning.