How Important Is a Neuron? (1805.12233v1)

Published 30 May 2018 in cs.LG and stat.ML

Abstract: The problem of attributing a deep network's prediction to its input/base features is well-studied. We introduce the notion of conductance to extend the notion of attribution to understanding the importance of hidden units. Informally, the conductance of a hidden unit of a deep network is the flow of attribution via this hidden unit. We use conductance to understand the importance of a hidden unit to the prediction for a specific input, or over a set of inputs. We evaluate the effectiveness of conductance in multiple ways, including theoretical properties, ablation studies, and a feature selection task. The empirical evaluations are done using the Inception network over ImageNet data, and a sentiment analysis network over reviews. In both cases, we demonstrate the effectiveness of conductance in identifying interesting insights about the internal workings of these networks.

An Analysis of Neuron Importance in Deep Learning

The paper "How Important Is a Neuron?" by Dhamdhere, Sundararajan, and Yan explores a key question in neural network interpretability: how can the importance of hidden units (neurons) within a neural network be quantified? Existing literature has extensively addressed the attribution of deep network predictions to input features, but the evaluation of hidden unit significance remains less understood. This paper proposes a novel metric called "conductance" to address this gap.

The proposed concept of conductance is rooted in Integrated Gradients, the established attribution technique of Sundararajan et al. that assesses feature importance by integrating input gradients along a path from a baseline to the input. Conductance extends this technique to the hidden layers of a network, offering a way to measure a hidden unit's contribution to a specific prediction: using the chain rule, each Integrated Gradients attribution is decomposed across the units of a hidden layer, yielding a measure of how much attribution flows through each unit.
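
As a rough sketch of the quantities involved (paraphrasing, not quoting, the paper's notation: a network F, an input x, a baseline x', and the straight-line path between them), Integrated Gradients attributes the prediction to input features, and conductance re-expresses that attribution as flow through a hidden unit y:

```latex
% Integrated Gradients attribution of input feature i (baseline x', straight-line path).
\mathrm{IG}_i(x) \;=\; (x_i - x'_i)\int_{0}^{1}
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\,\mathrm{d}\alpha

% Conductance of a hidden unit y: the part of the attribution that flows through y,
% obtained by splitting each input gradient with the chain rule and summing over features.
\mathrm{Cond}^{y}(x) \;=\; \sum_i (x_i - x'_i)\int_{0}^{1}
  \frac{\partial F}{\partial y}\,\frac{\partial y}{\partial x_i}\,\mathrm{d}\alpha
```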

The authors evaluate conductance against alternative attribution methods such as Activation, Gradient*Activation, and Internal Influence. Conductance stands out by satisfying completeness: the conductances of the units in any hidden layer sum to the difference between the network's prediction at the input and at the baseline, so attribution is fully accounted for as it passes through the network's layers. The paper notes that neither Activation nor Internal Influence maintains this property, and its theoretical and empirical analysis indicates that these methods can produce misleading importance scores, owing in part to saturation in non-linear layers.
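
Concretely, completeness is the statement that the conductances of a hidden layer's units sum to the total change in the score; a one-line sketch of why it holds (glossing over technical conditions) is that the chain rule splits each input gradient across the layer's units, so summing conductance over the layer recovers Integrated Gradients, which itself sums to the score difference:

```latex
% Completeness of conductance for any hidden layer (x' is the baseline).
\sum_{y \,\in\, \text{layer}} \mathrm{Cond}^{y}(x) \;=\; F(x) - F(x')

% Why (sketch): the chain rule gives
%   \partial F / \partial x_i \;=\; \sum_{y} (\partial F/\partial y)(\partial y/\partial x_i),
% so summing Cond^y over y yields \sum_i IG_i(x), which equals F(x) - F(x').
```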

The research applies conductance in empirical studies in two settings: an object recognition network (Inception on ImageNet) and a sentiment analysis network. In both, conductance identifies a small subset of filters with outsized influence on predictions, offering a clear empirical illustration of its usefulness for feature selection across architectures and domains. Notably, the ablation studies on the object recognition model suggest that removing only a few high-conductance filters is enough to change prediction outcomes substantially, supporting conductance as an effective way to single out the units a prediction depends on.
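
A minimal sketch of this conductance-then-ablation procedure, assuming a PyTorch model split into two hypothetical halves, `bottom` (input to a vector-valued hidden layer) and `top` (hidden layer to the class score of interest); the Riemann approximation of the path integral, the step count, and `k` are illustrative choices rather than the authors' implementation:

```python
import torch

def layer_conductance(bottom, top, x, baseline, steps=50):
    """Approximate the conductance of each unit in the hidden layer bottom(x)."""
    # Points on the straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps + 1).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)                    # (steps+1, *x.shape)

    hidden = bottom(path).detach().requires_grad_(True)          # activations along the path
    grads = torch.autograd.grad(top(hidden).sum(), hidden)[0]    # dF/dy at each path point

    # Riemann approximation of the path integral of (dF/dy) * (dy/d_alpha):
    # pair each gradient with the change in y over the following path segment.
    dy = hidden[1:] - hidden[:-1]
    return (grads[:-1] * dy).sum(dim=0)                          # one value per hidden unit

def ablate_top_k(bottom, top, x, conductances, k=3):
    """Zero out the k highest-conductance units and return the resulting score."""
    with torch.no_grad():
        hidden = bottom(x.unsqueeze(0))
        top_units = conductances.abs().topk(k).indices           # most important units
        hidden[..., top_units] = 0.0                              # "ablate" them
        return top(hidden)
```

Re-scoring the input while increasing k, and watching how quickly the score drops, is the kind of check the ablation studies use to confirm that high-conductance filters really drive the prediction.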

One notable insight comes from the sentiment analysis network, where conductance reveals a clear division of labor among filters handling positive versus negative sentiment, pointing to a degree of specialization and redundancy that contributes to the network's robustness. The paper also isolates filters responsible for capturing negation in sentiment, demonstrating the potential of conductance to surface complex linguistic phenomena encoded in network parameters.

The implications of this research are multi-faceted. Conductance not only enhances interpretability in neural networks, which is crucial for trusted deployment in sensitive applications, but also aids model compression and pruning by identifying indispensable units. Moreover, this groundwork opens avenues for further research in model transparency, potentially influencing future architectures with built-in interpretability considerations.

In conclusion, the paper provides a significant contribution to the transparency and interpretability of neural networks by offering a robust method for neuron attribution, potentially guiding subsequent innovations in network analysis and modification. Future developments in AI can leverage such insights to drive both architectural optimization and ethical assurance in AI systems.

Authors (3)
  1. Kedar Dhamdhere (5 papers)
  2. Mukund Sundararajan (27 papers)
  3. Qiqi Yan (12 papers)
Citations (118)