Network Dissection: Quantifying Interpretability of Deep Visual Representations
The paper "Network Dissection: Quantifying Interpretability of Deep Visual Representations" by David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba introduces a framework for assessing the interpretability of convolutional neural networks (CNNs). By evaluating the alignment between individual units within CNNs and an extensive dataset of semantic concepts, this work provides a mechanism to quantify and analyze the interpretability of deep visual representations.
Methodology
The proposed framework, termed "Network Dissection," involves three primary steps:
- Identification of Human-Labeled Visual Concepts: The method uses the Broadly and Densely Labeled Dataset (Broden), which unifies several existing datasets (ADE, OpenSurfaces, Pascal-Context, Pascal-Part, and the Describable Textures Dataset). Broden provides pixel-level annotations for a wide range of visual concepts spanning objects, scenes, object parts, materials, textures, and colors.
- Collection of Hidden Unit Responses: For each CNN under evaluation, the activation maps of individual hidden units are collected over all Broden images. Because these maps are lower resolution than the concept annotations, they are upsampled with bilinear interpolation, and a per-unit threshold is chosen from the distribution of that unit's activations over the dataset (the top 0.5% quantile in the paper) to produce binary activation masks.
- Quantification of Alignment: The interpretability of a unit is measured by treating its thresholded activations as a binary segmentation of each concept. The score is the intersection over union (IoU) between the unit's activation mask and the concept's annotations, accumulated across the dataset; a unit is reported as a detector for a concept when this IoU exceeds a fixed threshold (0.04 in the paper). A minimal scoring sketch follows this list.
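The scoring step can be illustrated with a short NumPy/SciPy sketch. The function name, array shapes, and toy inputs below are illustrative assumptions; the top-quantile thresholding, bilinear upsampling, and dataset-wide IoU follow the description above.

```python
# Minimal sketch of the unit-concept alignment score (IoU) described above.
# The per-unit threshold is the 99.5th percentile of the unit's activations
# over the dataset; IoU is accumulated over all images between the thresholded,
# upsampled activation mask and the concept's pixel annotations.
import numpy as np
from scipy.ndimage import zoom  # order=1 gives bilinear interpolation

def unit_concept_iou(act_maps, concept_masks, top_quantile=0.005):
    """act_maps: (N, h, w) activations of one unit over N images.
    concept_masks: (N, H, W) binary annotations of one concept.
    Returns the dataset-wide IoU between the unit and the concept."""
    n, h, w = act_maps.shape
    _, H, W = concept_masks.shape
    # Threshold chosen so only the top `top_quantile` of activations exceed it.
    t_k = np.quantile(act_maps, 1.0 - top_quantile)
    inter = union = 0
    for a, l in zip(act_maps, concept_masks):
        # Bilinear upsampling of the low-resolution activation map.
        s = zoom(a, (H / h, W / w), order=1)
        m = s > t_k  # binary activation mask for this image
        inter += np.logical_and(m, l).sum()
        union += np.logical_or(m, l).sum()
    return inter / union if union > 0 else 0.0

# Toy usage with random stand-ins for real activations and annotations.
acts = np.random.rand(8, 13, 13).astype(np.float32)
masks = np.random.rand(8, 112, 112) > 0.8
print(unit_concept_iou(acts, masks))
```

In practice this score is computed for every unit against every Broden concept, and each unit is labeled with its highest-IoU concept.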
Experiments and Results
Validation and Interpretability
Human evaluators corroborated the framework's output, finding high agreement between the concepts assigned by Network Dissection and those identified by humans, particularly in the deeper layers of the network. The emergence of this interpretable structure lends credence to the idea that CNNs spontaneously learn partially disentangled representations.
Training Conditions and Network Parameters
The experiments covered various network architectures and training conditions. Key findings include:
- Influence of Network Architecture: Interpretability varies across architectures such as AlexNet, GoogLeNet, VGG, and ResNet, with deeper architectures generally yielding more unique concept detectors.
- Training Data Impact: The training data matters; networks trained on the scene-centric Places365 dataset develop more object detectors than networks trained on the object-centric ImageNet.
- Self-Supervised Techniques: Models trained on self-supervised tasks exhibited varying degrees of interpretability, with notably fewer object detectors than their supervised counterparts.
- Effects of Regularization and Training Time: Dropout, batch normalization, and the number of training iterations all measurably affect interpretability. Batch normalization, in particular, reduced interpretability; the authors suggest that its whitening of activations makes it easy for the representation to rotate away from an interpretable, axis-aligned basis (a sketch of such a rotation test follows this list).
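The paper also reports a related baseline: applying a random orthogonal rotation to a layer's representation sharply reduces the number of interpretable units while leaving discriminative power unchanged, indicating that interpretability is axis-aligned rather than implied by accuracy. The sketch below, with assumed function and variable names, shows how such a rotation could be applied to a layer's activations before re-running the same IoU scoring.

```python
# Hypothetical sketch of the random-rotation baseline: mix a layer's units
# with a random orthogonal matrix Q and re-score the rotated "units".
# Names are illustrative; only the idea (rotation preserves the information
# in the layer but destroys axis alignment) comes from the paper.
import numpy as np

def random_rotation(num_units, seed=0):
    """Sample a random orthogonal matrix Q of shape (num_units, num_units)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((num_units, num_units)))
    return q

def rotate_activations(act_maps, q):
    """act_maps: (N, C, h, w) activations; apply Q across the channel axis."""
    n, c, h, w = act_maps.shape
    flat = act_maps.reshape(n, c, h * w)           # (N, C, h*w)
    rotated = np.einsum('ij,njk->nik', q, flat)    # mix the C units with Q
    return rotated.reshape(n, c, h, w)

# Toy usage: the rotated maps would be fed to the same IoU scoring as before.
acts = np.random.rand(4, 256, 13, 13).astype(np.float32)
Q = random_rotation(256)
rotated = rotate_activations(acts, Q)
print(rotated.shape)  # (4, 256, 13, 13)
```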
Future Directions and Practical Implications
Increasing the width of layers (the number of units) notably enhances the emergence of interpretable units without significant loss of discriminative power; for example, an AlexNet variant with conv5 widened from 256 to 768 units (and global average pooling in place of the fully connected layers) produced many more unique detectors at comparable accuracy. This suggests that network capacity plays an important role in aligning internal representations with semantic concepts.
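In the paper, such comparisons are summarized by counting unique detectors: units whose best-matching concept exceeds the IoU threshold, tallied by distinct concept. A small sketch of that tally, with illustrative names and random stand-in scores, is shown below.

```python
# Sketch of the "unique detectors" summary used to compare layers/networks:
# a unit counts as a detector for its best-matching concept if that IoU
# exceeds the threshold (0.04 in the paper); interpretability is summarized
# by how many distinct concepts are detected. Names are illustrative.
import numpy as np

def unique_detectors(iou, concept_names, iou_threshold=0.04):
    """iou: (num_units, num_concepts) matrix of unit-concept IoU scores."""
    best_concept = iou.argmax(axis=1)
    best_score = iou.max(axis=1)
    return {concept_names[c] for c, s in zip(best_concept, best_score)
            if s > iou_threshold}

# Toy comparison of a narrow vs. a wide layer (random stand-in scores).
rng = np.random.default_rng(0)
concepts = [f"concept_{i}" for i in range(50)]
narrow = rng.random((256, 50)) * 0.05
wide = rng.random((768, 50)) * 0.05
print(len(unique_detectors(narrow, concepts)), len(unique_detectors(wide, concepts)))
```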
Conclusion
Network Dissection provides a compelling methodology for evaluating and improving the interpretability of CNNs. These findings have both theoretical and practical implications, suggesting further exploration of architectural modifications and training strategies to enhance transparency and alignment of deep learning models with human-understandable concepts. Future innovations may focus on mitigating the interpretability losses incurred by certain regularization methods like batch normalization while maintaining discriminative performance.