Network Dissection: Quantifying Interpretability of Deep Visual Representations
The paper "Network Dissection: Quantifying Interpretability of Deep Visual Representations" by David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba introduces a framework for assessing the interpretability of convolutional neural networks (CNNs). By evaluating the alignment between individual units within CNNs and an extensive dataset of semantic concepts, this work provides a mechanism to quantify and analyze the interpretability of deep visual representations.
Methodology
The proposed framework, termed "Network Dissection," involves three primary steps:
- Identification of Human-Labeled Visual Concepts: The method uses the Broadly and Densely Labeled Dataset (Broden), which unifies several existing datasets (ADE, OpenSurfaces, Pascal-Context, Pascal-Part, and the Describable Textures Dataset). Broden provides pixel-level annotations for a wide range of visual concepts spanning objects, scenes, object parts, materials, textures, and colors.
- Collection of Hidden Unit Responses: For each CNN under evaluation, the activation maps of individual hidden units are collected over all Broden images. Because these maps are lower resolution than the concept annotations, they are upsampled with bilinear interpolation, and a per-unit threshold is chosen from the distribution of that unit's activations over the dataset (the top 0.5% quantile in the paper) to produce binary activation masks.
- Quantification of Alignment: The interpretability of a unit is measured by treating its thresholded activations as a binary segmentation of each concept. The score is the intersection over union (IoU) between the unit's activation mask and the concept's annotations, accumulated across the dataset; a unit is reported as a detector for a concept when this IoU exceeds a fixed threshold (0.04 in the paper). A minimal scoring sketch follows this list.
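The scoring step can be illustrated with a short NumPy/SciPy sketch. The function name, array shapes, and toy inputs below are illustrative assumptions; the top-quantile thresholding, bilinear upsampling, and dataset-wide IoU follow the description above.

```python
# Minimal sketch of the unit-concept alignment score (IoU) described above.
# The per-unit threshold is the 99.5th percentile of the unit's activations
# over the dataset; IoU is accumulated over all images between the thresholded,
# upsampled activation mask and the concept's pixel annotations.
import numpy as np
from scipy.ndimage import zoom  # order=1 gives bilinear interpolation

def unit_concept_iou(act_maps, concept_masks, top_quantile=0.005):
    """act_maps: (N, h, w) activations of one unit over N images.
    concept_masks: (N, H, W) binary annotations of one concept.
    Returns the dataset-wide IoU between the unit and the concept."""
    n, h, w = act_maps.shape
    _, H, W = concept_masks.shape
    # Threshold chosen so only the top `top_quantile` of activations exceed it.
    t_k = np.quantile(act_maps, 1.0 - top_quantile)
    inter = union = 0
    for a, l in zip(act_maps, concept_masks):
        # Bilinear upsampling of the low-resolution activation map.
        s = zoom(a, (H / h, W / w), order=1)
        m = s > t_k  # binary activation mask for this image
        inter += np.logical_and(m, l).sum()
        union += np.logical_or(m, l).sum()
    return inter / union if union > 0 else 0.0

# Toy usage with random stand-ins for real activations and annotations.
acts = np.random.rand(8, 13, 13).astype(np.float32)
masks = np.random.rand(8, 112, 112) > 0.8
print(unit_concept_iou(acts, masks))
```

In practice this score is computed for every unit against every Broden concept, and each unit is labeled with its highest-IoU concept.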
Experiments and Results
Validation and Interpretability
Human evaluators corroborated the framework's output, finding high agreement between the concepts assigned by Network Dissection and those identified by humans, particularly in the deeper layers of the network. The emergence of this interpretable structure lends credence to the idea that CNNs spontaneously learn partially disentangled representations.
Training Conditions and Network Parameters
The experiments covered various network architectures and training conditions. Key findings include:
- Influence of Network Architecture: Interpretability varies across architectures such as AlexNet, GoogLeNet, VGG, and ResNet, with deeper architectures generally yielding more unique concept detectors.
- Training Data Impact: The training data matters; networks trained on the scene-centric Places365 dataset develop more object detectors than networks trained on the object-centric ImageNet.
- Self-Supervised Techniques: Models trained on self-supervised tasks exhibited varying degrees of interpretability, with notably fewer object detectors than their supervised counterparts.
- Effects of Regularization and Training Time: Dropout, batch normalization, and the number of training iterations all measurably affect interpretability. Batch normalization, in particular, reduced interpretability; the authors suggest that its whitening of activations makes it easy for the representation to rotate away from an interpretable, axis-aligned basis (a sketch of such a rotation test follows this list).
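The paper also reports a related baseline: applying a random orthogonal rotation to a layer's representation sharply reduces the number of interpretable units while leaving discriminative power unchanged, indicating that interpretability is axis-aligned rather than implied by accuracy. The sketch below, with assumed function and variable names, shows how such a rotation could be applied to a layer's activations before re-running the same IoU scoring.

```python
# Hypothetical sketch of the random-rotation baseline: mix a layer's units
# with a random orthogonal matrix Q and re-score the rotated "units".
# Names are illustrative; only the idea (rotation preserves the information
# in the layer but destroys axis alignment) comes from the paper.
import numpy as np

def random_rotation(num_units, seed=0):
    """Sample a random orthogonal matrix Q of shape (num_units, num_units)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((num_units, num_units)))
    return q

def rotate_activations(act_maps, q):
    """act_maps: (N, C, h, w) activations; apply Q across the channel axis."""
    n, c, h, w = act_maps.shape
    flat = act_maps.reshape(n, c, h * w)           # (N, C, h*w)
    rotated = np.einsum('ij,njk->nik', q, flat)    # mix the C units with Q
    return rotated.reshape(n, c, h, w)

# Toy usage: the rotated maps would be fed to the same IoU scoring as before.
acts = np.random.rand(4, 256, 13, 13).astype(np.float32)
Q = random_rotation(256)
rotated = rotate_activations(acts, Q)
print(rotated.shape)  # (4, 256, 13, 13)
```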
Future Directions and Practical Implications
Increasing the width of layers (the number of units) notably enhances the emergence of interpretable units without significant loss of discriminative power; for example, an AlexNet variant with conv5 widened from 256 to 768 units (and global average pooling in place of the fully connected layers) produced many more unique detectors at comparable accuracy. This suggests that network capacity plays an important role in aligning internal representations with semantic concepts.
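In the paper, such comparisons are summarized by counting unique detectors: units whose best-matching concept exceeds the IoU threshold, tallied by distinct concept. A small sketch of that tally, with illustrative names and random stand-in scores, is shown below.

```python
# Sketch of the "unique detectors" summary used to compare layers/networks:
# a unit counts as a detector for its best-matching concept if that IoU
# exceeds the threshold (0.04 in the paper); interpretability is summarized
# by how many distinct concepts are detected. Names are illustrative.
import numpy as np

def unique_detectors(iou, concept_names, iou_threshold=0.04):
    """iou: (num_units, num_concepts) matrix of unit-concept IoU scores."""
    best_concept = iou.argmax(axis=1)
    best_score = iou.max(axis=1)
    return {concept_names[c] for c, s in zip(best_concept, best_score)
            if s > iou_threshold}

# Toy comparison of a narrow vs. a wide layer (random stand-in scores).
rng = np.random.default_rng(0)
concepts = [f"concept_{i}" for i in range(50)]
narrow = rng.random((256, 50)) * 0.05
wide = rng.random((768, 50)) * 0.05
print(len(unique_detectors(narrow, concepts)), len(unique_detectors(wide, concepts)))
```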
Conclusion
Network Dissection provides a compelling methodology for evaluating and improving the interpretability of CNNs. These findings have both theoretical and practical implications, suggesting further exploration of architectural modifications and training strategies to enhance transparency and alignment of deep learning models with human-understandable concepts. Future innovations may focus on mitigating the interpretability losses incurred by certain regularization methods like batch normalization while maintaining discriminative performance.