Interpreting Deep Visual Representations via Network Dissection
The paper "Interpreting Deep Visual Representations via Network Dissection" presents a novel approach to demystifying the internal workings of Convolutional Neural Networks (CNNs) by providing interpretable semantics to individual hidden units. This methodology, referred to as Network Dissection, aims to quantify the interpretability of CNN representations by evaluating the alignment between individual latent units and visual semantic concepts. In doing so, the research addresses several core questions: how to define a disentangled representation within neural networks, the conditions under which such representations emerge, and the factors affecting the extent of disentanglement.
Methodology and Concepts
The principal contribution of this work is the development of a framework that systematically analyzes CNN architectures to identify and categorize the semantic meanings of their hidden units. This is achieved through several key steps:
- Dataset Construction: The research introduces the Broden dataset, which amalgamates several existing labeled datasets to create a comprehensive collection of colors, textures, materials, parts, objects, and scenes. This dataset serves as a foundation for mapping the CNN's activations to human-interpretable concepts.
- Measurement of Interpretability: Each hidden unit in a CNN is examined for activation patterns corresponding to known visual concepts in the Broden dataset. The alignment between a unit's thresholded activation map and a concept's ground-truth segmentation is quantified with an Intersection over Union (IoU) score, providing a per-unit interpretability metric (see the sketch after this list).
- Analytical Framework: Diverse network architectures and training conditions are subjected to Network Dissection to determine how different design choices impact the emergence of interpretability. The paper spans models such as AlexNet, VGG, and ResNet, covering both supervised and self-supervised learning paradigms.
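To make the IoU measurement concrete, the sketch below scores one unit against one concept. It assumes activation maps and binary concept masks have already been extracted for a probe set; the quantile level and detector cutoff follow the values reported in the paper (top 0.5% of activations, IoU > 0.04), but the function name, array layout, and upsampling choice here are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from scipy.ndimage import zoom


def unit_concept_iou(activations, concept_masks, quantile=0.995):
    """Score how well one convolutional unit aligns with one visual concept.

    activations   : float array (num_images, h, w) -- the unit's activation
                    maps over a probe dataset such as Broden.
    concept_masks : bool array (num_images, H, W) -- per-pixel ground-truth
                    masks for a single concept.
    quantile      : activations above this dataset-wide quantile count as
                    "on" (the paper thresholds at the top 0.5%).
    """
    # The threshold is chosen over the whole dataset, not per image.
    threshold = np.quantile(activations, quantile)

    _, H, W = concept_masks.shape
    intersection, union = 0, 0
    for act, mask in zip(activations, concept_masks):
        # Upsample the low-resolution activation map to the mask resolution.
        h, w = act.shape
        act_up = zoom(act, (H / h, W / w), order=1)
        on = act_up > threshold
        intersection += np.logical_and(on, mask).sum()
        union += np.logical_or(on, mask).sum()
    return intersection / union if union > 0 else 0.0


# A unit is reported as a detector for its highest-IoU concept, provided
# that score clears a small cutoff (IoU > 0.04 in the paper).
```

Repeating this scoring for every unit and every Broden concept yields the per-layer counts of unique concept detectors that the paper uses to compare architectures and training regimes.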
Experimental Results and Discussion
A series of experiments reveal significant findings regarding the interpretability of CNNs:
- Axis Alignment: Interpretability is tied to the network's natural, axis-aligned unit basis. When a layer's representation is rotated by a random orthogonal transformation, the number of interpretable units drops markedly even though the rotation preserves the layer's information content and discriminative power, indicating that interpretability is not an automatic by-product of good discrimination (a rotation sketch follows this list).
- Training Conditions: Factors such as dropout, batch normalization, and training duration all affect the degree of interpretability. Surprisingly, networks trained with batch normalization tend to achieve greater discrimination power yet show notably lower interpretability.
- Transfer Learning: When a network is fine-tuned on a new domain, individual units adapt by changing which concepts they detect, suggesting that interpretable features reorganize to reflect the knowledge demanded by the target task.
- Depth and Width: Greater network depth correlates with more semantically complex emergent concepts, while greater layer width increases the diversity of detected concepts. Beyond a certain size, however, both factors yield diminishing returns in interpretability.
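The axis-alignment experiment above can be sketched as follows: sample a random orthogonal matrix, re-express a layer's unit responses in the rotated basis, and then re-run the same IoU scoring on the rotated "units". The array shapes, function names, and QR-based sampling are assumptions for illustration; the point is only that the rotation is information-preserving while the per-unit semantics need not survive it.

```python
import numpy as np


def random_rotation(num_units, seed=0):
    """Sample a random orthogonal basis Q of shape (num_units, num_units)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((num_units, num_units)))
    # Fix column signs so the sample is uniform over the orthogonal group.
    return q * np.sign(np.diag(r))


def rotate_features(features, Q):
    """Mix unit responses with Q.

    features : array (num_images, num_units, h, w) from one layer.
    Returns the same tensor expressed in the rotated basis; each "new unit"
    is a linear combination of the original units, so the representation as
    a whole (and any downstream accuracy) is unchanged.
    """
    n, c, h, w = features.shape
    flat = features.reshape(n, c, h * w)
    rotated = np.einsum('ij,njk->nik', Q, flat)  # rotate the channel axis
    return rotated.reshape(n, c, h, w)


# Dissecting the rotated units with the same IoU scoring as before yields
# far fewer unique concept detectors, even though no information is lost --
# the paper's evidence that interpretability depends on the unit basis.
```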
Implications and Future Directions
The implications of this research are far-reaching within the landscape of machine learning. By providing tools to understand CNNs’ inner workings, Network Dissection facilitates a more principled approach to model evaluation, going beyond mere accuracy metrics to consider model transparency and explainability. Moreover, understanding the facets of interpretability empowers researchers to craft architectures that not only perform well but are also amenable to human interpretation, a critical requirement for applications involving human-machine collaboration and trust.
Looking toward future developments, expanding the breadth and depth of datasets like Broden would sharpen the resolution at which network units can be interpreted. Furthermore, the exploration of architectures designed with interpretability as an explicit objective remains an open frontier, dovetailing with ethical-AI considerations and regulatory compliance. As the field progresses, the challenge remains to reconcile optimization for task performance with the human-centric requirements of transparency and accountability in AI systems.