Interpretable Convolutional Neural Networks (1710.00935v4)

Published 2 Oct 2017 in cs.CV

Abstract: This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns each filter in a high conv-layer with an object part during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logics inside a CNN, i.e., based on which patterns the CNN makes the decision. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.

Authors (3)
  1. Quanshi Zhang (81 papers)
  2. Ying Nian Wu (138 papers)
  3. Song-Chun Zhu (216 papers)
Citations (752)

Summary

  • The paper introduces a novel loss function that makes each high-layer CNN filter correspond to a distinct object part without requiring extra annotations.
  • It minimally modifies standard CNN architectures to enhance filter interpretability while preserving robust classification performance.
  • Experimental results on networks like AlexNet and VGG variants demonstrate significantly improved part interpretability and location stability.

Interpretable Convolutional Neural Networks

The paper "Interpretable Convolutional Neural Networks" by Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu introduces a method to make convolutional neural networks (CNNs) interpretable by assigning explicit semantic meanings to filters in high conv-layers. This approach aims to address the longstanding issue of interpretability in CNNs by enabling each filter in the high conv-layers to represent a specific object part without requiring additional annotations for supervision.

Introduction

Convolutional Neural Networks (CNNs) have shown exceptional performance across various visual tasks, including object classification and detection. Despite their high discrimination power, CNNs are often criticized for their lack of interpretability. This paper presents a novel method to convert traditional CNNs into interpretable CNNs, wherein each filter in the high conv-layers corresponds to an object part, making the network's learned representations easier to understand.

Key Contributions

  1. Modification for Interpretability: The proposed method involves slight revisions to a standard CNN structure to enhance interpretability, ensuring applicability across different types of CNN architectures.
  2. No Additional Annotations Needed: The method requires no extra annotations of object parts or textures. It automatically adjusts each filter during the training process to represent specific object parts.
  3. Retention of Discrimination Power: While there might be a slight decrease in discrimination power, the method aims to confine this reduction within an acceptable range.
  4. End-to-End Training: The modified interpretable CNNs utilize the same training samples and loss function on the top layer as traditional CNNs, enabling seamless integration into existing training pipelines.

Methodology

The core innovation is a specialized loss added to each filter in the high conv-layers, which guides the filter toward representing a single object part. This loss is designed to ensure two properties (a formal sketch follows the list):

  • Low Entropy of Inter-Category Activations: Each filter is encouraged to be activated exclusively by images of a single category, so that it represents an object part specific to that category rather than responding to parts of many categories.
  • Low Entropy of Spatial Distributions: Within an image, the filter's activations are encouraged to concentrate on a single region of the feature map, corresponding to one part location, rather than being scattered across the map.
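
One way to formalize these two goals, consistent with the description above (the template prior and the decomposition below follow our reading of the paper and should be treated as a sketch rather than a quotation), is to define the loss of a filter as the negative mutual information between its feature maps X and a set of part templates T, with one positive template per possible part location plus a negative template for images in which the part is absent:

$$\mathrm{Loss}_f \;=\; -\,MI(\mathbf{X};\mathbf{T}) \;=\; -\,H(\mathbf{T}) \;+\; H(\mathbf{T}\mid\mathbf{X}).$$

Since the template prior $H(\mathbf{T})$ is fixed, minimizing this loss amounts to minimizing the conditional entropy $H(\mathbf{T}\mid\mathbf{X})$: each feature map should fit exactly one template well, which penalizes both activations triggered by other categories (high inter-category entropy) and activations scattered over many locations (high spatial entropy).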

During forward propagation, the method selects, for each feature map, the part template centered at the strongest activation and uses it to mask out activations elsewhere; during back-propagation, the filter loss is combined with the ordinary task loss to fine-tune the filters. Repeating this forward masking and backward fine-tuning throughout end-to-end training pushes each filter toward consistently representing a meaningful object part.
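
To make the forward masking concrete, the following is a minimal NumPy sketch of the idea, not the authors' implementation: the template shape, the hyper-parameters tau and beta, and the helper names make_template and mask_feature_map are illustrative assumptions, and in a real interpretable CNN this masking would be applied inside the conv-layer during end-to-end training.

```python
import numpy as np

def make_template(n, mu, tau=0.5, beta=4.0):
    """Positive part template centered at mu for an n x n feature map.

    Values decay with the normalized L1 distance from mu and are clipped at a
    negative floor, so activations far from the inferred part location are
    suppressed. tau and beta are illustrative hyper-parameters.
    """
    rows, cols = np.indices((n, n))
    dist = (np.abs(rows - mu[0]) + np.abs(cols - mu[1])) / n
    return tau * np.maximum(1.0 - beta * dist, -1.0)

def mask_feature_map(x, tau=0.5, beta=4.0):
    """Sketch of the forward pass of an interpretable filter.

    1. Locate the peak of the (ReLU-ed) feature map x.
    2. Build the positive template centered at that peak.
    3. Multiply the feature map by the template and re-apply ReLU, so only
       activations near the inferred part location survive.
    """
    n = x.shape[0]
    mu = np.unravel_index(np.argmax(x), x.shape)  # inferred part location
    template = make_template(n, mu, tau, beta)
    return np.maximum(x * template, 0.0), mu

# Toy usage: a 6 x 6 feature map with one strong peak.
x = np.maximum(np.random.randn(6, 6), 0.0)
x[2, 3] += 5.0
masked, mu = mask_feature_map(x)
print("inferred part location:", mu)
```

Selecting the template at the activation peak is what assigns the filter a part location for each image without any part annotations; the same template set drives the filter loss above, since a feature map that matches its selected template well contributes little to the loss.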

Results

The method was tested on several CNN architectures including AlexNet, VGG-M, VGG-S, and VGG-16, using object images from multiple benchmark datasets with landmark/part annotations. Two key metrics were used to evaluate the semantic clarity of the filters:

  • Part Interpretability: How consistently a filter's strongest activations correspond to the same semantic object part.
  • Location Stability: How stable the part location inferred by a filter is, relative to ground-truth object landmarks, across different images (a rough sketch of this metric follows the list).
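
As an illustration of the second metric, here is a rough Python sketch under our own assumptions about normalization (distance divided by an object diagonal), not the paper's exact evaluation protocol: for one filter and one landmark, the score is the standard deviation across images of the normalized distance between the filter's inferred part location and the landmark, so a lower value indicates a more stable filter.

```python
import numpy as np

def location_instability(inferred_locs, landmark_locs, object_diagonals):
    """Rough sketch of a location-(in)stability score for one filter.

    inferred_locs    : (N, 2) array of part locations inferred by the filter.
    landmark_locs    : (N, 2) array of one ground-truth landmark per image.
    object_diagonals : (N,)   array of object diagonal lengths, used to
                       normalize distances so the score is scale-invariant.

    Returns the standard deviation of the normalized distance between the
    inferred location and the landmark; a lower value means the filter keeps
    a more consistent spatial relationship to the landmark across images.
    """
    dists = np.linalg.norm(inferred_locs - landmark_locs, axis=1) / object_diagonals
    return float(np.std(dists))

# Toy usage with three images.
inferred = np.array([[10.0, 12.0], [40.0, 42.0], [25.0, 30.0]])
landmark = np.array([[12.0, 15.0], [43.0, 44.0], [27.0, 34.0]])
diag = np.array([100.0, 120.0, 110.0])
print(location_instability(inferred, landmark, diag))
```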

The experiments demonstrated that the filters in interpretable CNNs showed significantly better part interpretability and location stability compared to those in ordinary CNNs. For instance, in single-category classification tasks, the average interpretability scores improved noticeably across all tested architectures and datasets.

Implications and Future Work

The implications of this research are manifold:

  • Enhanced Trust in Predictions: By elucidating the internal logic of CNNs, this method can potentially increase human trust in CNN-generated predictions, facilitating their deployment in critical applications.
  • Facilitation of Model Debugging: The ability to interpret which parts of the object are being considered for classification can aid in diagnosing and correcting potential biases in the model.

Future research directions could focus on enhancing the flexibility of the models by developing filters that simultaneously describe discriminative textures and shared parts across multiple categories. This would lead to more adaptable and versatile CNN architectures, further bridging the gap between interpretability and discrimination power.

Conclusion

This paper presents a significant advancement in the interpretability of CNNs by proposing a method that ensures each filter in the high conv-layers corresponds to a specific object part. This innovation not only bolsters the interpretability of CNNs but also provides a valuable tool for understanding and debugging complex neural network models. The demonstrated improvements in filter interpretability and location stability highlight the potential of this method to make CNNs more transparent and trustworthy.