Papers
Topics
Authors
Recent
Search
2000 character limit reached

Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability

Published 28 Apr 2023 in cs.AI, cs.CV, and cs.LG | (2304.14864v1)

Abstract: Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use-case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate impacts of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.

Citations (6)

Summary

  • The paper presents a novel framework for evaluating the stability of semantic concept representations in CNNs by combining supervised and unsupervised techniques.
  • Experiments demonstrate that 1D-CAVs offer superior stability through enhanced concept separation and resistance to gradient instability compared to higher-dimensional representations.
  • The study recommends using at least 40-60 concept samples per concept and applying gradient smoothing in shallow layers to improve concept attribution stability.

Evaluating Stability of Concept Representations for Explainability

This paper (2304.14864) introduces a framework and metrics for evaluating the stability of semantic concept representations in CNNs, particularly within the context of XAI for safety-critical systems such as those used in automated driving. The focus is on both the stability of concept retrieval and concept attribution, addressing the adaptation of concept analysis methods for object detection CNNs.

Proposed Methodology

The framework leverages a combination of supervised and unsupervised CA techniques to achieve label-efficient retrieval and utilization of interpretable concepts for explaining both classification and object detection backbones. The overall framework for CAV stability and concept attribution stability is shown in (Figure 1). Figure 1

Figure 1: The framework for estimation of CAV stability and concept attribution stability. The proposed solution utilizes unsupervised ICE to aid concept discovery and labeling, while supervised TCAV is used for the generation of concept representations.

Concept Pool

The methodology involves building an extensible concept pool containing human-validated mined concepts extracted from the trained model under test, alongside existing manually labeled concepts. Unsupervised concept mining is employed to streamline the manual annotation process and accelerate concept labeling. The paper utilizes ICE to extract image patches that cause common patterns in the CNN image activations. The sets of mined image patches then undergo manual concept validation, where a human annotator assigns a label to each mined concept. Interpreted concepts, if meaningful, can either be directly added to the set of labeled concepts or be utilized in synthetic concept generation to obtain more complex synthetic concept samples.

Concept Stability Analysis

Supervised CA is performed to obtain CAVs for the concepts in the pool. CAV training is conducted on concept activations, i.e., CNN activations of concept images from the labeled concepts in the concept pool. Per-sample concept attribution is then calculated using backpropagation-based sensitivity methods. The resulting CAVs and concept sensitivity scores are used for local and global explanation purposes. The paper uses TCAV as the baseline global concept vector representation for stability estimation. A binary linear classifier is trained to predict the presence of a concept from the intermediate neuron activations in the selected CNN layer. The classifier weights serve as the CAV. For concept attribution, the sensitivity score calculation from \cite{kim2018interpretability} is adopted.

Object Detection Adaptation

The paper adapts CA for object detection (OD) by addressing challenges such as multiple predictions and suppressive post-processing. For multiple predictions, the concept weights and importance estimation component of ICE require adjustments. The pipeline assesses the effect of small modifications to each concept on the final class prediction on a per-bounding box approach. For TCAV, concept sensitivity is computed for each bounding box by starting the backpropagation from the desired class neuron of the bounding box. To evaluate concept sensitivity for FNs, the list of raw unprocessed bounding boxes is compared with the desired object bounding boxes, and IoU is used to select the best raw bounding boxes that match the desired ones.

Concept Stability Metrics

The paper introduces a generalized concept stability metric SLkC(X)\mathcal{S}_{L_{k}^C(X)} for a concept CC in layer LkL_{k} applicable to a test set XX as

SLkC(X)≔separabilityLkC(X)×consistencyLkC\mathcal{S}_{L_{k}^C(X)} \coloneqq separability_{L_{k}^C(X)} \times consistency_{L_{k}^C}

where separabilityLkC(X)separability_{L_{k}^C(X)} represents how well-tested concepts are separated from each other in the feature space, and consistencyLkCconsistency_{L_{k}^C} denotes how similar the representations are for the same concept when obtained with different initialization conditions. Separability is quantified using the mean of relative F1-scores on XX for $\CAV_{L_k,i}^C$ of CC in layer LkL_k for NN runs ii with different initialization conditions for CAV training:

separabilityLkC(X)≔f1LkC(X)≔1N∑i=1Nf1(CAVLk,iC;X)∈[0,1]separability_{L_{k}^C(X)} \coloneqq f1_{L_{k}^C(X)} \coloneqq \frac{1}{N} \sum_{i=1}^N f1(CAV_{L_k,i}^C; X) \in [0,1]

Consistency is measured as the mean cosine similarity between the CAVs in layer LkL_k of NN runs:

$consistency_{L_{k}^C} \coloneqq \cos_{L_{k}^C} \coloneqq \tfrac{2}{N(N-1)}\sum_{i=1}^{N} \sum_{j=1}^{i-1} {\cos(\CAV_{L_k,i}^C, \CAV_{L_k,j}^C)}$

Concept attribution stability is evaluated by comparing the vanilla gradient approach against a stabilized version using SmoothGrad. The paper introduces the Concept Attribution Deviation (CAD) metric to qualitatively evaluate the difference between the concept attribution of SmoothGrad and the vanilla gradient in the tested layer:

$\text{CAD}(x) \coloneqq \frac{ \sum_{C}\sum_{i}^{N} \left |\attr_{C,i}^\text{grad}(x) - \attr_{C,i}^\text{SG}(x) \right |}{ \sum_{C}\sum_{i}^{N} \left |\attr_{C,i}^\text{grad}(x) \right |}$

where $\attr_{C,i}^\text{grad}(x)$ and $\attr_{C,i}^\text{SG}(x)$ are the attributions of concept CC in layer LkL_k for vanilla gradient and SmoothGrad, respectively, for a single prediction for sample xx.

CAV Dimensionality

The paper explores the impact of CAV dimensionality on stability, considering 1D, 2D, and 3D CAV representations (Figure 2). Figure 2

Figure 2: Concept activation vectors (CAVs) of different dimensions.

Experimental Design and Results

The proposed framework is used to conduct experiments for OD and classification models to evaluate concept representation stability via the selection of representation dimensionality and inspect the impact of gradient instability in CNNs on concept attribution. The MS COCO 2017 dataset is used for unsupervised concept mining in object detectors, while the BRODEN and CycleGAN Zebras datasets are used for classification model experiments. The paper selects three object detectors (YOLOv5s, FasterRCNN, and SSD) and three classification models (ResNet50, SqueezeNet1.1, and EfficientNet-B0) with various backbones to evaluate the stability of semantic representations.

Experiments demonstrate that 1D-CAVs achieve the best overall CAV stability due to their consistency and good concept separation. Increasing the number of training concept images has a positive impact on CAV stability, with a recommendation of at least 40 to 60 concept-related samples for training each CAV. The negative impact of gradient instability on concept analysis using TCAV is found to be minimal. SmoothGrad can have a higher impact on concept attribution values in shallow layers, where gradient instability is higher. The use of 1D-CAV representations generally results in lower CAD values than 3D-CAVs.

The impact of the number of concept samples on CAV stability for YOLO5, RCNN, SSD, ResNet, SqueezeNet, and EfficientNet is shown in (Figure 3), (Figure 4), (Figure 5), (Figure 6), (Figure 7), and (Figure 8), respectively. Figure 3

Figure 3: Impact of number of concept samples on CAVs stability for YOLO5

Figure 4

Figure 4: Impact of number of concept samples on CAVs stability for RCNN

Figure 5

Figure 5: Impact of number of concept samples on CAVs stability for SSD

Figure 6

Figure 6: Impact of number of concept samples on CAVs stability for ResNet

Figure 7

Figure 7: Impact of number of concept samples on CAVs stability for SqueezeNet

Figure 8

Figure 8: Impact of number of concept samples on CAVs stability for EfficientNet

Conclusion

This study provides a framework and metrics for evaluating the layer-wise stability of global vector representations in object detection and classification CNN models for explainability purposes. The concept retrieval stability metric jointly evaluates the consistency and separation in the feature space of concept semantic concept representations. Aggregated 1D-CAV representations offer the best performance. A minimum of 60 training samples per concept is necessary to ensure high stability in most cases. The concept attribution stability metric assesses the impact of gradient smoothing techniques on the stability of concept attribution. 1D-CAVs are more resistant to gradient instability, particularly in deep layers, and the paper recommends using gradient smoothing in shallow layers of deep network backbones. The work provides valuable quantitative insights into the robustness of concept representation, which can inform the selection of network layers and concept representations for CA in safety-critical applications.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.