Robust Semantic Interpretability: Revisiting Concept Activation Vectors (2104.02768v1)

Published 6 Apr 2021 in stat.ML, cs.AI, cs.CV, and cs.LG

Abstract: Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations which are both more accurate at the image level and robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g. texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods.

This paper, "Robust Semantic Interpretability: Revisiting Concept Activation Vectors" (Pfau et al., 2021 ), addresses the need for robust and accurate semantic interpretability methods for deep learning models, particularly Convolutional Neural Networks (CNNs) used in image classification. Existing interpretability methods, such as saliency maps, primarily focus on pixel-level attributions and often fail to explain model behavior in terms of human-understandable semantic concepts like texture, color, or object properties. The paper highlights the importance of semantic interpretability for debugging models during development (e.g., identifying reliance on spurious correlations or confounding artifacts) and for ensuring trustworthiness during deployment (e.g., understanding why a medical imaging model makes a specific prediction).

The authors propose Robust Concept Activation Vectors (RCAV) as an improved method building upon the concept activation vectors (CAVs) introduced in Testing with Concept Activation Vectors (TCAV) (Kim et al., 2017). RCAV aims to quantify the effect of semantic concepts on model predictions at both the individual image level and the aggregate dataset level.

Key Concepts and Method (RCAV):

The core idea of RCAV, similar to TCAV, is to represent a semantic concept within the model's latent space. This is done by training a linear classifier (specifically, a logistic regression) on the activations of an intermediate layer $f_l$ of the target model. The training data consists of images representing the concept (concept set $C_i$) and a set of negative control images (the union of the other concepts, $\bigcup_{j\neq i} C_j$). The weight vector of this trained logistic regression serves as the Concept Activation Vector $V_{C,i}$, which linearly approximates the model's encoding of concept $C_i$.
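
A minimal sketch of this step, assuming `f_l` is a callable that returns layer-$l$ activations for a batch of images and `concept_images` / `negative_images` are tensors holding the concept set and negative controls (names are illustrative, not the authors' code):

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

def compute_cav(f_l, concept_images, negative_images):
    """Fit a CAV as the weight vector of a logistic regression on layer-l activations."""
    with torch.no_grad():
        acts_pos = f_l(concept_images).flatten(1).cpu().numpy()   # concept set C_i
        acts_neg = f_l(negative_images).flatten(1).cpu().numpy()  # negative controls
    X = np.concatenate([acts_pos, acts_neg])
    y = np.concatenate([np.ones(len(acts_pos)), np.zeros(len(acts_neg))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_.ravel()   # V_{C,i}: linear direction approximating the concept
```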

RCAV introduces a new method for quantifying concept sensitivity. Instead of using the gradient's linear approximation as TCAV does, RCAV quantifies sensitivity by performing a direct perturbation in the latent space and observing the change in the model's output score. For a given input $x$, concept $C_i$, layer $l$, and target class $k$, the image-level concept sensitivity score $s_{C,i,k}(x)$ is calculated as:

$s_{C,i,k}(x) = f_l^{+,k}(f_l(x)+\alpha V_{C,i}/ \lVert V_{C,i} \rVert) - f_l^{+,k}(f_l(x))$

where $f_l$ is the model up to layer $l$, $f_l^{+,k}$ is the remainder of the model from layer $l$ to the softmax output for class $k$, $\alpha$ is a step-size hyperparameter, and $V_{C,i}/\lVert V_{C,i} \rVert$ is the normalized CAV. This approach accounts for the non-linearity in the later layers of the model ($f_l^{+,k}$), which the authors demonstrate is crucial for accurate image-level sensitivity measurement.
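
A hedged sketch of this computation, assuming `f_l` and `f_l_plus` are callables for the two halves of the network (up to layer $l$, and from layer $l$ to the softmax outputs) and `cav` is a torch tensor with the same number of elements as one sample's layer-$l$ activations; all names are placeholders:

```python
import torch

def concept_sensitivity(f_l, f_l_plus, x, cav, target_class, alpha=1.0):
    """Image-level RCAV score s_{C,i,k}(x) for a single-image batch x."""
    with torch.no_grad():
        h = f_l(x)                                   # latent activations f_l(x)
        v = cav / cav.norm()                         # normalized CAV direction
        v = v.view_as(h[0]).unsqueeze(0)             # reshape to match one sample's activations
        p_base = f_l_plus(h)[:, target_class]        # softmax score, unperturbed
        p_pert = f_l_plus(h + alpha * v)[:, target_class]  # after a step along the CAV
    return (p_pert - p_base).item()
```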

For dataset-level analysis, RCAV aggregates the image-level scores to compute a dataset-wide concept sensitivity score $S_{C,i,k}$:

$S_{C,i,k} = -0.5 + \frac{1}{\lvert X_{val}^k \rvert} \sum \limits_{x \in X_{val}^k}{\mathds{1}(s_{C,i,k}(x)\geq 0)}$

This score indicates whether the concept $C_i$ tends to positively or negatively influence predictions for class $k$ across the validation set $X_{val}^k$, with 0 indicating irrelevance and positive values indicating a positive association.
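
Once the image-level scores for $X_{val}^k$ are collected, the aggregation reduces to a one-liner (a sketch, not the reference implementation):

```python
import numpy as np

def dataset_sensitivity(image_scores):
    """S_{C,i,k}: fraction of validation images with s >= 0, shifted to [-0.5, 0.5]."""
    image_scores = np.asarray(image_scores)
    return -0.5 + np.mean(image_scores >= 0.0)   # 0 indicates an irrelevant concept
```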

Robustness and Hypothesis Testing:

A significant contribution of RCAV is its improved hypothesis testing procedure for distinguishing meaningful concept sensitivities from random noise. The authors observe that even random vectors can yield non-zero sensitivity scores. Unlike TCAV, which used a t-test (and is shown to have a high false positive rate), RCAV proposes a permutation test. This involves generating a null distribution of concept sensitivity scores by permuting the labels of the concept samples used to train the CAVs. The p-value for an observed $S_{C,i,k}$ is then calculated from its rank within this null distribution.

$p = \frac{1}{\lvert N \rvert} \sum \limits_{n \in N} {\mathds{1}(\lvert S_{C,n} \rvert \geq \lvert S_{C,i,k} \rvert)}$

where $N$ is the set of null vectors generated by permutations. A Bonferroni correction is applied for multiple testing across layers or concepts. This permutation testing drastically reduces the false positive rate compared to TCAV.
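
A sketch of the permutation test, assuming hypothetical helpers `fit_cav(activations, labels)` (refits a CAV as above) and `dataset_score(cav)` (returns the corresponding $S_{C,i,k}$):

```python
import numpy as np

def permutation_p_value(observed_S, activations, labels,
                        fit_cav, dataset_score, n_permutations=500, seed=0):
    """Rank |observed_S| against scores from CAVs fit on label-permuted concept sets."""
    rng = np.random.default_rng(seed)
    null_scores = []
    for _ in range(n_permutations):
        permuted = rng.permutation(labels)            # destroy the concept labeling
        null_cav = fit_cav(activations, permuted)     # refit a "null" CAV
        null_scores.append(dataset_score(null_cav))   # its dataset-level score S
    null_scores = np.abs(np.asarray(null_scores))
    return float(np.mean(null_scores >= abs(observed_S)))
```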

Benchmarking Datasets and Metrics:

To provide a realistic evaluation framework for semantic interpretability methods, the paper introduces two novel benchmark datasets and relevant metrics:

  1. Textured Fashion MNIST (TFMNIST): Built on Fashion MNIST, this dataset replaces clothing item surfaces with textures (spiral, zigzag, dot, stripe). A training set biases the model to associate certain textures with specific classes (e.g., spiral with T-shirts). By interpolating between textures in the validation set images, a ground truth for concept sensitivity is established.
  2. Biased CAMELYON16: Based on a medical imaging dataset, this benchmark simulates confounding artifacts by augmenting image contrast levels (e.g., increased contrast for cancerous tissue). Ground-truth sensitivity is measured as the change in model prediction before and after a small, nearly imperceptible contrast alteration (see the sketch after this list).
  3. ImageNet: Used primarily to quantify false positive rates by testing concept sensitivity on intuitively unrelated concept/class pairs (e.g., texture for "great white shark", color for "apron").
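
For the Biased CAMELYON16 ground truth mentioned in item 2, a minimal sketch of a counterfactual contrast perturbation (the exact augmentation parameters here are assumptions, not the paper's):

```python
import torch
import torchvision.transforms.functional as TF

def contrast_ground_truth(model, x, target_class, contrast_factor=1.05):
    """Change in class-k probability after a small contrast adjustment (single-image batch x)."""
    with torch.no_grad():
        p_orig = torch.softmax(model(x), dim=1)[:, target_class]
        x_aug = TF.adjust_contrast(x, contrast_factor)   # nearly imperceptible change
        p_aug = torch.softmax(model(x_aug), dim=1)[:, target_class]
    return (p_aug - p_orig).item()
```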

Evaluation metrics align with the practical use cases:

  • Image-level: Correlation ($P_\tau$, Kendall's $\tau$) between RCAV's predicted sensitivity $s_{C,i,k}$ and the ground-truth sensitivity $\widetilde{s_{C,i,k}}$ measured by counterfactual augmentation, plus AUROC and AUPRC when sensitivity is binarized at a threshold (a small computation sketch follows this list).
  • Dataset-level: False positive rate on negative control classes, and consistency of sign across layers.
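
A small sketch of the image-level metrics using SciPy and scikit-learn; `predicted` holds RCAV sensitivities, `ground_truth` the counterfactual measurements, and the binarization threshold is an assumption:

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.metrics import roc_auc_score, average_precision_score

def image_level_metrics(predicted, ground_truth, threshold=0.0):
    """Kendall's tau plus AUROC/AUPRC after binarizing the ground-truth sensitivity."""
    tau, p_value = kendalltau(predicted, ground_truth)
    binary = (np.asarray(ground_truth) > threshold).astype(int)
    return {"kendall_tau": tau, "p_value": p_value,
            "auroc": roc_auc_score(binary, predicted),
            "auprc": average_precision_score(binary, predicted)}
```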

Results and Implementation Considerations:

The experiments show that RCAV significantly outperforms TCAV on the proposed benchmarks:

  • Image-level: RCAV's $s_{C,i,k}$ shows a significant positive correlation with ground-truth sensitivity on both TFMNIST ($\tau=0.31$) and Biased CAMELYON16 ($\tau=0.85$), while TCAV's correlation is negligible or negative. RCAV also achieves much higher AUROC and AUPRC scores.
  • Dataset-level & Robustness: RCAV achieves a 0% false positive rate on negative controls using the permutation test, compared to TCAV's 100% rate using the t-test (\autoref{table:fpr}). An ablation paper confirms that both the non-linear sensitivity score calculation and the permutation test contribute to this improvement.
  • Runtime: The permutation test, while more robust, can be computationally intensive. The authors note an early-stopping optimization that achieves up to a 250x speed-up in best-case scenarios (one possible early-exit criterion is sketched after this list).
  • Hyperparameters: The paper analyzes the sensitivity of RCAV performance to layer choice and step size $\alpha$. Performance is generally robust to step size within a reasonable range. Layer choice can influence performance, particularly when concepts are not linearly encoded across all layers, which is identified as a limitation.
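
The early-stopping optimization mentioned in the Runtime item is not spelled out here; one plausible criterion, offered as an assumption rather than the authors' exact scheme, is to abort once the running exceedance count already rules out significance. `draw_null_score` is a hypothetical callable that runs one permutation and returns its null score:

```python
def permutation_p_value_early_stop(observed_S, draw_null_score,
                                   n_permutations=500, alpha=0.05):
    """Permutation p-value with an early exit once significance is unreachable."""
    exceed = 0
    for _ in range(n_permutations):
        if abs(draw_null_score()) >= abs(observed_S):
            exceed += 1
        # exceed / n_permutations is a lower bound on the final p-value; once it
        # exceeds alpha, the test can never come out significant, so stop early.
        if exceed / n_permutations > alpha:
            return exceed / n_permutations, False  # lower bound on p, not significant
    p = exceed / n_permutations
    return p, p <= alpha
```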

Implementation Details:

  • RCAV works with existing CNN architectures without modification.
  • Concept sets are typically 100-300 images per concept, drawn from the validation set or an auxiliary dataset.
  • Training the logistic regression for CAVs requires extracting intermediate layer activations.
  • Calculating image-level sensitivity involves forward passes through the latter part of the model ($f_l^{+,k}$) with and without the CAV perturbation. This requires access to the model's architecture and weights.
  • The permutation test requires repeating the CAV training and sensitivity calculation process multiple times (e.g., 500 permutations) for each concept/class/layer combination being tested.
  • The code for RCAV and the benchmark datasets is provided on GitHub (https://github.com/keiserlab/rcav).

Limitations:

  • RCAV relies on the CAV's linear approximation of the concept encoding in the intermediate layer. If a concept is encoded non-linearly even within that layer's activation space, CAV-based methods may struggle, as suggested by the SVD analysis of the TFMNIST dataset reported in the paper's appendix.
  • The choice of layer can affect results, though the sign of the dataset-level sensitivity is often consistent across layers. Testing multiple layers is the recommended practical approach.
  • Defining meaningful concept sets requires user expertise and care, especially in maintaining class balance within concept samples.

Broader Impact:

The authors emphasize RCAV's potential for increasing trust and transparency in AI systems. It can be used to debug models (e.g., identify unwanted biases like reliance on protected attributes or confounding artifacts), ensure fairness, and potentially aid scientific discovery by quantifying the importance of domain-specific concepts for model predictions (e.g., molecular properties). The introduced benchmarks and metrics are intended to promote reproducible and relevant evaluation of semantic interpretability methods.

Authors (5)
  1. Jacob Pfau
  2. Albert T. Young
  3. Jerome Wei
  4. Maria L. Wei
  5. Michael J. Keiser