- The paper introduces a recursive method to decompose neural network activations into interpretable concepts, effectively mitigating neural collapse.
- The paper leverages Sobol indices to quantitatively assess concept importance, offering a robust alternative to noisy directional derivative methods.
- The paper employs implicit differentiation to produce detailed pixel-level attribution maps, bridging global concept explanations with local visual evidence.
The CRAFT (Concept Recursive Activation FacTorization) method, detailed in arXiv:2211.10154, is a framework for generating concept-based explanations of deep neural networks. Traditional attribution methods show "where" a model attends in an image but not "what" it perceives there. CRAFT addresses both questions by extracting human-interpretable concepts from the network's activation space and attributing them back to the input.
Key Components of CRAFT
CRAFT comprises three primary components that contribute to its enhanced explainability:
Recursive Concept Detection
This component detects and decomposes concepts across layers of the neural network. Concepts are first extracted from the top (deepest) layers. If a concept is not clearly interpretable, the method recursively decomposes it into sub-concepts using activations from earlier layers. This recursion mitigates the effect of "neural collapse," in which distinct concepts are amalgamated in deeper layers and become harder to interpret.
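The recursion described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the activations of a set of image crops are already available as non-negative matrices (one per layer, same crop order), uses a plain multiplicative-update NMF, and replaces the human judgment of interpretability with a hypothetical `needs_refinement` callback. All function and variable names are invented for the example.

```python
import numpy as np

def nmf(A, k, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n, p) as U (n, k) @ W (k, p)
    using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    U = rng.random((n, k)) + eps
    W = rng.random((k, p)) + eps
    for _ in range(n_iter):
        U *= (A @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ A) / (U.T @ U @ W + eps)
    return U, W

def decompose_recursively(acts_by_layer, layer, k, needs_refinement, rows=None):
    """acts_by_layer[l] holds crop activations at layer l (index 0 = earliest).
    needs_refinement(node, c) is a hypothetical stand-in for a human deciding
    that concept c is not interpretable enough."""
    A = acts_by_layer[layer]
    if rows is None:
        rows = np.arange(A.shape[0])
    U, W = nmf(A[rows], k)
    node = {"layer": layer, "rows": rows, "U": U, "W": W, "children": {}}
    if layer == 0:
        return node  # no earlier layer left to recurse into
    for c in range(k):
        if needs_refinement(node, c):
            # Keep the crops that express concept c most strongly ...
            top = rows[U[:, c] >= np.median(U[:, c])]
            if len(top) > k:
                # ... and factorize their activations one layer earlier.
                node["children"][c] = decompose_recursively(
                    acts_by_layer, layer - 1, k, needs_refinement, top)
    return node

# Toy usage with synthetic non-negative "activations" for 40 crops.
rng = np.random.default_rng(1)
early = rng.random((40, 3)) @ rng.random((3, 30))
deep = rng.random((40, 3)) @ rng.random((3, 20))
tree = decompose_recursively([early, deep], layer=1, k=3,
                             needs_refinement=lambda node, c: c == 0)
```

In this sketch each row of `W` plays the role of a concept direction and each column of `U` gives the strength of that concept per crop. Selecting crops by the median coefficient is an arbitrary choice made for the example.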
Sobol Indices for Concept Importance
CRAFT leverages Sobol indices, derived from sensitivity analysis, to estimate the importance of individual concepts in relation to a model's prediction. Unlike previous methods such as TCAV, which rely on potentially noisy directional derivatives, Sobol indices offer a quantitative measure of each concept's contribution, including its interactions with other concepts, to the model's output variance. This approach provides a more faithful assessment of concept importance and reduces confirmation bias.
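To make the idea concrete, the following sketch estimates total Sobol indices with the Jansen Monte Carlo estimator. It assumes, for illustration only, that each concept is perturbed by a mask value drawn uniformly from [0, 1] and that the model's logit can be evaluated as a function `f` of those masks. The toy model (`u`, `W`, `v`) and the choice of estimator are assumptions of this example and are not taken from the paper.

```python
import numpy as np

def total_sobol_indices(f, k, n=8192, seed=0):
    """Jansen estimator of the total Sobol index of each of k inputs.
    f maps an (n, k) array of masks in [0, 1] to n scalar outputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, k))
    B = rng.random((n, k))
    yA = f(A)
    var = np.var(np.concatenate([yA, f(B)]))
    st = np.empty(k)
    for i in range(k):
        AB = A.copy()
        AB[:, i] = B[:, i]          # resample only concept i
        st[i] = np.mean((yA - f(AB)) ** 2) / (2.0 * var)
    return st

# Toy setting: concept coefficients u of one image, concept bank W,
# and a linear head v standing in for the rest of the network.
u = np.array([2.0, 1.0, 0.5])
W = np.eye(3)
v = np.array([1.5, 1.0, 0.0])
logit = lambda M: ((u * M) @ W) @ v   # masks M rescale each concept

importance = total_sobol_indices(logit, k=3)
```

Because the total index of a concept includes every interaction it takes part in, the indices of a model with interactions can sum to more than one. For a purely additive model like the toy above they sum to approximately one.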
Implicit Differentiation for Concept Attribution Maps
CRAFT generates concept attribution maps by backpropagating concept scores into the pixel space. It employs the implicit function theorem to enable differentiation through the Non-negative Matrix Factorization (NMF) block utilized for concept discovery. This allows for the localization of pixels associated with a particular concept within a given input image and enables the creation of concept-wise attribution maps using both white-box and black-box attribution methods.
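The black-box route mentioned above can be illustrated without any differentiation. The sketch below is an assumption-laden toy, not the paper's implicit-differentiation procedure: it keeps the concept bank `W` fixed, recovers the concept coefficients of an image by non-negative least squares (solved here with projected gradient descent), and builds an occlusion map for one concept. The `encoder` argument is a hypothetical stand-in for the part of the network that maps an image to its activation vector.

```python
import numpy as np

def nnls_projected_gradient(a, W, n_iter=500):
    """Solve min_{u >= 0} 0.5 * ||a - u @ W||^2 for a fixed concept bank W (k, p)."""
    G = W @ W.T
    b = W @ a
    step = 1.0 / np.linalg.norm(G, 2)   # 1 / Lipschitz constant of the gradient
    u = np.zeros(W.shape[0])
    for _ in range(n_iter):
        u = np.maximum(0.0, u - step * (G @ u - b))
    return u

def concept_occlusion_map(image, encoder, W, concept, patch=2):
    """Score each patch by how much occluding it lowers the concept coefficient."""
    base = nnls_projected_gradient(encoder(image), W)[concept]
    heat = np.zeros(image.shape)
    rows, cols = image.shape
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = 0.0
            score = nnls_projected_gradient(encoder(occluded), W)[concept]
            heat[r:r + patch, c:c + patch] = base - score
    return heat

# Toy usage: the hypothetical encoder sums the left and right image halves,
# so concept 0 corresponds to "content on the left".
encoder = lambda img: np.array([img[:, :2].sum(), img[:, 2:].sum()])
image = np.ones((4, 4))
heat = concept_occlusion_map(image, encoder, np.eye(2), concept=0)
```

A white-box variant would instead need gradients of the concept coefficients with respect to the pixels, which is where differentiating through the NMF block via the implicit function theorem comes in.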
Improved Explainability
CRAFT enhances explainability through the following mechanisms:
- Granularity of Explanations: By recursively identifying and decomposing concepts, CRAFT offers explanations at appropriate levels of granularity, making them more accessible to human understanding.
- Concept Importance: Sobol indices ensure that the identified concepts are pertinent to the model's decision-making process, mitigating confirmation bias.
- Comprehensive Understanding: Concept attribution maps bridge the divide between global concept explanations and local pixel-level explanations, providing a holistic understanding of "what" the model observed and "where" it observed it.
Experimental Evaluation and Results
The authors conducted a series of human and computer vision experiments to validate the efficacy of CRAFT.
Human Experiments (Utility Evaluation)
In the utility evaluation, a human-centered utility benchmark was employed to assess the practical usefulness of CRAFT in real-world scenarios. Participants were trained to predict a model's decisions on unseen images using explanations generated by CRAFT, ACE, and various attribution methods. The benchmark encompassed scenarios such as identifying bias in an AI system (Husky vs Wolf), characterizing visual strategies (Paleobotanical dataset), and understanding complex failure cases (ImageNet "Red fox" vs "Kit fox").
The utility metric quantified the accuracy of users in predicting the model's decision on novel images, normalized by the baseline accuracy of users trained without explanations. Higher utility scores indicated more useful explanations. CRAFT achieved higher utility scores than attribution methods and ACE in the Husky vs Wolf and Leaves (Paleobotanical dataset) scenarios, demonstrating its benefits for human understanding.
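As a small worked example of the metric, the sketch below assumes that "normalized by the baseline accuracy" means a simple ratio of the two accuracies. That reading, the function names, and the participant guesses are assumptions made for illustration and are not data from the study.

```python
def accuracy(predicted, actual):
    """Fraction of the model's decisions that a participant guessed correctly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def utility(acc_with_explanations, acc_baseline):
    """Accuracy with explanations relative to the no-explanation baseline
    (assumed here to be a ratio; values above 1 mean the explanations helped)."""
    return acc_with_explanations / acc_baseline

# Invented example: four unseen images and the model's decision on each.
model_decisions = ["husky", "wolf", "wolf", "husky"]
guesses_with_explanations = ["husky", "wolf", "wolf", "wolf"]   # 3 of 4 correct
guesses_without = ["husky", "husky", "wolf", "wolf"]            # 2 of 4 correct

score = utility(accuracy(guesses_with_explanations, model_decisions),
                accuracy(guesses_without, model_decisions))
```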
Human Experiments (Validation of Recursivity)
Psychophysics experiments were conducted to validate the recursive component of the method and the meaningfulness of the extracted high-level concepts.
In an intruder detection experiment, users were tasked with identifying an "intruder" image crop from a different concept among a series of image crops. The experiment compared the results of intruder detection using a concept and using one of its sub-concepts. In a binary choice experiment, users were presented with an image crop belonging to both a subcluster and a parent cluster and asked which of the two clusters seemed to accommodate the image best.
The results indicated that both concepts and sub-concepts are coherent and that recursivity can improve the understanding of the generated concepts. Participants more frequently chose the sub-concept cluster, suggesting that recursivity helps form more coherent clusters.
Computer Vision Experiments
Fidelity was analyzed with deletion and insertion curves, which evaluate the faithfulness of both the identified concepts and the concept importance estimator. The curves track the change in logit score as concepts ranked important by Sobol indices or by TCAV scores are added or removed. Sobol indices identified important concepts more accurately than TCAV.
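A deletion curve of this kind can be sketched as follows. The example assumes an image is represented by its concept coefficients `u`, that removing a concept means setting its coefficient to zero, and that a hypothetical `head` function maps the reconstructed activation to a logit. The numbers are invented, and both importance rankings are placeholders rather than actual Sobol or TCAV scores.

```python
import numpy as np

def deletion_curve(u, W, head, importance):
    """Zero out concepts from most to least important and record the logit
    after each removal. A faithful ranking makes the logit drop quickly."""
    order = np.argsort(importance)[::-1]
    u = np.asarray(u, dtype=float).copy()
    logits = [head(u @ W)]
    for c in order:
        u[c] = 0.0
        logits.append(head(u @ W))
    return np.array(logits)

# Toy setting: identity concept bank and a head that sums the activation.
W = np.eye(3)
head = lambda activation: activation.sum()
u = np.array([5.0, 1.0, 3.0])

faithful = deletion_curve(u, W, head, importance=[5.0, 1.0, 3.0])
misleading = deletion_curve(u, W, head, importance=[1.0, 5.0, 3.0])
# A lower mean logit along the curve indicates a better importance estimate.
better = faithful.mean() < misleading.mean()
```

An insertion curve is the mirror image: start from all coefficients at zero, add concepts back in order of importance, and look for a fast rise in the logit.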
A sanity check was performed on the method by running the concept extraction pipeline on a ResNet-50v2 model with randomized weights. The concepts extracted were drastically different from those extracted from the trained model, indicating that CRAFT passes the sanity check.