- The paper introduces Score-CAM, a novel gradient-free method that derives activation map importance from target scores to enhance model interpretability.
- It employs a data-driven approach to overcome the limitations of gradient-based CAMs, producing sharper and more discriminative saliency maps.
- Quantitative evaluations on ImageNet show superior performance on Average Drop, Average Increase, and localization metrics; the authors also demonstrate its use as a model-debugging tool.
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
The paper presents a novel post-hoc visual explanation method named Score-CAM, designed to enhance the interpretability of Convolutional Neural Networks (CNNs). It builds upon the Class Activation Mapping (CAM) family of methods, improving on them by eliminating the dependence on gradient information that earlier variants such as Grad-CAM and Grad-CAM++ rely on.
Introduction and Motivation
Score-CAM addresses a pivotal question in machine learning research: understanding the internal mechanisms behind the decisions made by deep neural networks. It targets CNNs specifically, given their widespread use in image and language processing. Within the class activation mapping family, the original CAM requires a global-average-pooling architecture, while its generalizations Grad-CAM and Grad-CAM++ weight activation maps with backpropagated gradients. These gradient-based methods have inherent limitations, including noisy gradient maps and gradient saturation, which can make the resulting weights unfaithful to the features the model actually uses. Score-CAM seeks to overcome these drawbacks with a data-driven approach that derives the weights of activation maps from forward-pass scores on the target class.
Key Contributions
The authors outline several significant contributions:
- Introduction of Score-CAM: A novel gradient-free method that determines the importance of activation maps based on their global contribution to the target class prediction, rather than local sensitivity.
- Quantitative Evaluation: The paper provides comprehensive experimental results demonstrating Score-CAM’s superiority in generating fairer and more interpretable saliency maps on recognition tasks.
- Visualization and Localization: The authors present qualitative comparisons showing improved performance of Score-CAM in terms of visualization and localization capabilities.
- Practical Applications: A demonstration of Score-CAM's utility as a debugging tool for analyzing model misbehavior.
Methodology
In contrast to gradient-based methods, which backpropagate the derivative of the target-class score with respect to the input or to intermediate activations, Score-CAM assesses the importance of each activation map by directly observing the change in the model's prediction when the input image is masked by that (upsampled, normalized) activation map. The resulting weights, which the paper terms the Channel-wise Increase of Confidence (CIC), reflect the contribution of the regions each map highlights to the network's output; the formulation is summarized below.
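In the paper's notation, A_l^k is the k-th activation map of layer l, Up(.) upsamples it to the input resolution, s(.) rescales values to [0, 1], the circle denotes the Hadamard product, and X_b is a baseline input:

```latex
\[
H_l^k = s\big(\mathrm{Up}(A_l^k)\big), \qquad
\alpha_k^c = C(A_l^k) = f^c\big(X \circ H_l^k\big) - f^c(X_b),
\]
\[
L^c_{\text{Score-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c \, A_l^k\Big).
\]
```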
Algorithm and Implementation
The Score-CAM algorithm proceeds through several distinct steps:
- Extraction of activation maps from a selected convolutional layer.
- Upsampling each activation map to the input image size and normalizing it to [0, 1], producing a smooth mask.
- Forward-passing the masked inputs and reading off the target-class score to obtain the CIC, which serves as the weight for each activation map.
- Generating the final visual explanation by combining these weighted activation maps.
This methodology mitigates issues found in previous CAM variants and ensures the resulting saliency maps are both visually clear and class-discriminative; a minimal implementation sketch follows.
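The sketch below is one way to realize the pipeline in PyTorch, under two stated assumptions: the feature maps of the chosen layer have already been captured into `activations` (e.g., via a forward hook), and the channel weights are taken as a softmax over the raw target scores, a common implementation choice standing in for the paper's baseline-relative increase in confidence. Names such as `score_cam` are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def score_cam(model, image, target_class, activations):
    """Minimal Score-CAM sketch.

    image:       input tensor of shape (1, 3, H, W)
    activations: hooked feature maps of shape (1, K, h, w)
    """
    model.eval()
    K = activations.shape[1]
    H, W = image.shape[-2:]

    # Step 2: upsample each activation map to the input size, rescale to [0, 1].
    masks = F.interpolate(activations, size=(H, W),
                          mode="bilinear", align_corners=False)
    flat = masks.reshape(K, -1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    masks = ((flat - lo) / (hi - lo + 1e-8)).reshape(K, 1, H, W)

    # Step 3: score the target class on each masked input; the
    # softmax-normalized scores act as channel weights.
    with torch.no_grad():
        scores = torch.stack([model(image * masks[k:k + 1])[0, target_class]
                              for k in range(K)])
    weights = torch.softmax(scores, dim=0)

    # Step 4: weighted combination of the original maps, then ReLU.
    cam = torch.relu((weights.view(K, 1, 1) * activations[0]).sum(dim=0))
    return F.interpolate(cam[None, None], size=(H, W),
                         mode="bilinear", align_corners=False)[0, 0]
```

Since the per-channel loop dominates Score-CAM's runtime (one forward pass per activation map), batching the K masked images into a single forward pass is the usual optimization.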
Experimental Results
Qualitative Evaluation
The authors provide qualitative evidence showcasing Score-CAM’s ability to produce more focused and less noisy saliency maps compared to other state-of-the-art methods such as Grad-CAM and RISE. The visualizations illustrate Score-CAM’s efficacy in clearly capturing relevant features and objects within the images.
Quantitative Evaluation
The method was rigorously tested on the ImageNet (ILSVRC2012) validation set. Score-CAM outperformed its predecessors on both the Average Drop and Average Increase metrics, indicating that it better identifies the regions the model's prediction actually depends on. It also led on localization: in the energy-based pointing game, over 60% of the saliency-map energy fell within the ground-truth bounding box.
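For reference, the two faithfulness metrics (introduced with Grad-CAM++ and adopted here) are typically computed as follows, where Y_i^c is the model's score for class c on image i and O_i^c the score on the image masked by the explanation:

```latex
\[
\text{Average Drop} = \frac{1}{N}\sum_{i=1}^{N}
  \frac{\max\big(0,\, Y_i^c - O_i^c\big)}{Y_i^c} \times 100,
\qquad
\text{Average Increase} = \frac{1}{N}\sum_{i=1}^{N}
  \mathbb{1}\big[\,Y_i^c < O_i^c\,\big] \times 100.
\]
```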
Sanity Check
To verify that the explanations actually depend on the model, the authors ran the sanity checks of Adebayo et al., randomizing the network parameters and recomputing the saliency maps. Score-CAM passed: its maps degraded under randomization, showing sensitivity to model parameters rather than behaving like a model-independent edge detector.
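A minimal sketch of such a check, assuming a `score_cam`-style saliency function like the one above (the signature and the re-initialization scheme are hypothetical, not taken from the paper):

```python
import copy
import torch

def sanity_check(model, image, target_class, saliency_fn):
    """Compare saliency maps before and after destroying the learned weights.

    saliency_fn(model, image, target_class) -> 2-D saliency map
    (hypothetical signature; any CAM-style explainer fits).
    """
    original = saliency_fn(model, image, target_class)

    randomized_model = copy.deepcopy(model)
    for p in randomized_model.parameters():
        torch.nn.init.normal_(p, mean=0.0, std=0.02)  # discard training
    randomized = saliency_fn(randomized_model, image, target_class)

    # A model-sensitive explainer should produce clearly different maps,
    # so a low similarity here is the desired outcome.
    sim = torch.cosine_similarity(original.flatten(),
                                  randomized.flatten(), dim=0)
    return sim.item()
```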
Implications and Future Work
Score-CAM presents a notable advance in visual explanations for CNNs. By removing the dependence on gradients, it sidesteps their noise and saturation problems and yields more reliable, interpretable saliency maps, with both conceptual and practical benefits. The robust quantitative and qualitative improvements imply that Score-CAM could serve as a meaningful tool for enhancing the transparency of deep learning models.
Potential avenues for future research include exploring Score-CAM’s applicability to other types of neural networks beyond CNNs, further refining the method to reduce computational costs, and integrating Score-CAM into real-world systems for interpretability and model debugging. Additionally, expanding the evaluation to more complex tasks and diverse datasets could provide deeper insights into the method's generalizability and robustness.
In conclusion, Score-CAM emerges as a promising addition to the array of post-hoc visual explanation methods, significantly enhancing model interpretability and offering practical tools for in-depth analysis of neural networks.