Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives (1802.07623v2)

Published 21 Feb 2018 in cs.AI, cs.CV, and cs.LG

Abstract: In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily absent (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which to the best of our knowledge, has not been explicitly identified by current explanation methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.

Citations (563)

Summary

  • The paper introduces the Contrastive Explanations Method (CEM), which clarifies model decisions by identifying both minimally sufficient features and those whose absence preserves the classification.
  • It employs optimization techniques to extract pertinent positives from inputs and identifies pertinent negatives that, if present, would alter predictions.
  • Results on MNIST, fraud detection, and fMRI data demonstrate CEM’s effectiveness in providing concise, actionable, and human-interpretable insights.

Contrastive Explanations with Pertinent Negatives

The paper presents a method for generating contrastive explanations for classifications made by black-box models, such as deep neural networks. This approach, termed the Contrastive Explanations Method (CEM), enhances the interpretability of model predictions by identifying not only the features that are minimally sufficient to justify a classification (pertinent positives) but also those that must remain absent for that classification to hold (pertinent negatives).

Methodology Overview

CEM tackles the task of uncovering features that influence model predictions through a careful analysis of inputs. The process involves:

  1. Pertinent Positives: Identifying the minimal and sufficient set of features present in an input that supports its classification. This is achieved by solving an optimization problem that distills the characteristics of the input essential to the classifier's decision (the general form of both objectives is sketched after this list).
  2. Pertinent Negatives: Recognizing the minimal set of absent features crucial for preserving the current class assignment. These are the features that, if added, would alter the classification to a different class, thereby offering a contrastive view of the decision boundary.
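
In broad strokes, both searches can be cast as elastic-net-regularized optimization problems over a perturbation δ. The following is a sketch of the pertinent-negative objective with notation simplified from the paper ($x_0$ is the input, $t_0$ its predicted class, $\mathrm{Pred}(\cdot)$ the classifier's class scores, $\mathrm{AE}$ an optional autoencoder, and $c, \beta, \gamma, \kappa$ trade-off and margin hyperparameters), intended as an illustration rather than a verbatim reproduction:

$$
\min_{\delta}\; c \cdot \max\!\Big\{ [\mathrm{Pred}(x_0 + \delta)]_{t_0} - \max_{i \neq t_0} [\mathrm{Pred}(x_0 + \delta)]_i,\; -\kappa \Big\} \;+\; \beta \|\delta\|_1 \;+\; \|\delta\|_2^2 \;+\; \gamma \|x_0 + \delta - \mathrm{AE}(x_0 + \delta)\|_2^2
$$

Here δ is restricted to features that are absent from $x_0$, and the autoencoder term (when an autoencoder is available) keeps the perturbed input close to the data manifold. The pertinent-positive problem has the same structure, except that δ is restricted to features present in $x_0$, the classifier is evaluated on δ alone, and the hinge term instead rewards δ being classified as $t_0$.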

The approach does not require features to be binarized; it can operate on continuous data as long as a meaningful notion of feature "absence" can be defined, for example a zero or background value reached after a suitable transformation of the input space that separates signal from noise.
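
As a concrete illustration of the pertinent-negative search only (not the authors' implementation), the sketch below runs a plain gradient loop with soft-thresholding for the L1 term. The names predict_scores, grad_scores, and all hyperparameter values are illustrative placeholders; the constraint restricting where δ may be added and the autoencoder term are omitted for brevity.

```python
# Minimal sketch of a pertinent-negative search (illustrative only).
# We look for a sparse additive perturbation `delta` such that
# x0 + delta is assigned a class other than t0.
import numpy as np

def find_pertinent_negative(predict_scores, grad_scores, x0, t0,
                            c=1.0, beta=0.1, kappa=0.1,
                            lr=0.05, steps=500):
    delta = np.zeros_like(x0, dtype=float)
    best = None
    for _ in range(steps):
        x = x0 + delta
        scores = predict_scores(x)
        others = [i for i in range(len(scores)) if i != t0]
        j = max(others, key=lambda i: scores[i])   # strongest competing class
        # Hinge loss is active while t0 still beats every other class by > -kappa.
        if scores[t0] - scores[j] > -kappa:
            grad = c * (grad_scores(x, t0) - grad_scores(x, j))
        else:
            grad = np.zeros_like(delta)
        grad = grad + 2.0 * delta                  # gradient of the L2 penalty
        delta = delta - lr * grad
        # Soft-thresholding implements the L1 (sparsity) part of the elastic net.
        delta = np.sign(delta) * np.maximum(np.abs(delta) - lr * beta, 0.0)
        # Keep the sparsest perturbation found so far that changes the class.
        if np.argmax(predict_scores(x0 + delta)) != t0:
            if best is None or np.abs(delta).sum() < np.abs(best).sum():
                best = delta.copy()
    return best if best is not None else delta

# Toy usage with a 3-class linear scorer (weights chosen only for illustration).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
predict = lambda x: W @ x
grad_of_class = lambda x, k: W[k]
x0 = np.array([2.0, 0.5])
t0 = int(np.argmax(predict(x0)))
delta = find_pertinent_negative(predict, grad_of_class, x0, t0, c=5.0)
print("pertinent negative:", delta,
      "new class:", int(np.argmax(predict(x0 + delta))))
```

In practice the weight c on the hinge term is typically searched over so that the smallest perturbation achieving the class change is retained.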

Implications and Applications

The utility of CEM is demonstrated through its application to multiple domains:

  • Handwritten Digits (MNIST): The method highlights the minimal pixel regions that suffice to preserve a digit's classification (pertinent positives) and the absent strokes that, if added, would change the predicted digit (pertinent negatives).
  • Procurement Fraud Detection: In an enterprise dataset of invoices, CEM pinpoints features indicative of various risk levels, aiding human experts in assessing the validity and risk profile of each transaction.
  • Brain Functional Imaging (ABIDE I): Applied to resting-state fMRI data, CEM reveals atypical connectivity patterns in autistic brains, offering insights consistent with clinical neuroscience findings.

Contributions and Comparative Analysis

The paper contributes to the interpretability of machine learning models by addressing the lack of human-friendly contrastive explanations in existing methodologies like LIME and LRP. CEM emphasizes minimally sufficient and necessary conditions, providing explanations that align more closely with human reasoning.

The numerical results support the effectiveness of CEM: the identified pertinent positives preserve, and the pertinent negatives shift, the classifier's predictions in 100% of the evaluated cases. This validation underscores the capacity of CEM to produce sparse, actionable explanations across diverse datasets.

Future Directions

The theoretical and practical implications of CEM pave the way for further investigations in AI interpretability. Future work could explore extending the methodology to more complex models and datasets, enhancing autoencoder integration for diverse data types, and evaluating the robustness of explanations in adversarial settings.

In summary, CEM offers a structured approach to deriving contrastive explanations by identifying both the presence and absence of critical features, thereby enriching the domain of interpretable machine learning with a method that resonates well with human interpretative processes. This work not only aids in understanding model decisions but also equips domain experts with concise, relevant explanations, improving trust and accountability in AI systems.