- The paper categorizes explanation methods into four problem types: black box model explanation (a global mimic), black box outcome explanation (local), black box inspection, and transparent box design.
- It reviews a variety of techniques such as decision trees, feature importance, saliency masks, sensitivity analysis, and prototype selection to interpret opaque models.
- The survey emphasizes the need for standardized metrics and further research to improve the transparency and accountability of machine learning systems.
A Survey Of Methods For Explaining Black Box Models
The paper "A Survey Of Methods For Explaining Black Box Models" by Riccardo Guidotti et al. addresses the critical issue of interpretable machine learning, summarizing myriad approaches for explaining opaque decision-support systems. This essay presents an expert overview of the key findings, methodologies, and implications emerging from the survey.
Overview and Motivation
The core problem tackled is the lack of transparency in black box models—complex algorithms that, while often highly accurate, do not naturally offer insight into their internal decision-making processes. This lack of transparency presents significant ethical and practical challenges when these models are deployed in sensitive domains such as finance, healthcare, and criminal justice. The General Data Protection Regulation (GDPR) further underscores the necessity of interpretability by stipulating, to some extent, a “right to explanation.” In response, this paper categorizes efforts in the field to demystify these models, providing researchers with a classification schema to aid in selecting appropriate techniques for their explanatory needs.
Problem Classification
The classification schema proposed in the paper breaks the problem of explaining black box models into four formulations (sketched as interfaces after this list):
- Black Box Model Explanation: Generates a global interpretable model that mimics the black box's behavior.
- Black Box Outcome Explanation: Provides local interpretability by explaining individual decisions made by the black box.
- Black Box Inspection: Focuses on visually understanding the inner workings or outputs of the black box.
- Transparent Box Design: Involves designing models that are interpretable by construction, ideally without sacrificing accuracy.
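As a rough schematic (not drawn from the paper itself), the four problem formulations can be read as four different function signatures; the names and types below are purely illustrative.

```python
from typing import Any, Callable, Sequence

# Illustrative signatures only; names and types are not from the survey.
BlackBox = Callable[[Sequence[float]], Any]  # opaque predictor b(x)

def explain_model(b: BlackBox, X: list) -> "InterpretableModel":
    """Model explanation: return a global, interpretable mimic of b."""
    ...

def explain_outcome(b: BlackBox, x: Sequence[float]) -> "Explanation":
    """Outcome explanation: explain b's decision for a single instance x."""
    ...

def inspect_model(b: BlackBox, X: list) -> "VisualRepresentation":
    """Model inspection: expose properties of b (e.g., sensitivity plots)."""
    ...

def train_transparent(X: list, y: list) -> "InterpretableModel":
    """Transparent design: learn an interpretable model directly from data."""
    ...
```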
Explanatory Techniques
The methodologies reviewed are organized based on the type of explanation they generate:
Decision Trees and Rule Extraction
Numerous methods approximate black boxes with decision trees or rule sets because of their inherent interpretability. For instance, Craven's Trepan algorithm generates decision trees that explain neural networks, while Deng's inTrees framework distills tree ensembles into comprehensible rules. These methods predominantly target tabular data and are adept at approximating complex models with transparent surrogates.
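The snippet below is a minimal global-surrogate sketch in scikit-learn, not Trepan or inTrees themselves: a random forest stands in for the black box, and a shallow decision tree is fitted to the forest's predictions, so what gets measured is fidelity to the black box rather than accuracy on the ground truth.

```python
# Minimal global-surrogate sketch (not Trepan): fit an interpretable decision
# tree on the *predictions* of an opaque model so the tree mimics its behavior.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Black box": any accurate but opaque model works; a forest stands in here.
black_box = RandomForestClassifier(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# Global mimic: the tree is trained on the black box's labels, not the truth.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on unseen data.
fidelity = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
print(f"fidelity to black box: {fidelity:.2f}")
print(export_text(surrogate, max_depth=3))
```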
Feature Importance and Saliency Masks
For models such as support vector machines (SVMs) and deep learning architectures, techniques like feature importance and saliency masks provide insight into the contributions of individual features. LIME (Local Interpretable Model-agnostic Explanations) is a notable model-agnostic approach that perturbs the input data to understand a model's local behavior, regardless of the model type. For deep neural networks, saliency masks highlight the regions of the input (such as image pixels) that most influence a prediction, improving the interpretability of complex visual and textual data.
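The following is a hand-rolled LIME-style sketch rather than the official lime package; the Gaussian sampling scheme, kernel width, and Ridge surrogate are illustrative simplifications of what the actual library does.

```python
# Hand-rolled LIME-style sketch (not the lime package): perturb one instance,
# weight the perturbations by proximity, and fit a local linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

def lime_style_explanation(predict_proba, x, n_samples=5000,
                           noise_scale=0.5, kernel_width=0.75, seed=0):
    """Return per-feature weights of a local linear model around instance x."""
    rng = np.random.default_rng(seed)
    # Sample perturbations around x (Gaussian noise; LIME's real scheme differs).
    Z = x + rng.normal(scale=noise_scale, size=(n_samples, x.shape[0]))
    # Query the black box on the perturbed points (probability of class 1).
    yz = predict_proba(Z)[:, 1]
    # Proximity weights: closer perturbations count more.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Interpretable local surrogate: linear model fit with proximity weights.
    local_model = Ridge(alpha=1.0).fit(Z, yz, sample_weight=weights)
    return local_model.coef_  # sign and magnitude = local feature influence

# Example usage with the black_box and X_test from the surrogate sketch above:
# coefs = lime_style_explanation(black_box.predict_proba, X_test[0])
```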
Sensitivity Analysis and Partial Dependence Plots
Sensitivity analysis and partial dependence plots (PDPs) serve as inspection tools that measure the impact of input changes on predictions. These tools reveal how variations in individual input features shift the model's predictions, offering a view of its behavior across the feature space. The Orthogonal Projection of Input Attributes (OPIA) approach iteratively manipulates inputs to assess the dependence of predictions on different features.
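A one-dimensional partial dependence curve can be computed by hand as sketched below (scikit-learn's sklearn.inspection module offers an equivalent built-in); the grid size and the use of predicted probabilities are arbitrary choices for illustration.

```python
# Minimal partial dependence sketch: vary one feature over a grid while the
# others keep their observed values, and average the model's predictions.
import numpy as np

def partial_dependence_1d(predict, X, feature, grid_points=20):
    """Average prediction as a function of a single feature's value."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_points)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # force the feature to this value
        averages.append(predict(X_mod).mean())
    return grid, np.array(averages)

# Example usage with the black_box and X_test from the earlier sketch:
# grid, pd_curve = partial_dependence_1d(
#     lambda A: black_box.predict_proba(A)[:, 1], X_test, feature=0)
```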
Prototypes and Neuron Activation
Prototype selection methods, such as the Bayesian Case Model (BCM), provide interpretable instances representing broader data patterns. For deep learning models, visualizing neuron activations helps understand the hierarchical feature learning process within networks, revealing latent patterns and decision criteria.
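The sketch below is not the Bayesian Case Model but a much simpler clustering-based stand-in: it picks, for each cluster, the real instance closest to the centroid and reports those instances as prototypes.

```python
# Simple prototype-selection sketch (not the Bayesian Case Model): cluster the
# data and report, per cluster, the actual instance nearest to its centroid.
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def select_prototypes(X, n_prototypes=5, seed=0):
    """Return indices of instances that act as prototypes for the data set."""
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=seed).fit(X)
    # For each centroid, pick the nearest real data point as its prototype.
    proto_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return proto_idx

# Example usage: prototypes = X_train[select_prototypes(X_train)]
```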
Implications and Future Work
The survey highlights a need for standardizing what constitutes an explanation and for developing metrics to quantify interpretability and comprehensibility. A clear, formal definition of explanations, akin to principles found in formal logic or software verification, would significantly bolster the usability of interpretable models. Furthermore, the survey reveals gaps in addressing latent feature utilization and in developing transparent recommender systems, suggesting fertile ground for future research.
The implications of this work are multi-faceted, impacting not only the scientific community but also regulatory bodies and industries adopting machine learning solutions. By advancing the transparency of algorithmic decisions, researchers and practitioners can engender greater trust and accountability in AI systems.
In summary, Guidotti et al.’s survey meticulously categorizes and evaluates the current landscape of black box explanation methods, offering a foundational reference for ongoing research and development in interpretable machine learning.