Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models
The paper "Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models," authored by Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller, addresses the critical challenge of interpretability in deep learning models. As AI systems increasingly achieve superhuman performance across various complex tasks, the opacity of their decision-making processes becomes a significant drawback, particularly in domains where transparency is essential for validation and trust.
Introduction
The paper highlights the tremendous strides made in artificial intelligence and machine learning, driven by advances in support vector machines and deep learning methodology. Together with the availability of large datasets and powerful computational resources, these advances have enabled AI systems to excel at image recognition, natural language processing, and strategic game playing. Yet, despite their impressive performance, these models operate as black boxes, providing no insight into the reasoning behind their predictions. This lack of transparency is especially problematic in applications requiring high-stakes decisions, such as medical diagnostics and autonomous driving.
Necessity of Explainable AI
The authors elucidate four primary reasons for the urgent need for explainable AI:
- Verification of the System: In critical domains like healthcare, it is imperative to validate predictions made by AI models. The paper cites examples where AI systems have made erroneous conclusions due to biased training data, underscoring the need for models interpretable by human experts.
- Improvement of the System: Understanding a model's weaknesses is the first step toward enhancing its performance. Interpretability helps detect biases in the training data and compare different models or architectures; the paper notes that models with similar performance metrics can nonetheless base their decisions on very different features.
- Learning from the System: AI systems trained on vast datasets can pick up patterns that humans cannot easily discern. Explainable AI facilitates knowledge transfer from the model to the human, enriching fields such as the natural sciences, where understanding the underlying phenomena is more valuable than prediction accuracy alone.
- Compliance with Legislation: Increasing regulation around AI, including the European Union's "right to explanation," mandates that AI decisions impacting individuals be understandable.
Methods for Explaining Predictions
The paper discusses two techniques for explaining individual AI predictions: Sensitivity Analysis (SA) and Layer-wise Relevance Propagation (LRP).
Sensitivity Analysis (SA)
SA measures the importance of input features by evaluating the model's local gradient. The relevance of an input variable is quantified as the norm of the partial derivative of the model output with respect to that variable, i.e., R_i = ||∂f/∂x_i||. Although straightforward to compute, the paper notes that SA often produces noisy and sometimes misleading relevance maps: it indicates which changes the output is most sensitive to, rather than which features actually contributed to the prediction at hand.
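The sketch below shows how such a sensitivity map could be computed with automatic differentiation. It is an illustration under stated assumptions, not the paper's code: the PyTorch model `model`, the input tensor `x` of shape (1, C, H, W), and the class index are placeholders.

```python
# Minimal sketch of Sensitivity Analysis (SA) for an image classifier.
import torch

def sensitivity_map(model, x, target_class):
    """Relevance R_i = ||d f_c(x) / d x_i|| for every input pixel."""
    model.eval()
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]   # scalar score f_c(x) of the explained class
    score.backward()                    # local gradient of the score w.r.t. the input
    # The norm over the colour channels gives one relevance value per pixel.
    return x.grad.detach().norm(dim=1).squeeze(0)

# Example usage (hypothetical model and input):
# heatmap = sensitivity_map(model, x, target_class=243)
```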
Layer-wise Relevance Propagation (LRP)
LRP takes a more refined approach: it redistributes the prediction score f(x) backward through the network, layer by layer, until relevance scores are assigned to the individual input features. The redistribution obeys a relevance conservation principle, so the total relevance entering a layer equals the total relevance leaving it, and the input relevances sum (approximately) to the prediction score. As a result, LRP attributes the prediction directly to the input features and, in the paper's experiments, yields relevance scores that match human intuition more closely than SA.
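A minimal sketch of one common LRP redistribution rule (the epsilon rule) for a small fully connected ReLU network is given below. The network representation, the epsilon value, and the function names are assumptions chosen for illustration, not the paper's implementation.

```python
# Minimal sketch of Layer-wise Relevance Propagation (epsilon rule) for a
# small fully connected network stored as a list of (W, b) pairs.
import numpy as np

def lrp_epsilon(layers, x, target, eps=1e-6):
    # Forward pass: ReLU on hidden layers, linear output layer; keep activations.
    activations = [x]
    for i, (W, b) in enumerate(layers):
        z = W @ x + b
        x = z if i == len(layers) - 1 else np.maximum(0.0, z)
        activations.append(x)

    # Relevance at the output: only the score of the explained class.
    R = np.zeros_like(activations[-1])
    R[target] = activations[-1][target]

    # Backward pass: redistribute relevance layer by layer.
    for (W, b), a in zip(reversed(layers), reversed(activations[:-1])):
        z = W @ a + b                      # pre-activations of the upper layer
        s = R / (z + eps * np.sign(z))     # stabilised ratio (epsilon rule)
        R = a * (W.T @ s)                  # conservation: sum(R) stays close to f(x)
    return R                               # one relevance value per input feature

# Example usage with a random 2-layer network (purely illustrative):
# rng = np.random.default_rng(0)
# layers = [(rng.normal(size=(8, 4)), np.zeros(8)), (rng.normal(size=(3, 8)), np.zeros(3))]
# relevance = lrp_epsilon(layers, rng.normal(size=4), target=0)
```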
Evaluation of Explanation Quality
The authors propose perturbation analysis as an objective measure of explanation quality. Input features are perturbed in order of decreasing relevance, and the resulting decline of the prediction score is tracked: the steeper the decline, the better the explanation has identified the truly relevant features. In these evaluations, LRP consistently outperforms SA.
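A hypothetical sketch of this procedure follows; `model`, the baseline value used for perturbation, and the number of steps are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch of perturbation analysis: features are perturbed in order of
# decreasing relevance and the drop of the prediction score is recorded.
# `model` is any callable returning an array of class scores.
import numpy as np

def perturbation_curve(model, x, relevance, target, steps=50, baseline=0.0):
    order = np.argsort(relevance.ravel())[::-1]   # most relevant features first
    x = np.array(x, dtype=float)                  # work on a copy
    scores = [model(x)[target]]
    for idx in order[:steps]:
        x.ravel()[idx] = baseline                 # perturb the next feature
        scores.append(model(x)[target])
    return np.array(scores)                       # a steep decline => good explanation

# Averaging the area between the initial score and this curve over many inputs
# gives a single quality number; a larger area indicates a better explanation.
```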
Experimental Results
The paper presents a comprehensive evaluation of SA and LRP across three tasks: image classification, text document classification, and human action recognition in videos.
- Image Classification: LRP heatmaps align with human intuition, pinpointing salient features in images, whereas SA heatmaps are noisier. Perturbation analysis confirms LRP's superior performance.
- Text Document Classification: LRP distinguishes between positive and negative evidence, providing more nuanced explanations than SA. Perturbation analysis again demonstrates LRP's higher quality explanations.
- Human Action Recognition: LRP identifies relevant spatiotemporal features in video frames, highlighting actions critical for classification, which is confirmed through perturbation analysis.
Conclusion
The paper makes a compelling case for the necessity of explainable AI, providing robust methods and evaluation frameworks for enhancing model transparency. As AI systems continue to integrate into critical areas of society, the ability to interpret their decisions becomes paramount for validation, improvement, and compliance with regulatory frameworks. Future work will focus on theoretical underpinnings and applications of explainability, aiming to bridge the gap between AI and human understanding.
The paper underscores the importance of developing AI systems that are not only performant but also interpretable, paving the way for broader and safer adoption of AI technologies.