An Overview of Explainable AI in Natural Language Processing
The paper "A Survey of the State of Explainable AI for Natural Language Processing" by Marina Danilevsky et al. provides a comprehensive review of the current developments in Explainable AI (XAI) within the field of NLP. The emergence of high-performing, albeit black-box, models such as deep neural networks has elevated the importance of XAI due to the necessity for models whose reasoning processes can be understood by end-users.
Explainability Challenges in NLP
Historically, NLP models relied on white-box techniques, including decision trees and rule-based systems, which are inherently interpretable. The advent of deep learning models, while boosting performance, has obscured how predictions are derived, posing challenges for model trust and accountability. The survey collates work from roughly the last seven years on explaining these complex models, categorizing each approach by whether its explanations are local (one explanation per instance) or global (an explanation of the model as a whole), and by whether the model is self-explaining or requires post-hoc processing.
Techniques for Generating Explanations
The paper identifies several prevalent techniques for explanation generation in NLP models:
- Feature Importance: Determines how much each input feature contributes to a prediction, often via attention mechanisms or first-derivative (gradient) saliency. This approach is intuitive in NLP, where the input tokens and phrases are themselves human-readable (a gradient-saliency sketch follows this list).
- Surrogate Models: Utilizes simplified models, such as linear approximations in Local Interpretable Model-agnostic Explanations (LIME), to mimic and thus explain a more complex model's predictions. Despite its utility, fidelity issues can arise if the surrogate does not accurately capture the original model's decision boundaries.
- Example-Driven Explanations: Relates model predictions to similar instances from the training data, akin to nearest neighbor methods, providing grounding for predictions through analogous examples.
- Provenance-Based and Declarative Induction: These methodologies leverage rule induction or traceable reasoning steps, lending interpretability through logical sequences that guide predictions.
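To make the feature-importance category concrete, here is a minimal sketch of first-derivative (gradient) saliency. The tiny PyTorch classifier and vocabulary are invented for illustration and are not models discussed in the survey; the idea is simply to back-propagate the predicted-class score to the token embeddings and use the gradient magnitude as an importance score.

```python
# Minimal sketch of first-derivative (gradient) saliency for token-level
# feature importance. The classifier and vocabulary are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = {"the": 0, "movie": 1, "was": 2, "great": 3, "terrible": 4}


class TinyClassifier(nn.Module):
    """Bag-of-embeddings sentiment classifier used only for illustration."""

    def __init__(self, vocab_size=5, dim=16, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, embedded):
        # Takes pre-computed embeddings so gradients can reach them.
        return self.out(embedded.mean(dim=1))


def gradient_saliency(model, token_ids):
    """Return |d(predicted-class score) / d(embedding)| per token (L2 norm)."""
    embedded = model.embed(token_ids).detach().requires_grad_(True)
    logits = model(embedded)
    predicted = logits.argmax(dim=-1).item()
    # Back-propagate the predicted-class score down to the embeddings.
    logits[0, predicted].backward()
    return embedded.grad.norm(dim=-1).squeeze(0)


model = TinyClassifier()
tokens = ["the", "movie", "was", "great"]
ids = torch.tensor([[VOCAB[t] for t in tokens]])
for token, score in zip(tokens, gradient_saliency(model, ids)):
    print(f"{token:>8s}  saliency={score.item():.4f}")
```

Attention-based importance follows the same presentation pattern, except the per-token scores are read directly from the attention weights rather than derived from gradients.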
Notably, the survey highlights that feature importance and surrogate models are the most utilized techniques due to their adaptability across various NLP models and tasks.
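As a companion to the gradient-saliency sketch above, the following is a minimal LIME-style surrogate. The `black_box_predict` function is a toy stand-in for any real NLP classifier, and the sampling and kernel choices are simplified relative to the actual LIME implementation; the point is that the coefficients of a locally fitted linear model serve as the explanation.

```python
# Minimal LIME-style surrogate sketch: perturb the input by masking tokens,
# query the black-box model, and fit a weighted linear model whose
# coefficients explain the prediction. black_box_predict is a toy stand-in.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)


def black_box_predict(texts):
    """Toy black box: positive probability rises when 'great' is present."""
    return np.array([0.9 if "great" in t.split() else 0.2 for t in texts])


def lime_explain(text, num_samples=500, kernel_width=0.75):
    tokens = text.split()
    # Binary masks: 1 keeps a token, 0 removes it.
    masks = rng.integers(0, 2, size=(num_samples, len(tokens)))
    masks[0] = 1  # include the unperturbed instance
    perturbed = [" ".join(t for t, m in zip(tokens, row) if m) for row in masks]
    preds = black_box_predict(perturbed)
    # Weight samples by their closeness to the original input.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=weights)
    return sorted(zip(tokens, surrogate.coef_), key=lambda x: -abs(x[1]))


for token, weight in lime_explain("the movie was great"):
    print(f"{token:>8s}  weight={weight:+.3f}")
```

The fidelity concern noted above shows up here directly: the linear surrogate is only trustworthy in the neighborhood of the perturbed samples it was fitted on.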
Visualization of Explanations
Visualizing explanations is crucial for user comprehension. Common methods include saliency maps for heatmap-style visualization of feature importance, direct presentation of induced rule-based explanations, and natural language generation that describes the model's reasoning in human-understandable terms. These presentations enhance user trust by making the prediction pathway visible.
Evaluation of Explanation Quality
The paper observes a lack of standardized metrics for evaluating the quality and effectiveness of explanations. Three primary evaluation categories are identified:
- Informal Examination: Visual or qualitative assessment of explanations against human intuition.
- Comparison to Ground Truth: Compares explanations to a predefined ground truth, such as human-annotated rationales (see the agreement sketch after this list), though this assumes a single correct explanation exists, which may not always be the case.
- Human Evaluation: Engages human judges to rate the understandability and relevance of the explanations, providing subjective insights into their utility.
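For the comparison-to-ground-truth category, one common recipe is token-level agreement between the tokens an explanation highlights and a human-annotated rationale. The sketch below shows such a scorer; the highlighted tokens and rationale are hypothetical examples, not data from the survey.

```python
# Minimal "comparison to ground truth" sketch: token-level precision,
# recall, and F1 between explanation tokens and a human rationale.
def rationale_agreement(explained_tokens, gold_tokens):
    explained, gold = set(explained_tokens), set(gold_tokens)
    true_positives = len(explained & gold)
    precision = true_positives / len(explained) if explained else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# The explanation highlighted {"great", "movie"}; the annotator marked {"great"}.
p, r, f1 = rationale_agreement({"great", "movie"}, {"great"})
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.50 recall=1.00 f1=0.67
```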
Furthermore, the survey points out that most works focus on local explanations, with global explanations being less prevalent, possibly because users of black-box models most often need the reasoning behind individual decisions rather than a description of the model's overall logic.
Implications and Future Directions
The survey underscores gaps and future directions in XAI for NLP, emphasizing the need for clear benchmarks and evaluation protocols, improved fidelity in surrogate explanations, and more comprehensive global explanation frameworks. It also notes that explainability may come at some cost to model accuracy, a trade-off that requires careful consideration depending on the application context.
In conclusion, the paper serves as a valuable resource for NLP researchers by organizing and evaluating existing XAI methods and identifying critical directions for advancing the interpretability of complex models. It provides a foundation for further investigation into model transparency, fostering advances that align model development with end users' needs for trust and accountability.