Evaluating Explainable AI: Analysis of Algorithmic Explanations in Predicting Model Behavior
The paper by Hase and Bansal addresses a critical aspect of machine learning: interpretability, approached through the concept of simulatability. Simulatability refers to a user's ability to predict a model's behavior on new inputs, here with the aid of explanations produced by various algorithmic methods. The authors work within Explainable AI (XAI), evaluating how different explanation techniques affect users' understanding of model decisions across text and tabular data domains.
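To make the notion concrete, here is a minimal sketch of a simulatability score, read simply as the fraction of held-out inputs on which a user correctly guesses the model's output. The function name and toy data are illustrative assumptions, not taken from the paper, whose protocol is more detailed:

```python
import numpy as np

def simulatability(user_guesses, model_predictions):
    """Fraction of held-out inputs on which the user correctly guesses the model's output.

    Hypothetical helper for illustration; the paper's protocol is more detailed.
    """
    user_guesses = np.asarray(user_guesses)
    model_predictions = np.asarray(model_predictions)
    return float(np.mean(user_guesses == model_predictions))

# A user guesses the model's label on five new inputs.
model_out = [1, 0, 1, 1, 0]   # what the model actually predicts
user_out  = [1, 0, 0, 1, 0]   # what the user expects the model to predict
print(simulatability(user_out, model_out))  # 0.8
```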
The paper investigates five explanation methods: LIME, Anchor, Decision Boundary, a Prototype model, and a Composite approach that combines information from the other four. The authors employ two types of simulation tests, forward and counterfactual, designed to measure how well these explanations help users simulate model behavior. Unlike many existing evaluations of interpretability, their work isolates the effect of explanations by controlling key experimental factors, such as separating test instances from explained instances and benchmarking against performance on unexplained examples.
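Because the evaluation hinges on comparing simulation accuracy with and without explanations, a rough sketch of that effect measure may help. The helper names and toy data below are hypothetical, and the paper's actual statistical analysis is more involved than a simple difference of accuracies:

```python
import numpy as np

def simulation_accuracy(user_guesses, model_outputs):
    """Share of test instances where the user's guess matches the model's output."""
    return float(np.mean(np.asarray(user_guesses) == np.asarray(model_outputs)))

def explanation_effect(pre_guesses, post_guesses, model_outputs):
    """Change in simulation accuracy from the pre phase (no explanations) to the
    post phase (after studying explanations of *other* instances, so the test
    items themselves are never explained). Hypothetical helper; the paper's
    statistical analysis is more involved."""
    return (simulation_accuracy(post_guesses, model_outputs)
            - simulation_accuracy(pre_guesses, model_outputs))

model_outputs = [1, 0, 1, 1, 0, 1]
pre_guesses   = [1, 1, 0, 1, 0, 0]   # 3/6 correct before explanations
post_guesses  = [1, 0, 0, 1, 0, 1]   # 5/6 correct after explanations
print(explanation_effect(pre_guesses, post_guesses, model_outputs))  # ~= +0.33
```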
The results from extensive human user tests, encompassing over 2100 responses, offer a nuanced evaluation of explanation mechanisms. A particular highlight is LIME's effectiveness in improving simulatability on the tabular data tasks, evidenced by statistically significant gains over baseline performance. Interestingly, the Prototype method shows merit in counterfactual simulation tests across both data domains. However, no single method yields a definitive improvement in simulatability on the text tasks, pointing to a broader challenge in XAI: explanation efficacy varies considerably across data types and contexts.
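For readers unfamiliar with what a LIME explanation of a tabular prediction looks like, the following is a minimal sketch using the open-source lime package. The dataset and model here are stand-ins chosen for a self-contained example, not the paper's experimental setup:

```python
# pip install lime scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Stand-in tabular task and model (not the paper's setup).
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain the model's prediction for one instance as weighted feature conditions.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:+.3f}")
```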
Crucially, the paper reveals that users' subjective ratings of explanation quality are not reliable predictors of actual utility in enhancing simulatability. This insight challenges the common reliance on subjective evaluations within the field, suggesting that objective measures such as simulatability should take precedence in assessing explanation methods.
This paper's implications extend beyond its immediate numerical results. The research underscores the need for more sophisticated approaches to interpretability: the limitations of current methods point toward explanations that clearly delineate necessary and sufficient conditions for model predictions. Furthermore, the Composite approach, while intuitively promising, does not yield a marked improvement, leaving open the question of how best to synthesize complementary explanatory information.
Given these findings, the authors advocate refining both how explanations are formulated and the metrics used to evaluate them. As the demand for interpretable models grows in critical sectors such as healthcare and finance, improving AI transparency and user trust will remain paramount. The paper makes an important contribution to understanding how explanations can serve these objectives, while also identifying significant room for progress.
In conclusion, the work by Hase and Bansal provides a valuable framework for rigorously evaluating the effectiveness of explanation methods in AI. Their approach of isolating the effect of explanations offers a path forward in the quest for truly interpretable AI systems, while acknowledging the complexities that remain across diverse applications and data environments. Future research should continue to explore these dimensions so that interpretability becomes both meaningful and actionable.