The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
The paper "The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective" addresses a critical issue in the domain of explainable AI, specifically the reliability and consistency of post hoc explanation methods. As machine learning models become increasingly integral to decision-making in high-stakes environments such as healthcare, finance, and law, the need for interpretable and transparent AI models becomes paramount. However, the paper identifies a significant problem: different explanation methods often provide conflicting insights about the same model prediction, which can lead practitioners to make conflicting decisions based on potentially misleading information.
Methodological Contributions
The authors undertake a rigorous analysis to formalize and evaluate the disagreement problem in explainable machine learning. They begin by interviewing 25 data scientists about what, in their view, constitutes disagreement between model explanations, and use this qualitative input to establish a framework for quantifying disagreement. Key concerns identified by practitioners include discrepancies in the top-k important features returned by different methods, variations in feature ranking and in the sign (direction) of attributions, and inconsistencies in the relative importance assigned to features of interest.
To quantify these concerns, the authors propose six metrics: Feature Agreement, Rank Agreement, Sign Agreement, Signed Rank Agreement, Rank Correlation, and Pairwise Rank Agreement. Together these metrics cover the facets of disagreement practitioners care about: overlap in the top-k features, consistency of feature ranking and sign, and agreement on the relative importance of selected features.
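To make these definitions concrete, the following is a minimal sketch (not the authors' reference implementation) of how the four top-k agreement metrics could be computed, assuming each explanation is a NumPy vector of per-feature attributions and that features are ranked by absolute attribution; the attribution vectors in the usage example are purely illustrative.

```python
import numpy as np

def top_k(attr, k):
    """Indices of the k features with the largest absolute attribution."""
    return set(np.argsort(-np.abs(attr))[:k])

def feature_agreement(a, b, k):
    """Fraction of top-k features shared by both explanations."""
    return len(top_k(a, k) & top_k(b, k)) / k

def rank_agreement(a, b, k):
    """Fraction of top-k positions occupied by the same feature in both explanations."""
    ra = np.argsort(-np.abs(a))[:k]
    rb = np.argsort(-np.abs(b))[:k]
    return float(np.mean(ra == rb))

def sign_agreement(a, b, k):
    """Fraction of shared top-k features whose attributions also share a sign."""
    shared = top_k(a, k) & top_k(b, k)
    return sum(np.sign(a[i]) == np.sign(b[i]) for i in shared) / k

def signed_rank_agreement(a, b, k):
    """Fraction of top-k positions with the same feature, the same rank, and the same sign."""
    ra = np.argsort(-np.abs(a))[:k]
    rb = np.argsort(-np.abs(b))[:k]
    return float(np.mean((ra == rb) & (np.sign(a[ra]) == np.sign(b[rb]))))

# Illustrative attribution vectors for the same prediction from two hypothetical methods.
attr_method_1 = np.array([0.40, -0.10, 0.25, 0.05, -0.30])
attr_method_2 = np.array([0.35, 0.12, -0.20, 0.02, -0.28])
for name, fn in [("feature", feature_agreement), ("rank", rank_agreement),
                 ("sign", sign_agreement), ("signed rank", signed_rank_agreement)]:
    print(f"{name} agreement @ k=3: {fn(attr_method_1, attr_method_2, 3):.2f}")
```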
Empirical Findings
The authors applied their framework to four real-world datasets, eight predictive models, and six widely used explanation methods, including LIME, KernelSHAP, and a range of gradient-based methods (e.g., Integrated Gradients, SmoothGrad). Their findings reveal substantial variation and disagreement among these methods:
- Tabular Data: Significant disagreement was observed across different models, with rank correlation between methods showing broad variability, often including negative correlations.
- Text Data: Explanation methods showed even greater disagreement, reflecting the difficulty of comparing attributions over high-dimensional inputs such as text, where word-level features are numerous.
- Image Data: Using super-pixels, the methods LIME and KernelSHAP demonstrated relatively better agreement, while gradient-based methods showed significant disagreement at the pixel level.
The results underscore a pervasive inconsistency across methods irrespective of data type, with gradient-based approaches particularly prone to disagreement. The sketch below illustrates the kind of per-instance, pairwise comparison that underlies such measurements.
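The sketch pairs LIME and KernelSHAP explanations for a single instance of a toy tabular classifier and compares them by top-k overlap and rank correlation. The dataset, model, and variable names are assumptions made for the example, not the paper's experimental setup, and the return formats of the lime and shap packages can vary slightly across versions.

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy tabular setup standing in for one of the study's dataset/model pairs.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
x = X[0]
n_features = X.shape[1]

# LIME attributions for the positive class of a single instance.
lime_explainer = LimeTabularExplainer(X, mode="classification")
lime_exp = lime_explainer.explain_instance(x, model.predict_proba,
                                           num_features=n_features)
lime_attr = np.zeros(n_features)
for idx, weight in lime_exp.as_map()[1]:   # (feature index, weight) pairs
    lime_attr[idx] = weight

# KernelSHAP attributions for the same instance; predict_proba is wrapped so the
# explained function returns a single output (the positive-class probability).
def predict_pos(data):
    return model.predict_proba(data)[:, 1]

shap_explainer = shap.KernelExplainer(predict_pos, shap.sample(X, 100))
shap_attr = np.asarray(shap_explainer.shap_values(x, nsamples=500)).ravel()

# Two views of (dis)agreement: overlap of the top-3 features and rank correlation.
print("LIME top-3 features:", np.argsort(-np.abs(lime_attr))[:3])
print("SHAP top-3 features:", np.argsort(-np.abs(shap_attr))[:3])
print("Spearman rank correlation:", round(spearmanr(lime_attr, shap_attr)[0], 2))
```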
Resolution and Practitioner Perspectives
A further user study with 25 participants from industry and academia made it evident that practitioners regularly encounter the disagreement problem. However, current approaches to resolving these disagreements are largely heuristic and arbitrary, with many participants expressing uncertainty or falling back on personal preference and intuition. Notably, LIME and SHAP are often favored for tabular data owing to their perceived stability and theoretical underpinnings.
Implications and Future Directions
The paper highlights the nascent and unsettled state of explainable AI with respect to agreement between methods. The lack of standardization in explanation outputs poses a challenge for practitioners who rely on these insights for critical decision-making. The work suggests several future directions: principled evaluation metrics to help practitioners compare explanation outputs, new explanation methods built on unified guiding principles, and ongoing practical education for data scientists about the capabilities and limitations of existing explainability techniques.
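For instance, a practitioner could already operationalize such comparisons by tabulating pairwise rank correlations between methods across a dataset. The sketch below does this over synthetic placeholder attribution matrices; the `attributions` dictionary is hypothetical, not data from the paper.

```python
# Pairwise comparison of explanation methods over many instances. The
# `attributions` dictionary is a synthetic stand-in: method name -> array of
# shape (n_instances, n_features) holding per-feature attributions.
import itertools
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
methods = ["LIME", "KernelSHAP", "SmoothGrad"]
attributions = {m: rng.normal(size=(100, 8)) for m in methods}

for m1, m2 in itertools.combinations(methods, 2):
    rhos = np.array([spearmanr(a, b)[0]
                     for a, b in zip(attributions[m1], attributions[m2])])
    print(f"{m1} vs {m2}: median rank correlation {np.median(rhos):+.2f}, "
          f"negative on {np.mean(rhos < 0):.0%} of instances")
```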
This investigation is notable for bridging the gap between theoretical AI research and practical machine learning applications, pointing out the importance of continued innovation in the development and standardization of explainable AI methodologies.