- The paper proposes a Bayesian framework that quantifies uncertainty in local explanation methods to enhance the reliability of post hoc explanations.
- It reformulates traditional techniques such as LIME and SHAP by placing Gaussian priors over feature importances, yielding credible intervals for the importance estimates.
- The study introduces focused sampling to reduce query requirements and empirically validates improved stability and accuracy on multiple datasets.
Reliable Post Hoc Explanations: Modeling Uncertainty in Explainability
In the domain of machine learning interpretability, the increasing deployment of black box models in high-stakes decision-making contexts necessitates robust and reliable explanation methods. The paper "Reliable Post hoc Explanations: Modeling Uncertainty in Explainability" by Slack et al. addresses the prevalent issues of instability, inconsistency, and inefficiency in local explanation techniques such as LIME and SHAP. The authors propose a Bayesian framework that explicitly models uncertainty in order to generate reliable explanations for black box models.
Bayesian Framework for Local Explanations
The proposed framework centers on a Bayesian reformulation of local explanation techniques, which yields credible intervals quantifying the uncertainty around the feature importances in an explanation, a significant step beyond traditional point estimates. By employing a generative model that captures the uncertainty inherent in approximating a complex decision boundary with a simpler local linear model, the authors derive Bayesian versions of LIME and KernelSHAP, termed BayesLIME and BayesSHAP, respectively.
The framework places Gaussian priors over the feature importance parameters and scales the error variance of each perturbation according to its proximity to the instance being explained, reflecting the intuition that the local surrogate should fit nearby points more closely. The posterior distributions are available in closed form, so the methods can be applied in practice with little additional computational overhead.
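The sketch below illustrates this construction under simplifying assumptions rather than reproducing the authors' implementation: a conjugate Gaussian prior, a fixed noise scale, and LIME-style binary perturbations weighted by an exponential proximity kernel. The function name `explain_with_intervals` and all hyperparameters are illustrative.

```python
# A minimal sketch of BayesLIME-style credible intervals, not the authors'
# implementation: conjugate Bayesian linear regression over binary
# perturbations, with per-perturbation noise scaled by a proximity kernel.
import numpy as np


def explain_with_intervals(black_box, x, n_perturbations=200,
                           kernel_width=0.75, sigma_0=10.0, seed=0):
    """Return posterior means and 95% credible intervals for local
    feature importances around instance `x` (a 1-D array)."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # LIME-style binary perturbations: randomly switch features "off".
    mask = rng.integers(0, 2, size=(n_perturbations, d))
    Z = mask * x                       # perturbed inputs in the original space
    y = black_box(Z)                   # black box predictions to be mimicked

    # Proximity kernel: closer perturbations get larger weight,
    # i.e. smaller assumed noise variance on their residuals.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-dist**2 / kernel_width**2) + 1e-6

    # Conjugate Gaussian model: prior phi ~ N(0, sigma_0^2 I), likelihood
    # y_i ~ N(z_i^T phi, sigma^2 / w_i) with sigma^2 fixed to 1 for
    # simplicity (the paper treats the noise scale probabilistically too).
    X = np.hstack([mask, np.ones((n_perturbations, 1))])  # add intercept
    W = np.diag(w)
    prior_precision = np.eye(d + 1) / sigma_0**2
    post_cov = np.linalg.inv(X.T @ W @ X + prior_precision)
    post_mean = post_cov @ X.T @ W @ y

    std = np.sqrt(np.diag(post_cov))
    lower, upper = post_mean - 1.96 * std, post_mean + 1.96 * std
    return post_mean[:d], lower[:d], upper[:d]


if __name__ == "__main__":
    # Toy black box: a noisy linear function of 5 features.
    true_w = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
    black_box = lambda Z: Z @ true_w + 0.1 * np.random.randn(len(Z))

    mean, lo, hi = explain_with_intervals(black_box, np.ones(5))
    for j, (m, l, h) in enumerate(zip(mean, lo, hi)):
        print(f"feature {j}: importance {m:+.2f}  95% CI [{l:+.2f}, {h:+.2f}]")
```

Because the prior is conjugate, the posterior mean and covariance above are exact matrix expressions, which is what keeps the extra cost over standard LIME negligible.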
Quantifying and Utilizing Uncertainty
A salient feature of the Bayesian approach is its capacity to output credible intervals for the feature importances, which serve as a direct measure of explanation quality and a countermeasure to the instability of existing methods. Practitioners can specify a desired level of certainty and check whether a given explanation meets it, a concrete criterion that goes beyond traditional fidelity metrics, which can be misleading.
The authors also derive an expression for the number of perturbations required to achieve a specified level of certainty in the feature importance estimates, building on a closed-form relationship between the perturbation sample size and the width of the resulting credible intervals.
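As a rough illustration of how such an estimate is used in practice, the sketch below assumes only that credible-interval width shrinks roughly as 1/sqrt(N); the paper's closed-form expression is more precise, and `perturbations_to_go` is a hypothetical helper, not the authors' API.

```python
# Hedged sketch of the "how many more perturbations do I need?" idea,
# assuming ~1/sqrt(N) shrinkage of the credible-interval width.
import math


def perturbations_to_go(current_width, target_width, n_current):
    """Estimate total perturbations needed so the widest credible interval
    drops from `current_width` to `target_width`."""
    if target_width >= current_width:
        return n_current  # already certain enough
    return math.ceil(n_current * (current_width / target_width) ** 2)


# Example: with 200 perturbations the widest 95% interval is 0.30;
# shrinking it to 0.10 would require roughly 9x as many samples.
print(perturbations_to_go(current_width=0.30, target_width=0.10, n_current=200))
# -> 1800
```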
Enhancing Explanation Efficiency
To address the inefficiency of local explanation methods, the authors introduce a sampling technique called focused sampling, which iteratively selects the most informative perturbations based on the predictive uncertainty of the current surrogate, significantly reducing the number of black box queries needed to reach a reliable explanation. This active learning approach can potentially halve query requirements in some settings, as sketched below.
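The following sketch shows focused sampling as a generic active-learning loop under the same simplifying assumptions as the earlier snippet (conjugate Gaussian surrogate, fixed noise scale, proximity weighting omitted for brevity); it is an illustration of the idea, not the authors' code.

```python
# Minimal focused-sampling sketch: at each round, fit the Bayesian surrogate,
# score a pool of candidate perturbations by predictive variance, and query
# the black box only on the most uncertain ones.
import numpy as np


def focused_sampling(black_box, x, n_rounds=10, batch=20, pool=500,
                     sigma_0=10.0, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    X, y = np.empty((0, d + 1)), np.empty(0)

    prior_prec = np.eye(d + 1) / sigma_0**2
    post_cov = np.linalg.inv(prior_prec)

    for _ in range(n_rounds):
        # Candidate binary perturbations (with an intercept column).
        mask = rng.integers(0, 2, size=(pool, d))
        C = np.hstack([mask, np.ones((pool, 1))])

        # Predictive variance under the current posterior; pick the
        # `batch` candidates the surrogate is least certain about.
        pred_var = np.einsum("ij,jk,ik->i", C, post_cov, C)
        chosen = np.argsort(pred_var)[-batch:]

        Z_new = mask[chosen] * x
        X = np.vstack([X, C[chosen]])
        y = np.concatenate([y, black_box(Z_new)])

        # Refit the conjugate Gaussian posterior on all labeled perturbations.
        post_cov = np.linalg.inv(X.T @ X + prior_prec)
        post_mean = post_cov @ X.T @ y

    return post_mean[:d], np.sqrt(np.diag(post_cov))[:d]


# Toy usage with the same illustrative black box as before.
true_w = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
black_box = lambda Z: Z @ true_w + 0.1 * np.random.randn(len(Z))
importances, stds = focused_sampling(black_box, np.ones(5))
print(importances.round(2), stds.round(2))
```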
Empirical Validation
The authors validate their framework through experiments on various datasets, including COMPAS, German Credit, MNIST, and ImageNet. The results consistently demonstrate the improved stability and accuracy of the explanations generated by BayesLIME and BayesSHAP. The empirical evidence corroborates the ability of the Bayesian framework to produce reliable, high-confidence explanations that align closely with the underlying behavior of the black box model.
Implications and Future Directions
The introduction of a Bayesian approach to local explanation methods fundamentally enriches the toolset available to machine learning practitioners and researchers, offering a means to rigorously quantify the uncertainty in model explanations. The framework not only provides stability and reliability in dynamic, high-stakes environments but also opens avenues for further research into global explanations with uncertainty guarantees.
While the paper successfully addresses intrinsic challenges in post hoc explainability, the reliance on local linear surrogates to approximate non-linear decision surfaces remains a limitation. Future research could extend these probabilistic methods to capture more complex dependencies and potentially integrate the framework with adversarial robustness to guard against deceptive inputs.
In summary, this work significantly advances the interpretability of black box models, providing a robust methodology that aligns with the stringent requirements of practical applications in regulated domains such as healthcare and finance.