- The paper introduces objective evaluation measures—infidelity and sensitivity—to assess saliency explanations in complex black-box models.
- It analytically characterizes the explanation that minimizes infidelity and shows that, for suitable perturbation distributions, it amounts to combining Smooth-Grad-style smoothing with Integrated Gradients, improving explanation quality.
- The authors propose a kernel smoothing approach that reduces sensitivity without sacrificing fidelity, yielding more robust and faithful model interpretations.
Evaluation of Saliency Explanations in Black-Box Machine Learning Models: Infidelity and Sensitivity Measures
The paper "On the (In)fidelity and Sensitivity of Explanations" embarks on a rigorous analysis of saliency explanations for complex black-box machine learning models. It introduces novel objective evaluation measures—(in)fidelity and sensitivity—offering robust variants that strengthen the theoretical underpinnings of explanation methodologies. This paper critiques existing subjective measures prevalent in explanation evaluations and argues for objective metrics to enhance the rigour and applicability of saliency explanations.
Salient Contributions
- Objective Measures and Optimal Explanations: The authors define two objective measures—infidelity and sensitivity—for evaluating explanations (both are written out formally after this list). Infidelity is framed as the expected squared difference between the dot product of an input perturbation with the explanation and the resulting change in the model's output. Sensitivity, in turn, measures how much an explanation changes under insignificant input perturbations. Notably, the paper observes that explanations engineered for low sensitivity can still suffer from high infidelity, marking a problematic trade-off in conventional methods.
- Infidelity as a Measure: Infidelity evaluates how effectively an explanation captures perturbation-induced changes in the function value. The explanation that minimizes infidelity admits a closed form (see the derivation after this list), which the authors connect, for suitable perturbation distributions, to a combination of Smooth-Grad and Integrated Gradients. This is established through analytical derivations and contrasted with explanations based on zero- or baseline-value perturbations.
- Enhancement through Perturbation: By varying the input perturbation distribution, the authors derive novel explanation methods (a Monte Carlo sketch of this construction appears after this list). These methods outperform traditional techniques on both qualitative and quantitative evaluations, and the experiments confirm that the newly proposed explanations offer superior fidelity, challenging previously held assumptions about how model outputs should be interpreted.
- Smoothing to Reduce Sensitivity: A distinctive contribution is a smoothed explanation algorithm that reduces both sensitivity and infidelity. Using a kernel smoothing approach akin to Smooth-Grad (sketched in code after this list), the authors demonstrate that kernelized explanations dampen sensitivity to small input perturbations while preserving explanation fidelity.
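For reference, the two measures can be written out explicitly. With explanation $\Phi$, model $f$, input $x$, perturbation $I$ drawn from a chosen distribution $\mu_I$, and radius $r$, infidelity and max-sensitivity take the following form (this is a standard rendering of the definitions summarized above, not a verbatim quotation of the paper):

$$\mathrm{INFD}(\Phi, f, x) = \mathbb{E}_{I \sim \mu_I}\!\left[\Big(I^{\top}\Phi(f,x) - \big(f(x) - f(x-I)\big)\Big)^{2}\right]$$

$$\mathrm{SENS}_{\max}(\Phi, f, x, r) = \max_{\lVert y - x\rVert \le r} \big\lVert \Phi(f, y) - \Phi(f, x) \big\rVert$$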
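The closed form behind the "optimal explanation" claim follows directly from this definition: infidelity is a least-squares objective in $\Phi(f,x)$, so setting its gradient to zero (and assuming $\mathbb{E}_I[II^{\top}]$ is invertible) yields

$$\Phi^{*}(f, x) = \Big(\mathbb{E}_{I}\big[I I^{\top}\big]\Big)^{-1}\,\mathbb{E}_{I}\Big[I\,\big(f(x) - f(x - I)\big)\Big],$$

a kernel-weighted aggregation of output changes. For particular perturbation choices, the paper relates this optimum to Smooth-Grad-smoothed Integrated Gradients.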
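To make the "new explanations from new perturbations" idea concrete, here is a minimal NumPy sketch that estimates the closed form above by Monte Carlo under a user-chosen perturbation distribution. The helper names (`optimal_explanation`, `noisy_baseline_perturbation`), the all-zeros baseline, the Gaussian noise scale, and the ridge stabilizer are illustrative assumptions rather than the authors' implementation; `f` is assumed to map an array to a scalar score.

```python
import numpy as np

def optimal_explanation(f, x, sample_perturbation, n_samples=2000, ridge=1e-6, rng=None):
    """Monte Carlo least-squares estimate of the infidelity-optimal explanation
    under a user-chosen perturbation distribution."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    A = np.zeros((d, d))   # running sum for E[I I^T]
    b = np.zeros(d)        # running sum for E[I (f(x) - f(x - I))]
    fx = f(x)
    for _ in range(n_samples):
        I = sample_perturbation(x, rng).ravel()
        A += np.outer(I, I)
        b += I * (fx - f((x.ravel() - I).reshape(x.shape)))
    A /= n_samples
    b /= n_samples
    # Small ridge term keeps the solve stable when E[I I^T] is near-singular.
    return np.linalg.solve(A + ridge * np.eye(d), b).reshape(x.shape)

def noisy_baseline_perturbation(x, rng, sigma=0.1):
    """Example perturbation: difference to a noisy all-zeros baseline, I = x - (x0 + noise)."""
    baseline = np.zeros_like(x)
    return x - (baseline + rng.normal(scale=sigma, size=x.shape))
```

Calling `optimal_explanation(f, x, noisy_baseline_perturbation)` then produces one such perturbation-specific explanation; swapping in a different `sample_perturbation` yields a different method.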
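Finally, a minimal sketch of the smoothing step and a sampled sensitivity check, assuming a Gaussian smoothing kernel and an explanation function `explain_fn(f, x)` that returns an attribution array of the same shape as `x`; the function names and the Monte Carlo approximation of max-sensitivity are illustrative choices, not the paper's reference code.

```python
import numpy as np

def smooth_explanation(explain_fn, f, x, n_samples=50, sigma=0.1, rng=None):
    """Smooth-Grad-style kernel smoothing: average explanations over Gaussian-perturbed inputs."""
    rng = np.random.default_rng() if rng is None else rng
    samples = [explain_fn(f, x + rng.normal(scale=sigma, size=x.shape))
               for _ in range(n_samples)]
    return np.mean(samples, axis=0)

def max_sensitivity(explain_fn, f, x, radius=0.1, n_samples=50, rng=None):
    """Sampled lower bound on max-sensitivity: the largest change in the explanation
    over random inputs drawn from an L_inf ball of the given radius around x."""
    rng = np.random.default_rng() if rng is None else rng
    base = explain_fn(f, x)
    worst = 0.0
    for _ in range(n_samples):
        y = x + rng.uniform(-radius, radius, size=x.shape)
        worst = max(worst, float(np.linalg.norm(explain_fn(f, y) - base)))
    return worst
```

Passing a smoothed explainer, e.g. `lambda f_, x_: smooth_explanation(explain_fn, f_, x_)`, into `max_sensitivity` is one way to observe the sensitivity reduction described above.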
Theoretical and Practical Implications
The implications of these findings bear on both theoretical research and practical applications. Theoretically, this work encourages a re-evaluation of fidelity in model explanations, shifting attention to how well an explanation anticipates output changes under input perturbations rather than to exhaustive exploration of the input space. By showing that optimal explanations arise from kernel-smoothed, fidelity-driven aggregation of output changes, the paper establishes a conceptual link between fidelity metrics and the structure of effective explanations.
Practically, the case for kernel smoothing techniques provides a pathway for algorithm designers seeking robust interpretability tools. With the rise of AI in sensitive applications such as healthcare and autonomous systems, the paper's findings become especially relevant. By highlighting the detrimental trade-offs tied to sensitivity, the authors tip the balance toward scientifically grounded explanations that avoid unwarranted volatility under small perturbations.
Speculation for Future AI Developments
As machine learning models become increasingly complex and entrenched in high-stakes fields, the demand for non-intrusive, objective, and faithful explanations will grow correspondingly. The insights from this paper support a future where interpretability metrics can drive the design of new models, harnessing the twin forces of kernelized explanation structures and fidelity-oriented evaluation to yield inherently explainable AI systems.
The paper lays out a research agenda that can spur further exploration of the balance between sensitivity and infidelity. It challenges researchers to develop methods that deliver both explanation accuracy and robustness, underscoring the ethical considerations in deploying AI models whose decisions demand clear, well-founded explanation.
In conclusion, "On the (In)fidelity and Sensitivity of Explanations" takes a decisive step toward objective evaluation in AI interpretability research. By providing both theoretical and empirical grounding, it strengthens the case for integrated fidelity analyses of saliency explanations and sets the stage for an age of transparent, understandable, and reliably interpretable AI.