- The paper proposes a self-explaining AI framework that pairs predictions with quantitative explanations using mutual information.
- The study critiques traditional interpretability methods, revealing their limits in capturing deep neural networks' local interpolation behavior.
- It emphasizes incorporating uncertainty and applicability analysis, suggesting Bayesian techniques to identify and signal prediction reliability.
Self-Explaining AI as an Alternative to Interpretable AI
The paper "Self-explaining AI as an alternative to interpretable AI" by Daniel C. Elton addresses the inherent complexities and challenges in interpreting deep neural networks (DNNs), especially in critical domains like medicine and autonomous vehicle technology. Current efforts in interpretable AI focus on deriving human-understandable rules from neural networks, but recent insights such as the double descent phenomenon suggest these approximations might not reflect the true operational mechanisms of these systems. Double descent reveals that DNNs function primarily through local interpolation rather than global rule extraction, resulting in significant limitations in their interpretability and ability to extrapolate outside their training distribution.
Core Concepts
- Challenges of Interpretation: Prevailing attempts to decode black-box models rely on techniques such as saliency maps and Shapley values, yet these capture only selected aspects of a complex model and are often not robust; an explanation produced this way may not faithfully reflect how the model behaves across different inputs. These limitations motivate methods that align more closely with the mechanisms the network actually uses to make its predictions.
- Self-Explaining AI: Drawing on how humans earn trust, the paper advocates for self-explaining AI systems that produce a prediction together with an accompanying explanation. Rather than relying on a separate interpretable surrogate model, a self-explaining AI provides a way to verify the link between explanation and prediction quantitatively, for example by measuring the mutual information between particular neural activations (the latent variables offered as the explanation) and the output of interest; a minimal sketch of this idea appears after this list.
- Applicability Domain and Uncertainty: The paper argues that trustworthy AI systems should tell users when a model is operating outside its domain of applicability, for example through applicability domain analysis. Integrating uncertainty quantification further bolsters reliability: Bayesian neural networks, whether trained with fully Bayesian methods or approximated with techniques such as dropout kept active at inference (Monte Carlo dropout), can attach a confidence estimate to each prediction. Illustrative sketches of an applicability-domain check and Monte Carlo dropout follow below.
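To make the mutual-information idea from the second bullet concrete, the following minimal sketch ranks hidden units of a network by their estimated mutual information with the model's own output. The architecture, synthetic data, and choice of penultimate layer are illustrative assumptions, not details from the paper; a real self-explaining system would apply this to the latent variables it presents as its explanation.

```python
# Sketch: estimate I(h_j ; y_hat) for each hidden unit j and rank units by it.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_regression

torch.manual_seed(0)

# Toy regression model standing in for a trained network (assumption).
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),   # penultimate activations we will inspect
    nn.Linear(16, 1),
)

# Capture penultimate-layer activations with a forward hook.
activations = {}
def save_activations(module, inputs, output):
    activations["penultimate"] = output.detach()
model[3].register_forward_hook(save_activations)  # the second ReLU

# Synthetic evaluation batch (replace with real held-out data).
X = torch.randn(512, 10)
with torch.no_grad():
    y_pred = model(X).squeeze(1).numpy()

H = activations["penultimate"].numpy()            # shape: (512, 16)

# Estimate mutual information between each hidden unit and the prediction.
mi = mutual_info_regression(H, y_pred, random_state=0)
for j in np.argsort(mi)[::-1][:5]:
    print(f"unit {j:2d}: estimated MI with output = {mi[j]:.3f}")
```

Units with high estimated mutual information are candidates for the quantitative explanation that accompanies the prediction.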
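The next sketch shows one common heuristic for an applicability-domain check: flag an input as out-of-domain when it lies unusually far from the training data in feature space. The k-nearest-neighbor distance rule and the 99th-percentile threshold are assumptions for illustration, not a prescription from the paper.

```python
# Sketch: distance-to-training-data applicability-domain check.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10))        # stand-in for training features

k = 5
nn_index = NearestNeighbors(n_neighbors=k).fit(X_train)

# Calibrate on the training set itself: mean k-NN distance per point,
# skipping the zero distance to the point itself (hence k + 1 neighbors).
d_train, _ = nn_index.kneighbors(X_train, n_neighbors=k + 1)
train_scores = d_train[:, 1:].mean(axis=1)
threshold = np.percentile(train_scores, 99)  # assumed cutoff

def in_domain(x):
    """Return True if x looks like it comes from the training distribution."""
    d, _ = nn_index.kneighbors(x.reshape(1, -1), n_neighbors=k)
    return d.mean() <= threshold

print(in_domain(rng.normal(size=10)))        # typical sample -> likely True
print(in_domain(np.full(10, 8.0)))           # far-away sample -> False
```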
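Finally, a minimal sketch of Monte Carlo dropout, the approximation mentioned above: dropout stays active at inference, and the spread over repeated stochastic forward passes is reported alongside the prediction as an uncertainty estimate. The network and inputs here are placeholders.

```python
# Sketch: Monte Carlo dropout as an approximate Bayesian uncertainty estimate.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def predict_with_uncertainty(model, x, n_samples=100):
    """Mean prediction and predictive std over n_samples stochastic passes."""
    model.eval()
    # Re-enable only the dropout layers so they keep sampling random masks.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(4, 10)                       # a batch of 4 inputs
mean, std = predict_with_uncertainty(model, x)
for i in range(x.shape[0]):
    print(f"input {i}: prediction = {mean[i].item():+.3f} ± {std[i].item():.3f}")
```

A large predictive standard deviation, like a failed applicability-domain check, is a signal that the prediction should not be trusted without further scrutiny.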
Practical and Theoretical Implications
Self-explaining AI has notable practical and theoretical implications. Practically, a system that can reliably explain its decisions fosters trust and accountability, especially in high-stakes settings such as healthcare diagnostics and automated decision-making. Theoretically, it challenges existing paradigms of model interpretability, urging the field to move beyond model compression and rule extraction and to acknowledge the interpolation behavior that underlies DNNs.
The concept also raises the bar for what counts as understanding a model, calling for explanatory mechanisms that connect directly to the model's predictions and to human interpretation. It emphasizes mechanistic and meta-level explanations that genuinely reflect how DNNs operate, rather than reduced approximations. Further research could develop self-explaining models that build on modern AI infrastructure while improving trustworthiness and safety.
Future Developments
Looking ahead, Elton's work suggests a productive avenue for AI research: building trustworthy, self-explaining systems through rigorous mutual information analysis and applicability domain exploration. Systematic methods for verifying and benchmarking the quality of AI explanations, beyond heuristic approaches, will be vital. Future AI systems might fully integrate uncertainty estimates into their outputs, leading to more robust decision-making across application domains. Together, these principles point toward a closer coupling of AI capability and human interpretability, which is critical for safe deployment in complex real-world environments.