Towards A Rigorous Science of Interpretable Machine Learning
The increasing ubiquity of ML systems demands attention not only to task performance but also to auxiliary criteria such as safety, nondiscrimination, avoiding technical debt, and providing the right to explanation. The paper by Doshi-Velez and Kim addresses the important yet elusive concept of interpretability in ML, proposing a path towards its rigorous definition and evaluation.
Conceptualizing Interpretability
To begin with, the authors define interpretability as the ability of an ML system to explain or present its reasoning in understandable terms to a human. This definition serves as a foundation for exploring interpretability's role in confirming other essential ML system desiderata, such as fairness, privacy, robustness, and trust.
The paper identifies interpretability as a way to bridge the gap left by incomplete problem formalizations in domains such as scientific understanding, safety, and ethics. Unlike uncertainty, which can be quantified, incompleteness cannot be fully modeled, and interpretability offers a route to qualitative verification and validation.
Scenarios Advocating Interpretability
The paper delineates specific scenarios where interpretability is crucial, driven by the presence of incompleteness. These include:
- Scientific Understanding: When the goal is knowledge acquisition, interpretability facilitates explanations that translate into knowledge.
- Safety: For complex systems such as semi-autonomous vehicles, it is infeasible to test every scenario in which the system might fail, so interpretability allows safety to be checked by inspecting the system's reasoning.
- Ethics: Various discrimination concerns that cannot be fully encoded in systems necessitate interpretable models to identify and address biases.
- Mismatched Objectives: When optimizing a surrogate objective, interpretability helps ensure the surrogate aligns well with the ultimate goal.
- Multi-objective Trade-offs: In scenarios like privacy versus prediction quality trade-offs, interpretability helps understand and navigate competing objectives.
A Taxonomic Approach to Evaluate Interpretability
Doshi-Velez and Kim propose a three-level taxonomy for evaluating interpretability, with each level trading off evaluation cost against fidelity to the target application:
- Application-grounded Evaluation: Domain experts perform the real end task (or a meaningful part of it), and interpretability is judged by how much the explanations help them. This is the most expensive form of evaluation but also the one most directly tied to the target application.
- Human-grounded Metrics: Lay humans carry out simplified tasks, such as choosing which of two explanations is better, to gauge general qualities of explanations. This is less resource-intensive than expert studies while still capturing human judgments of interpretability.
- Functionally-grounded Evaluation: No human experiments are involved; instead, proxies such as model sparsity are used to score interpretability. This level is most appropriate for methods not yet mature enough for human studies, or whose model class has already been validated with humans; a minimal proxy-based sketch follows this list.
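To make the functionally-grounded level concrete, here is a minimal sketch (not from the paper) that reports two common interpretability proxies, linear-model sparsity and decision-tree size, alongside predictive accuracy. It assumes scikit-learn is available and uses a standard benchmark dataset purely for illustration.

```python
# Functionally-grounded evaluation sketch: no human subjects, only proxy
# measures of interpretability reported next to predictive accuracy.
# The choice of proxies (non-zero coefficients, number of leaves) is
# illustrative, not prescribed by Doshi-Velez and Kim.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sparse linear model: proxy = number of non-zero coefficients.
linear = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
print("linear model  accuracy=%.3f  nonzero_coefficients=%d"
      % (linear.score(X_te, y_te), np.count_nonzero(linear.coef_)))

# Shallow decision tree: proxy = number of leaves a human must read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("decision tree accuracy=%.3f  n_leaves=%d"
      % (tree.score(X_te, y_te), tree.get_n_leaves()))
```

A full study would then confirm, via human-grounded or application-grounded experiments, that these proxies actually track what users find interpretable.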
Open Questions and Future Directions
Addressing the critical need for systematic interpretability assessments, the paper identifies several open problems requiring further exploration:
- Identification of Proxies: Determining the most suitable proxies for interpretability across different applications.
- Designing Simplified Tasks: Creating human-grounded tasks that retain essential aspects of real-world applications.
- Characterizing Proxies: Defining proxies for explanation quality based on human evaluation elements.
The authors suggest a data-driven approach for discovering the factors that affect interpretability, such as whether explanations are needed globally or locally, the severity and area of incompleteness, time constraints, and the user's expertise. Concretely, they advocate constructing a matrix whose rows are real-world tasks and whose columns are interpretability methods, with entries recording how well each method serves each task; factoring this matrix would expose latent dimensions of interpretability, as sketched below.
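A toy illustration of that idea, not from the paper: the task-by-method matrix below is filled with made-up scores, and a non-negative matrix factorization is used to recover two hypothetical latent dimensions. In practice the entries would come from published application-grounded studies.

```python
# Data-driven discovery of latent dimensions of interpretability.
# Rows are real-world tasks, columns are interpretability methods, and the
# entries are hypothetical scores of how well each method served each task.
import numpy as np
from sklearn.decomposition import NMF

tasks = ["clinical triage", "loan approval", "robot debugging", "scientific discovery"]
methods = ["sparse linear", "decision tree", "saliency map", "example-based"]
scores = np.array([
    [0.9, 0.8, 0.2, 0.6],   # made-up toy data, for illustration only
    [0.8, 0.9, 0.1, 0.5],
    [0.2, 0.3, 0.9, 0.4],
    [0.5, 0.4, 0.6, 0.9],
])

# Factorize into 2 latent dimensions: tasks and methods that load on the
# same dimension plausibly share an underlying interpretability need.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
task_loadings = model.fit_transform(scores)   # shape (n_tasks, 2)
method_loadings = model.components_           # shape (2, n_methods)

for name, weights in zip(tasks, task_loadings):
    print(f"{name:22s} latent loadings: {np.round(weights, 2)}")
print("method loadings per dimension:\n", np.round(method_loadings, 2))
```

The factorization here is just one plausible instantiation; the paper's point is that accumulating such a matrix across studies would let the community identify these dimensions empirically rather than by intuition.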
Methodological Considerations
A significant portion of the paper is devoted to hypothesizing latent dimensions of interpretability from both a task-related and a method-related perspective. Each candidate factor is examined for its implications in real-world applications and in simplified human-subject experiments.
Recommendations for Researchers
The culmination of this work is a set of guiding principles for researchers in the domain of interpretable ML:
- Alignment of Claims and Evaluation: The claims made about a method's interpretability should be matched by the corresponding level of evaluation (application-, human-, or functionally-grounded).
- Structured Taxonomies: Using common taxonomies to describe and compare interpretability-related research.
In conclusion, Doshi-Velez and Kim’s paper provides a foundational framework for moving towards a rigorous science of interpretable machine learning. By offering a working definition of interpretability, a taxonomy of evaluation approaches, and a set of open questions, it helps align ML advances with broader societal and operational requirements. This work sets the stage for future research to refine and expand these evaluative criteria, enabling the deployment of ML systems that are not only accurate but also transparent and trustworthy.