Towards A Rigorous Science of Interpretable Machine Learning
The increasing ubiquity of ML systems demands attention not only to task performance but also to auxiliary criteria such as safety, nondiscrimination, avoiding technical debt, and providing the right to explanation. The paper by Doshi-Velez and Kim addresses the important yet elusive concept of interpretability in ML, proposing a path towards its rigorous definition and evaluation.
Conceptualizing Interpretability
To begin with, the authors define interpretability as the ability of an ML system to explain or present its reasoning in understandable terms to a human. This definition serves as a foundation for exploring interpretability's role in confirming other essential ML system desiderata, such as fairness, privacy, robustness, and trust.
The paper identifies interpretability as a way to bridge the gap left by incomplete problem formalizations in domains such as scientific understanding, safety, and ethics. Unlike uncertainty, which can be quantified, incompleteness cannot be fully modeled, and interpretability offers a route to qualitative verification and validation.
Scenarios Advocating Interpretability
The paper delineates specific scenarios where interpretability is crucial, driven by the presence of incompleteness. These include:
- Scientific Understanding: When the goal is knowledge acquisition, interpretability facilitates explanations that translate into knowledge.
- Safety: For complex systems such as semi-autonomous vehicles, it is infeasible to test every scenario in which the system might fail, so interpretability allows safety to be checked by inspecting the system's reasoning.
- Ethics: Various discrimination concerns that cannot be fully encoded in systems necessitate interpretable models to identify and address biases.
- Mismatched Objectives: When optimizing a surrogate objective, interpretability helps ensure the surrogate aligns well with the ultimate goal.
- Multi-objective Trade-offs: In scenarios like privacy versus prediction quality trade-offs, interpretability helps understand and navigate competing objectives.
A Taxonomic Approach to Evaluate Interpretability
Doshi-Velez and Kim propose a three-level taxonomy for evaluating interpretability, with each level trading off evaluation cost against fidelity to the target application:
- Application-grounded Evaluation: Domain experts perform the real end task (or a meaningful part of it), and interpretability is judged by how much the explanations help them. This is the most expensive form of evaluation but also the one most directly tied to the target application.
- Human-grounded Metrics: Lay humans carry out simplified tasks, such as choosing which of two explanations is better, to gauge general qualities of explanations. This is less resource-intensive than expert studies while still capturing human judgments of interpretability.
- Functionally-grounded Evaluation: No human experiments are involved; instead, proxies such as model sparsity are used to score interpretability. This level is most appropriate for methods not yet mature enough for human studies, or whose model class has already been validated with humans; a minimal proxy-based sketch follows this list.
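To make the functionally-grounded level concrete, here is a minimal sketch (not from the paper) that reports two common interpretability proxies, linear-model sparsity and decision-tree size, alongside predictive accuracy. It assumes scikit-learn is available and uses a standard benchmark dataset purely for illustration.

```python
# Functionally-grounded evaluation sketch: no human subjects, only proxy
# measures of interpretability reported next to predictive accuracy.
# The choice of proxies (non-zero coefficients, number of leaves) is
# illustrative, not prescribed by Doshi-Velez and Kim.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sparse linear model: proxy = number of non-zero coefficients.
linear = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
print("linear model  accuracy=%.3f  nonzero_coefficients=%d"
      % (linear.score(X_te, y_te), np.count_nonzero(linear.coef_)))

# Shallow decision tree: proxy = number of leaves a human must read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("decision tree accuracy=%.3f  n_leaves=%d"
      % (tree.score(X_te, y_te), tree.get_n_leaves()))
```

A full study would then confirm, via human-grounded or application-grounded experiments, that these proxies actually track what users find interpretable.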
Open Questions and Future Directions
Addressing the critical need for systematic interpretability assessments, the paper identifies several open problems requiring further exploration:
- Identification of Proxies: Determining the most suitable proxies for interpretability across different applications.
- Designing Simplified Tasks: Creating human-grounded tasks that retain essential aspects of real-world applications.
- Characterizing Proxies: Defining proxies for explanation quality based on human evaluation elements.
The authors suggest a data-driven approach for discovering the factors that affect interpretability, such as whether explanations are needed globally or locally, the severity and area of incompleteness, time constraints, and the user's expertise. Concretely, they advocate constructing a matrix whose rows are real-world tasks and whose columns are interpretability methods, with entries recording how well each method serves each task; factoring this matrix would expose latent dimensions of interpretability, as sketched below.
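A toy illustration of that idea, not from the paper: the task-by-method matrix below is filled with made-up scores, and a non-negative matrix factorization is used to recover two hypothetical latent dimensions. In practice the entries would come from published application-grounded studies.

```python
# Data-driven discovery of latent dimensions of interpretability.
# Rows are real-world tasks, columns are interpretability methods, and the
# entries are hypothetical scores of how well each method served each task.
import numpy as np
from sklearn.decomposition import NMF

tasks = ["clinical triage", "loan approval", "robot debugging", "scientific discovery"]
methods = ["sparse linear", "decision tree", "saliency map", "example-based"]
scores = np.array([
    [0.9, 0.8, 0.2, 0.6],   # made-up toy data, for illustration only
    [0.8, 0.9, 0.1, 0.5],
    [0.2, 0.3, 0.9, 0.4],
    [0.5, 0.4, 0.6, 0.9],
])

# Factorize into 2 latent dimensions: tasks and methods that load on the
# same dimension plausibly share an underlying interpretability need.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
task_loadings = model.fit_transform(scores)   # shape (n_tasks, 2)
method_loadings = model.components_           # shape (2, n_methods)

for name, weights in zip(tasks, task_loadings):
    print(f"{name:22s} latent loadings: {np.round(weights, 2)}")
print("method loadings per dimension:\n", np.round(method_loadings, 2))
```

The factorization here is just one plausible instantiation; the paper's point is that accumulating such a matrix across studies would let the community identify these dimensions empirically rather than by intuition.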
Methodological Considerations
A significant portion of the paper is devoted to hypothesizing latent dimensions of interpretability from both a task-related and a method-related perspective. Each candidate factor is examined for its implications in real-world applications and in simplified human-subject experiments.
Recommendations for Researchers
The culmination of this work is a set of guiding principles for researchers in the domain of interpretable ML:
- Alignment of Claims and Evaluation: The claims made about a method's interpretability should be matched by the corresponding level of evaluation (application-, human-, or functionally-grounded).
- Structured Taxonomies: Using common taxonomies to describe and compare interpretability-related research.
In conclusion, Doshi-Velez and Kim’s paper provides a foundational framework for moving towards a rigorous science of interpretable machine learning. By offering a working definition of interpretability, a taxonomy of evaluation approaches, and a set of open questions, it helps align ML advances with broader societal and operational requirements. This work sets the stage for future research to refine and expand these evaluative criteria, enabling the deployment of ML systems that are not only accurate but also transparent and trustworthy.