Interpretable Machine Learning: Definitions, Methods, and Applications
Murdoch et al. tackle the increasingly relevant problem of interpretability in machine learning, aiming to clarify what interpretability means and to provide a robust framework for discussing and evaluating interpretation methods. The paper introduces the Predictive, Descriptive, Relevant (PDR) framework and categorizes interpretation methods into model-based and post-hoc approaches. Through careful analysis and numerous real-world examples, the authors articulate this framework and suggest directions for future research, with particular emphasis on the critical yet under-appreciated role of human audiences in determining the relevancy of an interpretation.
Interpretability in Machine Learning: A Definition
The authors define interpretable machine learning as the use of ML models to extract relevant knowledge about domain relationships contained in data, where relevancy is judged in the context of the problem and the intended audience. The definition is deliberately broad: interpretations can take many forms, ranging from visualizations and natural-language descriptions to mathematical equations.
PDR Framework
The PDR framework serves as the cornerstone of this paper, offering a structured approach to evaluating interpretable ML methods. It encompasses three key desiderata:
- Predictive Accuracy: This measures how well the model approximates the underlying relationships in the data. High predictive accuracy ensures that the model is capturing real patterns rather than noise.
- Descriptive Accuracy: This evaluates how well the interpretations describe what the model has learned. Descriptive accuracy is crucial for trustworthy interpretations, especially when the underlying model is complex.
- Relevancy: This considers whether the extracted information is meaningful for the intended audience and specific problem context. High relevancy ensures that the interpretations are actionable and understandable by stakeholders.
Categorization of Interpretation Methods
The methods are divided into two primary categories: model-based and post-hoc interpretability.
Model-Based Interpretability
Model-based methods enhance interpretability by constraining the model itself to be more understandable. This constraint can lower predictive accuracy, but it typically yields high descriptive accuracy, since the interpretation follows directly from the model's transparent form. The paper identifies several sub-categories under model-based interpretability:
- Sparsity: Imposing sparsity constraints on models (e.g., LASSO) helps identify key features and understand their influence. This is particularly useful in high-dimensional settings such as genomics (see the sketch after this list).
- Simulatability: Models are deemed simulatable if humans can easily understand and simulate their decision-making process. Decision trees and rule-based models are typical examples.
- Modularity: Modular models allow individual parts of the model to be interpreted independently. Techniques like generalized additive models exemplify this approach.
- Domain-Based Feature Engineering: Leveraging domain knowledge to create meaningful features improves both predictive and descriptive accuracy. Methods for feature extraction in specific domains like natural language processing and computer vision fall under this category.
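To make the sparsity idea concrete, the following minimal sketch (a hypothetical example on synthetic data, not one drawn from the paper) fits a cross-validated LASSO and reads off the handful of coefficients that survive; the sparse coefficient vector is itself the interpretation.

```python
# Minimal sketch of model-based interpretability via sparsity,
# using synthetic data (an illustrative assumption, not the paper's example).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic regression task: 100 features, only 5 of which carry signal.
X, y = make_regression(n_samples=200, n_features=100, n_informative=5,
                       noise=5.0, random_state=0)

# Cross-validated LASSO shrinks most coefficients exactly to zero.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

# The surviving coefficients are the interpretation: which features matter
# and in which direction they push the prediction.
selected = np.flatnonzero(lasso.coef_)
print(f"Non-zero coefficients: {len(selected)} of {X.shape[1]}")
for j in selected:
    print(f"feature {j}: coefficient {lasso.coef_[j]:+.2f}")
```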
Post-Hoc Interpretability
Post-hoc methods interpret complex, pre-trained models without altering them. These methods are particularly pertinent for understanding black-box models.
- Dataset-Level Interpretation: Focuses on the global patterns a model has learned, such as feature importance scores or visualizations of learned features (both interpretation levels are illustrated in the sketch following this list).
- Prediction-Level Interpretation: Examines how specific predictions are made, often using feature importance scores at the instance level or generating heatmaps for input features.
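As a concrete illustration of both levels (a minimal sketch on synthetic data, not the paper's own method), the snippet below computes permutation importances for a random-forest classifier (dataset-level) and a simple occlusion-style attribution for a single prediction (prediction-level).

```python
# Post-hoc interpretation of a black-box classifier: a sketch on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Dataset-level: permutation importance measures how much held-out accuracy
# drops when each feature is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for j in np.argsort(result.importances_mean)[::-1][:4]:
    print(f"feature {j}: importance {result.importances_mean[j]:.3f}")

# Prediction-level (simplified occlusion-style attribution): replace each
# feature of one instance with its training mean and record how the
# predicted probability of class 1 changes.
x = X_test[0:1]
base = model.predict_proba(x)[0, 1]
for j in range(X.shape[1]):
    x_perturbed = x.copy()
    x_perturbed[0, j] = X_train[:, j].mean()
    delta = base - model.predict_proba(x_perturbed)[0, 1]
    print(f"feature {j}: contribution {delta:+.3f}")
```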
Real-World Applications and Examples
The paper is rich with real-world examples demonstrating these interpretability methods in practice. For example, generalized additive models are used to surface biases in medical predictions, and sparse canonical correlation analysis helps interpret massive genomic datasets.
Future Directions
The authors outline several key areas for future research:
- Descriptive Accuracy: Establishing robust methods for evaluating descriptive accuracy remains a significant challenge. The paper suggests simulation studies and leveraging existing experimental findings to assess interpretation methods more rigorously (a toy simulation of this kind is sketched after this list).
- Relevancy: Demonstrating improved relevancy involves direct applications in solving domain-specific problems or using human studies to validate that the interpretations are meaningful to practitioners.
- Model Development: More accurate and interpretable models need to be developed to bridge the gap between interpretability and predictive power.
- Feature Engineering Tools: Improved tools for exploratory data analysis and unsupervised learning can lead to the generation of more informative features, expanding the applicability of model-based interpretability.
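As one illustration of the simulation-study idea mentioned above (a toy sketch under an assumed synthetic linear ground truth, not an evaluation protocol prescribed by the paper), the snippet below checks whether a post-hoc interpretation of a black-box model recovers the features known to drive the simulated response.

```python
# Toy simulation study for descriptive accuracy: generate data with a known
# sparse ground truth, interpret a black-box fit, and check feature recovery.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n, p, k = 500, 20, 5                      # samples, features, truly relevant features
beta = np.zeros(p)
beta[:k] = rng.uniform(1.0, 2.0, size=k)  # known ground-truth coefficients
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

# Fit a black-box model and interpret it post hoc.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=20, random_state=0)

# Descriptive-accuracy proxy: does the interpretation rank the truly
# relevant features at the top?
top_k = set(np.argsort(imp.importances_mean)[::-1][:k])
recovered = len(top_k & set(range(k))) / k
print(f"Fraction of ground-truth features recovered: {recovered:.2f}")
```

Because the data-generating process is known, the fraction of recovered features serves as a crude stand-in for descriptive accuracy, which is otherwise hard to measure on real data.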
Conclusion
Murdoch et al.'s work on interpretable machine learning offers a comprehensive framework and detailed categorization of methods. By defining the PDR framework and providing numerous real-world examples, they contribute significantly to the discourse on the interpretability of ML models. This work stands as a vital resource for researchers aiming to understand and develop interpretable machine learning methods that are not only predictive and descriptive but, most importantly, relevant to the intended human audience.