Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model (1511.01644v1)

Published 5 Nov 2015 in stat.AP, cs.LG, and stat.ML

Abstract: We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS$_2$ score, actively used in clinical practice for estimating the risk of stroke in patients that have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more accurate.

Summary

The paper introduces Bayesian Rule Lists (BRL), a novel model combining rule-based decision lists with Bayesian analysis for interpretable stroke prediction.
The methodology employs a sparse hierarchical prior and posterior distribution over decision lists, balancing accuracy with interpretability.
Applied to stroke prediction, BRL achieves a higher AUC than traditional clinical scoring systems while remaining just as transparent.

Interpretable Classifiers Using Rules and Bayesian Analysis: Building a Better Stroke Prediction Model

The paper by Letham et al. introduces a novel approach for developing highly interpretable classifiers with competitive predictive accuracy, termed Bayesian Rule Lists (BRL). This method addresses a key limitation in machine learning by balancing interpretability and accuracy, making it particularly useful for applications in fields such as personalized medicine, where understanding the model's decision-making process is as important as the predictions themselves.

Model and Methodology

The BRL approach produces decision lists, which consist of a series of if...then... statements. Together, the statements partition a multivariate feature space into a sequence of simple cases, each handled by one readily interpretable rule. The novelty lies in employing a Bayesian framework, which generates a posterior distribution over possible decision lists. This framework uses a specific prior structure to encourage sparsity, promoting concise and interpretable models.
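To make the if...then... structure concrete, here is a minimal sketch of how a decision list is evaluated: rules are checked in order and the first matching antecedent determines the prediction. The feature names, rule order, and risk values are hypothetical and made up for illustration; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, bool]
# Each rule pairs a description and an antecedent (a predicate on the
# features) with a predicted risk.
Rule = Tuple[str, Callable[[Patient], bool], float]

# Hypothetical rules and risks, invented for illustration only.
RULES: List[Rule] = [
    ("high blood pressure and age over 75",
     lambda p: p["high_blood_pressure"] and p["age_over_75"], 0.40),
    ("diabetes", lambda p: p["diabetes"], 0.20),
]
DEFAULT_RISK = 0.05  # the default rule applies when no antecedent matches


def predict(patient: Patient) -> Tuple[str, float]:
    """Return the description and risk of the first rule that matches."""
    for description, antecedent, risk in RULES:
        if antecedent(patient):
            return description, risk
    return "default", DEFAULT_RISK
```

Because only the first matching rule fires, the order of the rules is part of the model, which is why the posterior is defined over ordered lists rather than sets of rules.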

Generative Model

The BRL algorithm operates in a multi-class classification setting and employs a generative model for constructing decision lists. The model involves:

Sampling decision list lengths and the default rule parameter from respective distributions.
Iteratively sampling antecedent cardinalities and specific antecedents based on their cardinalities.
Assigning multinomial distributions to outcome labels and updating these distributions as new observations are classified by the antecedents in the list.
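The steps above can be sketched as a small sampler. This is an illustration under stated assumptions, not the authors' implementation: antecedents are represented as frozensets of feature names, the symbols `lam`, `eta`, and `alpha` are assumed names for the list-length rate, the cardinality rate, and the Dirichlet pseudo-counts, and an antecedent is drawn uniformly among the unused ones of the sampled cardinality.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Antecedent = FrozenSet[str]


def sample_truncated_poisson(rng: random.Random, rate: float,
                             support: Sequence[int]) -> int:
    """Draw from a Poisson(rate) restricted to the given support."""
    weights = [rate ** k / math.factorial(k) for k in support]
    return rng.choices(list(support), weights=weights, k=1)[0]


def sample_dirichlet(rng: random.Random, alpha: Sequence[float]) -> List[float]:
    """Draw a probability vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_decision_list(
    rng: random.Random, antecedents: Sequence[Antecedent],
    lam: float, eta: float, alpha: Sequence[float],
) -> Tuple[List[Antecedent], List[List[float]]]:
    """Sample an ordered antecedent list and one label distribution per rule."""
    available = list(antecedents)
    # List length, truncated at the number of pre-mined antecedents.
    m = sample_truncated_poisson(rng, lam, range(len(available) + 1))
    chosen: List[Antecedent] = []
    for _ in range(m):
        # Cardinality, restricted to sizes that are still available.
        sizes = sorted({len(a) for a in available})
        c = sample_truncated_poisson(rng, eta, sizes)
        pick = rng.choice([a for a in available if len(a) == c])
        chosen.append(pick)
        available.remove(pick)
    # One label distribution per rule, plus one for the default rule.
    thetas = [sample_dirichlet(rng, alpha) for _ in range(m + 1)]
    return chosen, thetas


def sample_label(rng: random.Random, features: FrozenSet[str],
                 chosen: Sequence[Antecedent],
                 thetas: Sequence[Sequence[float]]) -> int:
    """Label an observation using the first antecedent it satisfies."""
    for antecedent, theta in zip(chosen, thetas):
        if antecedent <= features:  # all conditions of the antecedent hold
            return rng.choices(range(len(theta)), weights=theta, k=1)[0]
    default = thetas[-1]
    return rng.choices(range(len(default)), weights=default, k=1)[0]
```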

Prior and Likelihood Structures

BRL introduces a hierarchical prior structure that favors smaller, easily interpretable lists. Decision list lengths are sampled from a Poisson distribution truncated at the number of pre-selected antecedents. The cardinality of each antecedent (the number of conditions it contains) is likewise sampled from a truncated Poisson distribution, which gives control over how complex individual rules tend to be.
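As a rough illustration of how such a prior scores a candidate list, the sketch below computes a log-prior as the sum of a truncated-Poisson term for the list length, a truncated-Poisson term for each antecedent's cardinality, and a uniform choice among unused antecedents of that cardinality. The function names and the parameters `lam` and `eta` are assumptions made here, and the uniform choice step is a simplifying assumption rather than a quoted detail.

```python
import math
from typing import FrozenSet, Sequence

Antecedent = FrozenSet[str]


def log_truncated_poisson(k: int, rate: float, support: Sequence[int]) -> float:
    """Log-probability of k under Poisson(rate) restricted to support.

    Assumes k is a member of support.
    """
    log_terms = [j * math.log(rate) - math.lgamma(j + 1) for j in support]
    top = max(log_terms)
    log_norm = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return k * math.log(rate) - math.lgamma(k + 1) - log_norm


def log_prior(chosen: Sequence[Antecedent], antecedents: Sequence[Antecedent],
              lam: float, eta: float) -> float:
    """Log-prior of an ordered antecedent list drawn from the pre-mined set."""
    available = list(antecedents)
    # Length term: truncated at the number of pre-mined antecedents.
    total = log_truncated_poisson(len(chosen), lam, range(len(available) + 1))
    for antecedent in chosen:
        # Cardinality term: only sizes still available are possible.
        sizes = sorted({len(a) for a in available})
        total += log_truncated_poisson(len(antecedent), eta, sizes)
        # Uniform choice among unused antecedents of that size.
        same_size = sum(1 for a in available if len(a) == len(antecedent))
        total -= math.log(same_size)
        available.remove(antecedent)
    return total
```

With small values of `lam` and `eta`, longer lists and larger antecedents receive lower prior probability, which is one way a preference for sparse lists can be expressed.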

Labels at each rule follow a multinomial distribution with a Dirichlet prior on its parameters. Marginalizing over those parameters gives a Dirichlet-multinomial likelihood for the observed label counts at each rule, which has a closed form and is therefore cheap to evaluate.
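The closed form can be written directly with log-gamma functions. The sketch below assumes the standard Dirichlet-multinomial marginal for a specific sequence of labels (no multinomial coefficient); `counts_per_rule` and `alpha` are names chosen here for the per-rule label counts and the Dirichlet pseudo-counts.

```python
import math
from typing import Sequence


def log_dirichlet_multinomial(counts: Sequence[int],
                              alpha: Sequence[float]) -> float:
    """Log marginal probability of the label counts captured by one rule.

    Computes log[ Gamma(sum(alpha)) / Gamma(sum(alpha) + sum(counts))
                  * prod_l Gamma(counts_l + alpha_l) / Gamma(alpha_l) ].
    """
    result = math.lgamma(sum(alpha)) - math.lgamma(sum(alpha) + sum(counts))
    for count, a in zip(counts, alpha):
        result += math.lgamma(count + a) - math.lgamma(a)
    return result


def log_likelihood(counts_per_rule: Sequence[Sequence[int]],
                   alpha: Sequence[float]) -> float:
    """Sum the per-rule terms, including the default rule's counts."""
    return sum(log_dirichlet_multinomial(c, alpha) for c in counts_per_rule)
```

A rule that captures observations of mostly one label scores higher than a rule with the same number of observations split evenly, so the likelihood rewards antecedents that separate the classes.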

Simulation Studies and Results

The authors ran extensive simulation studies to evaluate BRL's performance. As the number of observations increased, the posterior distribution over decision lists converged to the true model. Convergence was measured with the Levenshtein distance between decision lists, which decreased as the datasets grew.
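For reference, a Levenshtein (edit) distance between two decision lists can be computed by treating each antecedent as a single token and counting the insertions, deletions, and substitutions needed to turn one list into the other. The sketch below is the standard dynamic-programming version; treating whole antecedents as tokens is an assumption made here about how lists are compared.

```python
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    # prev[j] holds the distance between the first i-1 items of a and
    # the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

A distance of zero means a sampled list has the same antecedents in the same order as the reference list.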

The model was also tested on the deterministic Tic-Tac-Toe Endgame dataset from the UCI Machine Learning Repository. BRL discovered the complete set of conditions for a win and achieved perfect accuracy, comparing favorably against algorithms such as C5.0, CART, and random forests.

Application to Stroke Prediction

A significant application of BRL discussed in the paper is stroke prediction for patients with atrial fibrillation using the MarketScan Medicaid Multi-State Database (MDCD). The dataset included 12,586 patients with 4,148 features. The BRL model not only retained interpretability akin to the CHADS$_2$ medical scoring system but also improved prediction accuracy as demonstrated by AUC metrics.

The BRL point estimate model, evaluated across five folds of cross-validation, consistently outperformed the CHADS$_2$ and CHA$_2$DS$_2$-VASc scores. The resulting alternative stroke prediction model is more accurate while keeping the simplicity and interpretability that clinical use requires.
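Since the comparison rests on AUC, a short sketch of the metric may help. AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. Ties matter here because a decision list or an integer score assigns the same risk to every patient in the same group. The function below is a plain pairwise implementation written for clarity rather than speed, and it is not the evaluation code used in the paper.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

In a cross-validation setting, this function would be applied to the held-out predictions of each fold and the per-fold values compared between models.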

Practical Implications and Future Directions

BRL's ability to bridge the gap between interpretability and accuracy has significant implications for personalized medicine and other domains where model transparency is critical. By combining pre-mined rules with Bayesian analysis, BRL provides a methodology for constructing interpretable models that domain experts can readily validate and trust. It also opens a path toward handling larger feature spaces and observational data efficiently.

Future research could focus on refining the generative model to handle larger and even more complex datasets, and adapting BRL for other critical applications such as financial risk modeling or fraud detection. Additionally, enhancing MCMC algorithms for faster convergence and exploring alternative prior structures could further optimize the balance between accuracy, interpretability, and computational efficiency.

Overall, Bayesian Rule Lists represent an important step towards robust, interpretable machine learning models that meet the practical needs of scientific and societal problems, making them a valuable tool for algorithmic decision-making.
