Predictive Multiplicity in Probabilistic Classification (2206.01131v3)
Abstract: Machine learning models are often used to inform real-world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk of failing to appear in court. Given multiple models that perform almost equally well for a prediction task, to what extent do predictions vary across these models? If predictions are relatively consistent for similar models, then the standard approach of choosing the model that optimizes a penalized loss suffices. But what if predictions vary significantly across similar models? In machine learning, this is referred to as predictive multiplicity, i.e., the prevalence of conflicting predictions assigned by near-optimal competing models. In this paper, we present a framework for measuring predictive multiplicity in probabilistic classification (predicting the probability of a positive outcome). We introduce measures that capture the variation in risk estimates over the set of competing models, and we develop optimization-based methods to compute these measures efficiently and reliably for convex empirical risk minimization problems. We demonstrate the incidence and prevalence of predictive multiplicity in real-world tasks. Further, we provide insight into how predictive multiplicity arises by analyzing the relationship between predictive multiplicity and data set characteristics (outliers, separability, and majority-minority structure). Our results emphasize the need to report predictive multiplicity more widely.
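The abstract does not spell out the optimization-based computation, but the underlying idea can be sketched: over the convex set of models whose empirical loss lies within a tolerance ε of the optimum, find the lowest and highest risk estimate assigned to a given example. The snippet below is a minimal sketch of that idea for L2-regularized logistic regression using CVXPY; the function names, the ε tolerance, and the regularization strength are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
import cvxpy as cp


def fit_baseline(X, y, reg=1e-4):
    """Fit L2-regularized logistic regression (labels in {-1, +1}) by convex ERM."""
    n, d = X.shape
    w = cp.Variable(d)
    loss = cp.sum(cp.logistic(cp.multiply(-y, X @ w))) / n + reg * cp.sum_squares(w)
    problem = cp.Problem(cp.Minimize(loss))
    problem.solve()
    return w.value, problem.value


def risk_interval(X, y, x0, best_loss, eps=0.01, reg=1e-4):
    """Lowest and highest predicted probability assigned to x0 by any model
    whose regularized empirical loss is within eps of the optimum."""
    n, d = X.shape
    w = cp.Variable(d)
    loss = cp.sum(cp.logistic(cp.multiply(-y, X @ w))) / n + reg * cp.sum_squares(w)
    near_optimal = [loss <= best_loss + eps]

    # The score x0 @ w is linear in w, so minimizing and maximizing it over the
    # convex epsilon-level set of the loss are both convex programs.
    lo = cp.Problem(cp.Minimize(x0 @ w), near_optimal)
    lo.solve()
    hi = cp.Problem(cp.Maximize(x0 @ w), near_optimal)
    hi.solve()

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid(lo.value), sigmoid(hi.value)


# Usage with hypothetical training data X_train, y_train:
# w_star, best_loss = fit_baseline(X_train, y_train)
# p_min, p_max = risk_interval(X_train, y_train, X_train[0], best_loss, eps=0.01)
# A wide [p_min, p_max] interval signals predictive multiplicity for that example.
```

Because the per-example score is linear in the coefficients and the ε-level set of a convex loss is convex, both directions of the search are tractable, which is the sense in which such measures can be computed reliably for convex empirical risk minimization.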
Authors: Jamelle Watson-Daniels, David C. Parkes, Berk Ustun