When accurate prediction models yield harmful self-fulfilling prophecies (2312.01210v4)
Abstract: Prediction models are popular in medical research and practice. By predicting an outcome of interest for specific patients, these models may help inform difficult treatment decisions, and are often hailed as the poster children for personalized, data-driven healthcare. We show, however, that using prediction models for decision making can lead to harmful decisions, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not invalidate the predictive power of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated both before and after deployment are useless for decision making, because they made no change to the data distribution. These results point to the need to revise standard practices for the validation, deployment, and evaluation of prediction models that are used in medical decisions.
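The central claim can be illustrated with a small numerical sketch. This is not the paper's formal construction: the binary feature, the survival probabilities, the 0.5 threshold, and every function name below are assumptions made up for illustration. Historically every patient is treated, and the model predicts survival under that policy. After deployment, treatment is withheld from patients with a low predicted survival, which lowers their survival further while the model's AUC improves.

```python
# Toy population. All numbers are made-up assumptions for illustration only.
P_X1 = 0.5  # share of patients with the high-risk feature X = 1

# P(Y = 1 | X, T): survival probability by feature X and treatment T.
SURVIVAL = {
    (0, 1): 0.8,  # low risk, treated
    (0, 0): 0.7,  # low risk, untreated
    (1, 1): 0.4,  # high risk, treated
    (1, 0): 0.2,  # high risk, untreated
}


def historical_policy(x):
    """Before deployment, every patient is treated."""
    return 1


def model_prediction(x):
    """Predicted survival, which is exact under the historical policy."""
    return SURVIVAL[(x, historical_policy(x))]


def model_based_policy(x, threshold=0.5):
    """After deployment, treat only patients with a high predicted survival."""
    return 1 if model_prediction(x) >= threshold else 0


def survival_under(policy):
    """Survival probability per feature value when `policy` assigns treatment."""
    return {x: SURVIVAL[(x, policy(x))] for x in (0, 1)}


def auc(policy):
    """Exact AUC of model_prediction against outcomes generated under `policy`."""
    p_x = {0: 1 - P_X1, 1: P_X1}
    surv = survival_under(policy)
    pos = {x: p_x[x] * surv[x] for x in (0, 1)}        # survivors per group
    neg = {x: p_x[x] * (1 - surv[x]) for x in (0, 1)}  # deaths per group
    total = sum(pos.values()) * sum(neg.values())
    result = 0.0
    for xp in (0, 1):
        for xn in (0, 1):
            weight = pos[xp] * neg[xn] / total
            if model_prediction(xp) > model_prediction(xn):
                result += weight        # survivor ranked above non-survivor
            elif model_prediction(xp) == model_prediction(xn):
                result += 0.5 * weight  # ties count as half
    return result


before = survival_under(historical_policy)
after = survival_under(model_based_policy)
print(f"High-risk survival: {before[1]:.1f} before, {after[1]:.1f} after deployment")
print(f"AUC: {auc(historical_policy):.3f} before, {auc(model_based_policy):.3f} after deployment")
```

In this toy setting the high-risk group's survival drops from 0.4 to 0.2 while the AUC rises from about 0.708 to 0.800, so discrimination alone would not reveal the harm. The model is no longer calibrated for the high-risk group after deployment (it predicts 0.4, the observed rate is 0.2), which is consistent with the abstract's second point that calibration before and after deployment implies no change in the data distribution.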
- Wouter A. C. van Amsterdam
- Nan van Geloven
- Jesse H. Krijthe
- Rajesh Ranganath
- Giovanni Ciná