
The harms of class imbalance corrections for machine learning based prediction models: a simulation study (2404.19494v1)

Published 30 Apr 2024 in stat.ME

Abstract: Risk prediction models are increasingly used in healthcare to aid in clinical decision making. In most clinical contexts, model calibration (i.e., assessing the reliability of risk estimates) is critical. Data available for model development are often not perfectly balanced with respect to the modeled outcome (i.e., individuals with vs. without the event of interest are not equally represented in the data). It is common for researchers to correct this class imbalance, yet, the effect of such imbalance corrections on the calibration of machine learning models is largely unknown. We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations we compared the out-of-sample predictive performance of models developed with an imbalance correction to those developed without a correction for class imbalance across different data-generating scenarios (varying sample size, the number of predictors and event fraction). Our findings were illustrated in a case study using MIMIC-III data. In all simulation scenarios, prediction models developed without a correction for class imbalance consistently had equal or better calibration performance than prediction models developed with a correction for class imbalance. The miscalibration introduced by correcting for class imbalance was characterized by an over-estimation of risk and was not always able to be corrected with re-calibration. Correcting for class imbalance is not always necessary and may even be harmful for clinical prediction models which aim to produce reliable risk estimates on an individual basis.


Summary

  • The paper demonstrates through simulation and empirical studies that class imbalance corrections frequently degrade the calibration of machine learning prediction models, causing risk overestimation that is not easily corrected.
  • The findings suggest that class imbalance corrections should not be applied automatically, especially in clinical settings where accurate calibration is crucial; their necessity should be evaluated case by case.
  • Researchers and practitioners should prioritize model calibration over achieving balanced data distributions and carefully evaluate the necessity of imbalance corrections.

Analysis of Class Imbalance Correction in Machine Learning for Risk Prediction Models

In clinical prediction modeling, the reliability and accuracy of predictions are crucial, particularly when they inform clinical decision making. The paper "The harms of class imbalance corrections for machine learning based prediction models: a simulation study" investigates the effect of class imbalance correction techniques on the calibration of risk prediction models. Although class imbalance is prevalent in clinical datasets, the paper shows that correcting it does not always yield desirable outcomes.

Risk prediction models help clinicians estimate a patient's risk of experiencing a specific event, such as developing a disease. The data used to build these models are frequently imbalanced: most patients do not experience the event of interest, leaving event occurrences as a minority class. Faced with this imbalance, researchers commonly apply correction methods, but the authors argue for caution before doing so blindly.
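As background, a typical imbalance correction such as random oversampling duplicates minority-class rows until the classes are equally represented. A minimal sketch in Python; the helper function and toy data below are illustrative, not taken from the paper:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until every
    class has as many rows as the majority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

# toy data: 90 non-events, 10 events (10% event fraction)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))  # -> [90 90]
```

More elaborate corrections such as SMOTE synthesize new minority samples rather than duplicating rows, but the core effect studied in the paper, an artificially raised event fraction in the training data, is the same.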

Summary of Findings

The authors analyze the impact of various class imbalance corrections on model calibration—a critical component for ensuring predicted risks accurately reflect observed outcomes. Through extensive Monte Carlo simulations across different data-generating scenarios, the study compares the calibration and performance of prediction models developed with and without imbalance corrections using machine learning algorithms like Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and others.

Key findings from the simulations show that models developed without imbalance corrections consistently had equal or better calibration than models developed with corrections: their risk estimates aligned more closely with observed outcomes. Models trained on artificially balanced data, by contrast, systematically overestimated risk, and this miscalibration could not always be removed with post-hoc recalibration, a potentially harmful consequence for clinical predictions.
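The simplest measure behind this comparison is calibration-in-the-large: the observed event rate versus the mean predicted risk. A toy numpy sketch; the specific risk values are invented for illustration and are not the paper's results:

```python
import numpy as np

def calibration_in_the_large(y_true, p_pred):
    """Observed event rate minus mean predicted risk; values near zero mean
    good average calibration, negative values mean risk over-estimation."""
    return y_true.mean() - p_pred.mean()

# invented illustration: with a 5% event rate, a model whose average predicted
# risk matches the event rate is calibrated in the large, while one trained on
# artificially balanced data tends to inflate risks
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.05).astype(float)
p_uncorrected = np.full(y.size, 0.05)   # average risk matches event rate
p_corrected = np.full(y.size, 0.35)     # inflated risk after balancing
print(calibration_in_the_large(y, p_uncorrected))  # near 0
print(calibration_in_the_large(y, p_corrected))    # strongly negative
```

Full calibration assessment, as in the paper, also examines calibration curves rather than only the average, since a model can match the event rate on average while still being miscalibrated across the risk spectrum.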

A case study using the MIMIC-III critical care database illustrated these findings empirically: imbalance corrections did not necessarily improve, and often worsened, predictive performance by degrading calibration.

Implications and Future Directions

These findings suggest that class imbalance corrections should not be applied automatically in clinical prediction modeling. In many cases, particularly those involving a large degree of class imbalance, the algorithm's native performance without artificial corrections is more reliable. Among the corrected models, ROS-RF (random oversampling with random forest) and ROS-XGBoost were exceptions that maintained robust calibration after recalibration, a result that warrants further investigation.
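Post-hoc recalibration of the kind discussed here is commonly implemented as logit (Platt-type) recalibration: fitting a two-parameter logistic model of the outcome on the logit of the raw predictions. A self-contained numpy sketch under that assumption; the function names and toy data are illustrative, not the paper's exact procedure:

```python
import numpy as np

def logit(p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def platt_recalibrate(p_train, y_train, max_iter=50, tol=1e-8):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson and return a
    function mapping raw predicted risks to recalibrated risks."""
    X = np.column_stack([np.ones_like(p_train), logit(p_train)])
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y_train - mu)                       # score vector
        hess = X.T @ (X * (mu * (1 - mu))[:, None])       # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.abs(step).max() < tol:
            break
    return lambda p: 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * logit(p))))

# toy demonstration: predictions inflated by +1.5 on the logit scale
rng = np.random.default_rng(1)
p_true = rng.uniform(0.01, 0.30, size=20_000)
y = (rng.random(p_true.size) < p_true).astype(float)
p_raw = 1.0 / (1.0 + np.exp(-(logit(p_true) + 1.5)))  # over-estimates risk
p_fixed = platt_recalibrate(p_raw, y)(p_raw)
print(p_raw.mean(), p_fixed.mean(), y.mean())  # recalibrated mean ~= event rate
```

This fixes a uniform shift like the one above, which helps illustrate the paper's point: recalibration can only apply a global intercept-and-slope adjustment, so miscalibration that varies across the risk spectrum, as introduced by some imbalance corrections, is not always repairable this way.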

Moreover, this research underscores that blindly following standard machine learning preprocessing pipelines can be detrimental, particularly in contexts such as healthcare where model calibration is vital. Researchers developing clinical prediction models should therefore evaluate the necessity and impact of class imbalance corrections on a case-by-case basis.

Future research could explore the implications of these findings in higher-dimensional settings or in other domains of machine learning. An in-depth analysis of why certain combinations (such as RF with ROS) remain resilient might yield new insight into when and how specific algorithms tolerate imbalance corrections.

In conclusion, the study advises that calibration, the alignment of predicted risk with true risk, should take precedence over achieving balanced data distributions through corrections. The paper serves as a cautionary guide for researchers and practitioners applying machine learning in clinical settings, highlighting the need for careful consideration when addressing class imbalance.