The harms of class imbalance corrections for machine learning based prediction models: a simulation study

Published 30 Apr 2024 in stat.ME | (2404.19494v1)

Abstract: Risk prediction models are increasingly used in healthcare to aid in clinical decision making. In most clinical contexts, model calibration (i.e., assessing the reliability of risk estimates) is critical. Data available for model development are often not perfectly balanced with respect to the modeled outcome (i.e., individuals with vs. without the event of interest are not equally represented in the data). It is common for researchers to correct this class imbalance, yet, the effect of such imbalance corrections on the calibration of machine learning models is largely unknown. We studied the effect of imbalance corrections on model calibration for a variety of machine learning algorithms. Using extensive Monte Carlo simulations we compared the out-of-sample predictive performance of models developed with an imbalance correction to those developed without a correction for class imbalance across different data-generating scenarios (varying sample size, the number of predictors and event fraction). Our findings were illustrated in a case study using MIMIC-III data. In all simulation scenarios, prediction models developed without a correction for class imbalance consistently had equal or better calibration performance than prediction models developed with a correction for class imbalance. The miscalibration introduced by correcting for class imbalance was characterized by an over-estimation of risk and was not always able to be corrected with re-calibration. Correcting for class imbalance is not always necessary and may even be harmful for clinical prediction models which aim to produce reliable risk estimates on an individual basis.



Summary

The paper demonstrates through simulation and empirical studies that class imbalance corrections frequently degrade the calibration of machine learning prediction models, causing risk overestimation that is not easily corrected.
The findings suggest that class imbalance corrections should not be applied automatically, especially in clinical settings where accurate calibration is crucial; their use should be evaluated case by case.
Researchers and practitioners should prioritize model calibration over achieving balanced data distributions and carefully evaluate the necessity of imbalance corrections.

Analysis of Class Imbalance Correction in Machine Learning for Risk Prediction Models

In clinical prediction modeling, reliable and accurate predictions are crucial, particularly when they inform clinical decisions. The paper "The harms of class imbalance corrections for machine learning based prediction models: a simulation study" investigates how class imbalance correction techniques affect the calibration of risk prediction models. Although class imbalance is prevalent in clinical datasets, the paper shows that correcting it does not always yield desirable outcomes.

Risk prediction models often help clinicians estimate a patient's risk of experiencing a specific event, such as developing a disease. The data used to build these models are frequently imbalanced: most patients do not experience the event of interest, so event occurrences form a minority class. The authors nonetheless caution against applying imbalance correction methods by default.

Summary of Findings

The authors analyze the impact of class imbalance corrections on model calibration, that is, how closely predicted risks match observed outcome frequencies. Using extensive Monte Carlo simulations across data-generating scenarios that vary the sample size, the number of predictors, and the event fraction, the study compares the out-of-sample calibration and predictive performance of models developed with and without imbalance corrections, using algorithms such as logistic regression (LR), support vector machines (SVM), and random forests (RF), among others.
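A single replicate of this kind of comparison can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the data-generating model (one standard-normal predictor, intercept -3, slope 1), the sample sizes, the seed, the hand-written Newton-Raphson fit, and all function names are assumptions chosen for brevity, and random oversampling of events is used as one example of an imbalance correction.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n, rng, intercept=-3.0, slope=1.0):
    # Hypothetical data-generating model with a rare event.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(intercept + slope * x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, iters=25):
    # Newton-Raphson for a logistic regression with intercept b0 and slope b1.
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def random_oversample(xs, ys, rng):
    # Duplicate randomly chosen events until both classes have equal size.
    # Assumes events are the minority class, as in the simulated data here.
    events = [x for x, y in zip(xs, ys) if y == 1]
    nonevents = [x for x, y in zip(xs, ys) if y == 0]
    extra = rng.choices(events, k=len(nonevents) - len(events))
    xs_bal = nonevents + events + extra
    ys_bal = [0] * len(nonevents) + [1] * (len(events) + len(extra))
    return xs_bal, ys_bal


def mean_risk(model, xs):
    # Average predicted risk, to compare with the observed event fraction.
    b0, b1 = model
    return sum(sigmoid(b0 + b1 * x) for x in xs) / len(xs)


rng = random.Random(2024)
x_train, y_train = simulate(5000, rng)
x_test, y_test = simulate(5000, rng)

plain = fit_logistic(x_train, y_train)
x_bal, y_bal = random_oversample(x_train, y_train, rng)
corrected = fit_logistic(x_bal, y_bal)

observed = sum(y_test) / len(y_test)
mean_plain = mean_risk(plain, x_test)
mean_ros = mean_risk(corrected, x_test)
print(f"observed event fraction:        {observed:.3f}")
print(f"mean risk, no correction:       {mean_plain:.3f}")
print(f"mean risk, random oversampling: {mean_ros:.3f}")
```

Comparing the mean predicted risk with the observed event fraction is only a crude check of calibration. Under these assumptions the uncorrected model should track the observed fraction closely, while the oversampled model should land well above it. A full study would repeat this over many replicates, scenarios, and algorithms.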

Across all simulation scenarios, models developed without imbalance corrections had equal or better calibration than those developed with corrections: their risk estimates aligned more closely with observed event rates. The miscalibration introduced by the corrections, typically an overestimation of risk, could not always be removed by post-hoc recalibration, which makes it potentially harmful for clinical prediction.
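The direction of the miscalibration can be motivated in the simplest case, a correctly specified logistic regression. The derivation below is a standard result on outcome-dependent sampling rather than something taken from the paper, and the symbols are introduced here for illustration: \pi is the event fraction in the original data, k is the factor by which events are over-represented relative to non-events after resampling, and P^{*} denotes probabilities in the resampled data.

```latex
\[
\frac{P^{*}(Y=1 \mid x)}{P^{*}(Y=0 \mid x)}
  = k \, \frac{P(Y=1 \mid x)}{P(Y=0 \mid x)}
\quad\Longrightarrow\quad
\operatorname{logit} P^{*}(Y=1 \mid x)
  = \operatorname{logit} P(Y=1 \mid x) + \log k ,
\]
\[
k = \frac{1-\pi}{\pi} \quad \text{when resampling to a 50/50 split.}
\]
```

For example, with \pi = 0.05 the shift is \log 19 \approx 2.94 on the logit scale, so every predicted risk is pushed upward. For more flexible learners the distortion need not be a pure intercept shift, which is consistent with the finding that recalibration does not always repair it.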

The authors illustrate their findings empirically with a case study using the MIMIC-III dataset. The case study supported the simulation results: imbalance corrections did not necessarily improve, and often worsened, predictive performance by degrading calibration.

Implications and Future Directions

These findings suggest that class imbalance corrections should not be applied automatically in clinical prediction modeling. In some cases, particularly under severe class imbalance, the uncorrected algorithm can be more reliable. ROS-RF and ROS-XGBoost were the exceptions that retained good calibration after recalibration, a result that warrants further investigation.
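To make the idea of recalibration concrete, the sketch below fits a logistic recalibration model, logit(p_recal) = a + b * logit(p_hat), to inflated predictions. The summary does not state which recalibration method the authors used, so this is an assumed, commonly used form. The data, the seed, and the inflated predictions (true intercept -3, predicted intercept -0.5) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p, eps=1e-12):
    # Clip to avoid infinities at exactly 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_recalibration(p_hat, ys, iters=25):
    # Newton-Raphson fit of logit(p_recal) = a + b * logit(p_hat),
    # starting from the identity mapping (a = 0, b = 1).
    zs = [logit(p) for p in p_hat]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(zs, ys):
            p = sigmoid(a + b * z)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * z
            h00 += w
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(7)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Outcomes follow the true risk; predictions use an inflated intercept.
ys = [1 if rng.random() < sigmoid(-3.0 + x) else 0 for x in xs]
p_hat = [sigmoid(-0.5 + x) for x in xs]

a, b = fit_recalibration(p_hat, ys)
p_recal = [sigmoid(a + b * logit(p)) for p in p_hat]

observed = sum(ys) / len(ys)
mean_raw = sum(p_hat) / len(p_hat)
mean_recal = sum(p_recal) / len(p_recal)
print(f"observed {observed:.3f}, raw {mean_raw:.3f}, recalibrated {mean_recal:.3f}")
print(f"recalibration intercept {a:.2f}, slope {b:.2f}")
```

This toy case is the favourable one: the miscalibration is a pure intercept shift, which a two-parameter logistic recalibration can absorb. For brevity the recalibration is also fitted and evaluated on the same data, which a real analysis would keep separate. The paper's point is that this repair did not always succeed in the scenarios studied.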

Moreover, this research shows that following standard machine learning preprocessing pipelines without scrutiny can be detrimental, particularly in contexts such as healthcare where model calibration is vital. Researchers developing clinical prediction models should therefore evaluate the necessity and impact of class imbalance corrections on a case-by-case basis.

Future research could explore whether these findings hold in higher-dimensional settings or in other domains of machine learning. An in-depth analysis of why certain combinations (such as RF with ROS) remain resilient might yield insights into algorithm-specific adjustments.

In conclusion, the study advises that calibration, the alignment of predicted risk with true risk, should take precedence over achieving balanced data distributions through corrections. The paper serves as a cautionary guide for researchers and practitioners applying machine learning in clinical settings, highlighting the need for careful consideration when addressing class imbalance.
