Dataset dependence and sensitivity of classifiers to imbalance and augmentation (conjecture)

Establish whether the predictive performance of machine-learning classifiers depends on dataset characteristics, and identify which classifiers are most sensitive to class imbalance and to data augmentation across diverse datasets.

Background

The paper evaluates 11 machine-learning methods on eight imbalanced datasets and observes that the results vary from dataset to dataset. This suggests that performance may depend on dataset characteristics and that algorithms may differ in their sensitivity to imbalance and to augmentation.

By explicitly stating a conjecture of dataset-dependent performance and varying sensitivity, the authors highlight the need for systematic characterization of these dependencies across models and datasets.
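The conjecture does not say how "dataset dependence" or "sensitivity" should be measured. One possible operationalization is sketched below, under loudly stated assumptions: the "datasets" are hypothetical synthetic 1-D binary problems that differ only in class separation, random oversampling stands in for data augmentation, and the two classifiers (nearest centroid and a 1-D Gaussian naive Bayes) are toy stand-ins, not the 11 methods evaluated in the paper. All function names (`make_dataset`, `random_oversample`, `sensitivity_table`, and so on) are illustrative, and this is not the EFIDL framework. For each method the sketch reports the spread of balanced accuracy across datasets (dataset dependence), the gain from training on balanced data (imbalance sensitivity), and the mean absolute change after oversampling (augmentation sensitivity).

```python
import math
import random
import statistics

# Hypothetical sketch only; not the EFIDL framework or the paper's protocol.
# Synthetic "datasets": 1-D binary problems whose difficulty is set by the gap
# between class means (a stand-in for differing dataset characteristics).
def make_dataset(n_major, n_minor, shift, seed):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_major)]
    xs += [rng.gauss(shift, 1.0) for _ in range(n_minor)]
    ys = [0] * n_major + [1] * n_minor  # class 1 is the minority class
    return xs, ys

# Augmentation stand-in: random oversampling (duplicate minority points until balanced).
def random_oversample(xs, ys, seed):
    rng = random.Random(seed)
    minority = [x for x, y in zip(xs, ys) if y == 1]
    extra = [rng.choice(minority) for _ in range(ys.count(0) - len(minority))]
    return xs + extra, ys + [1] * len(extra)

# Toy classifier A: nearest centroid (decision rule ignores class frequencies).
def fit_nearest_centroid(xs, ys):
    mu0 = statistics.fmean(x for x, y in zip(xs, ys) if y == 0)
    mu1 = statistics.fmean(x for x, y in zip(xs, ys) if y == 1)
    return lambda x: int(abs(x - mu1) < abs(x - mu0))

# Toy classifier B: 1-D Gaussian naive Bayes (class priors enter the decision rule).
def fit_gaussian_nb(xs, ys):
    params = {}
    for c in (0, 1):
        pts = [x for x, y in zip(xs, ys) if y == c]
        mu = statistics.fmean(pts)
        var = statistics.pvariance(pts, mu) + 1e-9
        params[c] = (mu, var, len(pts) / len(xs))

    def log_joint(x, c):
        mu, var, prior = params[c]
        return math.log(prior) - 0.5 * math.log(var) - (x - mu) ** 2 / (2 * var)

    return lambda x: int(log_joint(x, 1) > log_joint(x, 0))

# Balanced accuracy: mean per-class recall, so the majority class cannot dominate.
def balanced_accuracy(predict, xs, ys):
    recalls = []
    for c in (0, 1):
        idx = [i for i, y in enumerate(ys) if y == c]
        recalls.append(sum(predict(xs[i]) == c for i in idx) / len(idx))
    return statistics.fmean(recalls)

def sensitivity_table(fitters, shifts, n_major=1000, n_minor=50, seed=0):
    table = {}
    for name, fit in fitters.items():
        base, imb_gap, aug_gap = [], [], []
        for k, shift in enumerate(shifts):
            xtr, ytr = make_dataset(n_major, n_minor, shift, seed + k)
            xte, yte = make_dataset(n_major, n_minor, shift, seed + 100 + k)
            score = balanced_accuracy(fit(xtr, ytr), xte, yte)
            # Same problem, trained on balanced data (imbalance sensitivity).
            xbal, ybal = make_dataset(n_major, n_major, shift, seed + 200 + k)
            score_bal = balanced_accuracy(fit(xbal, ybal), xte, yte)
            # Same imbalanced training data after oversampling (augmentation sensitivity).
            xaug, yaug = random_oversample(xtr, ytr, seed + 300 + k)
            score_aug = balanced_accuracy(fit(xaug, yaug), xte, yte)
            base.append(score)
            imb_gap.append(score_bal - score)
            aug_gap.append(abs(score_aug - score))
        table[name] = {
            "mean_score": statistics.fmean(base),
            "dataset_spread": statistics.pstdev(base),  # dataset dependence
            "imbalance_sensitivity": statistics.fmean(imb_gap),
            "augmentation_sensitivity": statistics.fmean(aug_gap),
        }
    return table

results = sensitivity_table(
    {"nearest_centroid": fit_nearest_centroid, "gaussian_nb": fit_gaussian_nb},
    shifts=[1.0, 2.0, 3.0],
)
for name, row in results.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Balanced accuracy is used because plain accuracy rewards always predicting the majority class. Because the naive Bayes rule includes the class priors while the nearest-centroid rule does not, one would expect the former to show clearly larger imbalance and augmentation sensitivity in this toy setup; an actual study would replace the synthetic data and toy models with real datasets and classifiers.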

References

We conjecture that the prediction performance of a machine-learning method is dataset dependent, and some methods might be more sensitive to data imbalance and data augmentation than other methods.

Li et al., 2023, "Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL)", arXiv:2301.10888 (Methods section, "Machine Learning methods").