PMLBmini: A Tabular Classification Benchmark Suite for Data-Scarce Applications
Abstract: In practice, we are often faced with small-sized tabular data. However, current tabular benchmarks are not geared towards data-scarce applications, making it very difficult to derive meaningful conclusions from empirical comparisons. We introduce PMLBmini, a tabular benchmark suite of 44 binary classification datasets with sample sizes $\leq$ 500. We use our suite to thoroughly evaluate current automated machine learning (AutoML) frameworks, off-the-shelf tabular deep neural networks, as well as classical linear models in the low-data regime. Our analysis reveals that state-of-the-art AutoML and deep learning approaches often fail to appreciably outperform even a simple logistic regression baseline, but we also identify scenarios where AutoML and deep learning methods are indeed reasonable to apply. Our benchmark suite, available at https://github.com/RicardoKnauer/TabMini, allows researchers and practitioners to analyze their own methods and challenge their data efficiency.