PMLBmini: A Tabular Classification Benchmark Suite for Data-Scarce Applications

Published 3 Sep 2024 in cs.LG and cs.AI | (2409.01635v1)

Abstract: In practice, we are often faced with small-sized tabular data. However, current tabular benchmarks are not geared towards data-scarce applications, making it very difficult to derive meaningful conclusions from empirical comparisons. We introduce PMLBmini, a tabular benchmark suite of 44 binary classification datasets with sample sizes $\leq 500$. We use our suite to thoroughly evaluate current automated machine learning (AutoML) frameworks, off-the-shelf tabular deep neural networks, as well as classical linear models in the low-data regime. Our analysis reveals that state-of-the-art AutoML and deep learning approaches often fail to appreciably outperform even a simple logistic regression baseline, but we also identify scenarios where AutoML and deep learning methods are indeed reasonable to apply. Our benchmark suite, available at https://github.com/RicardoKnauer/TabMini, allows researchers and practitioners to analyze their own methods and challenge their data efficiency.
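The abstract gives two inclusion criteria for the suite: binary classification, and at most 500 samples. The sketch below shows how such a filter might look over a catalog of dataset metadata. The `DatasetInfo` type, the `select_small_binary` helper, and the catalog entries are hypothetical and made up for illustration; they are not the authors' code or real benchmark entries.

```python
from dataclasses import dataclass

MAX_SAMPLES = 500  # sample-size cap stated in the abstract


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical metadata record for one tabular dataset."""
    name: str
    n_samples: int
    n_classes: int


def select_small_binary(datasets, max_samples=MAX_SAMPLES):
    """Keep binary classification datasets with at most max_samples rows."""
    return [
        d for d in datasets
        if d.n_classes == 2 and d.n_samples <= max_samples
    ]


# Made-up entries for illustration only; not real benchmark datasets.
catalog = [
    DatasetInfo("toy_a", 120, 2),   # small and binary: kept
    DatasetInfo("toy_b", 500, 2),   # exactly at the cap: kept
    DatasetInfo("toy_c", 501, 2),   # one sample over the cap: dropped
    DatasetInfo("toy_d", 300, 3),   # multiclass: dropped
]

selected = select_small_binary(catalog)
print([d.name for d in selected])
```

The cap is inclusive, matching the $\leq$ in the abstract, so a dataset with exactly 500 samples passes the filter.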
