A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data
Abstract: Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of these models typically rely on model-centric evaluation setups with overly standardized data preprocessing. This paper demonstrates that such model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering. We therefore propose a data-centric evaluation framework. We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset. We conduct experiments with different preprocessing pipelines and hyperparameter optimization (HPO) regimes to quantify the impact of model selection, HPO, feature engineering, and test-time adaptation. Our main findings are: (1) after dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection diminishes; (2) recent models, despite their measurable progress, still benefit significantly from manual feature engineering, and this holds for both tree-based models and neural networks; (3) while tabular data is typically considered static, samples are often collected over time, and adapting to distribution shifts can be important even in supposedly static data. These insights suggest that research efforts should be directed toward a data-centric perspective, acknowledging that tabular data requires feature engineering and often exhibits temporal characteristics. Our framework is available at https://github.com/atschalz/dc_tabeval.
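The comparison described above can be illustrated with a minimal sketch, not the authors' implementation from the linked repository: the same model is scored once on standardized preprocessing and once on dataset-specific features, so the gain attributable to feature engineering can be read off directly. The names `load_competition_data` and `expert_feature_engineering` are hypothetical placeholders for the per-dataset pipelines described in the paper.

```python
# Minimal sketch of a data-centric comparison: evaluate one model under
# standardized preprocessing vs. dataset-specific feature engineering.
# `load_competition_data` and `expert_feature_engineering` are hypothetical
# placeholders, not functions from the paper's repository.
from lightgbm import LGBMClassifier
from sklearn.compose import ColumnTransformer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def standardized_preprocessing(X, categorical_cols, numeric_cols):
    """One-size-fits-all preprocessing as used in model-centric benchmarks."""
    transformer = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    return transformer.fit_transform(X)


def evaluate(model, X, y, seed=0):
    """Validation AUC for one model on one feature representation.
    A random split is used here for brevity; temporally ordered datasets
    would call for a time-based split instead."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])


# Hypothetical usage on one dataset:
# X_raw, y, cat_cols, num_cols = load_competition_data("porto-seguro")
# auc_standard = evaluate(
#     LGBMClassifier(), standardized_preprocessing(X_raw, cat_cols, num_cols), y
# )
# auc_expert = evaluate(LGBMClassifier(), expert_feature_engineering(X_raw), y)
# print(f"Gain from feature engineering: {auc_expert - auc_standard:+.4f} AUC")
```

Repeating this loop over several models and datasets is what allows the contributions of model selection, HPO, and feature engineering to be disentangled.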