Comparative Analysis of Transformers for Modeling Tabular Data: A Case Study Using an Industry-Scale Dataset
Abstract: We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, their effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the need for efficient pre-processing of categorical and numerical features, and substantial computational requirements. To address these challenges, this study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents key insights into effective data pre-processing, compares pre-training with direct supervised learning, discusses strategies for handling categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, with an emphasis on scalability.
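The abstract highlights pre-processing of categorical and numerical features as a central challenge. As a minimal sketch of the kind of pipeline typically used before feeding tabular rows to a transformer (the column names, split, and encoding choices below are illustrative assumptions, not the paper's actual method): categorical columns are mapped to integer token ids destined for an embedding table, and numerical columns are standardized for a linear projection.

```python
import numpy as np

# Hedged sketch of common tabular-transformer pre-processing:
# categorical -> integer ids (for an embedding layer),
# numerical -> standardized floats (for a linear projection).
# The features "merchant" and "amount" are hypothetical examples.

def fit_categorical_vocab(column):
    """Map each distinct category to an integer id; reserve 0 for unseen values."""
    return {val: i + 1 for i, val in enumerate(sorted(set(column)))}

def encode_categorical(column, vocab):
    """Encode a categorical column as token ids; unknown categories map to 0."""
    return np.array([vocab.get(v, 0) for v in column], dtype=np.int64)

def standardize_numerical(column):
    """Zero-mean, unit-variance scaling; guard against a zero std."""
    col = np.asarray(column, dtype=np.float64)
    mean, std = col.mean(), col.std()
    return (col - mean) / (std if std > 0 else 1.0)

# Toy rows: one categorical and one numerical feature.
merchant = ["food", "travel", "food", "retail"]
amount = [12.0, 340.0, 8.5, 99.0]

vocab = fit_categorical_vocab(merchant)
cat_ids = encode_categorical(merchant, vocab)   # ids feed an embedding table
num_vals = standardize_numerical(amount)        # scaled values feed a linear layer
```

Reserving id 0 for unseen categories is one simple way to handle the high-cardinality, open-vocabulary features common in industry-scale financial data; alternatives such as hashing or frequency-based bucketing trade memory for collision risk.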