Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

Published 24 Nov 2023 in cs.LG and cs.AI | (2311.14335v1)

Abstract: We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.
