Making Pre-trained Language Models Great on Tabular Prediction (2403.01841v2)
Abstract: The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. However, due to the heterogeneity among tables, this transferability bonus is still far from being well exploited for tabular data prediction (e.g., regression or classification tasks). Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature names from various tables, potentially serving as versatile learners that transfer knowledge across distinct tables and diverse prediction tasks; however, their discrete text representation space is inherently incompatible with numerical feature values in tables. In this paper, we present TP-BERTa, a specifically pre-trained LM for tabular data prediction. Concretely, a novel relative magnitude tokenization converts scalar numerical feature values into finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names. Comprehensive experiments demonstrate that our pre-trained TP-BERTa leads the performance among tabular DNNs and is competitive with Gradient Boosted Decision Tree models in the typical tabular data regime.
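The abstract names two mechanisms: relative magnitude tokenization, which maps each scalar value to a discrete token from a shared magnitude vocabulary, and intra-feature attention, which fuses the value representation with its feature-name tokens. The sketch below is a minimal, hypothetical illustration of the tokenization idea only, not the authors' implementation; it assumes equal-width bins over values pre-scaled to [0, 1] and illustrative sizes (`n_bins`, `d_model`), whereas the paper's actual binning scheme and fusion module differ in detail.

```python
# Minimal sketch (not the authors' code) of relative-magnitude-style tokenization:
# a scalar feature value is bucketed into one of `n_bins` shared "magnitude" tokens,
# and the looked-up token embedding is scaled by the raw value so that numerically
# close values receive close representations.
import torch
import torch.nn as nn


class MagnitudeTokenizer(nn.Module):
    def __init__(self, n_bins: int = 128, d_model: int = 768):
        super().__init__()
        # One embedding table of magnitude tokens, shared across all numerical features.
        self.magnitude_embeddings = nn.Embedding(n_bins, d_model)
        # Equal-width bin edges over [0, 1]; the paper derives its bins differently.
        self.register_buffer("bin_edges", torch.linspace(0.0, 1.0, n_bins + 1)[1:-1])

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        """values: (batch,) scalars already scaled to [0, 1]."""
        bins = torch.bucketize(values, self.bin_edges)       # (batch,) discrete bin ids
        token_embs = self.magnitude_embeddings(bins)         # (batch, d_model)
        # Modulate the discrete token by the continuous value to retain fine ordering.
        return token_embs * values.unsqueeze(-1)


# Usage: produce value embeddings that can then be fused with the feature-name
# token embeddings (e.g., via an attention module) before entering the LM.
tokenizer = MagnitudeTokenizer(n_bins=128, d_model=768)
value_embs = tokenizer(torch.tensor([0.05, 0.51, 0.93]))     # shape (3, 768)
```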