- The paper establishes that a simple ResNet-like model serves as a robust baseline for tabular data tasks.
- It systematically compares DL models against GBDT across 11 public datasets, revealing performance variations based on task characteristics.
- Results highlight the FT-Transformer’s robust performance and adaptability as a promising solution for diverse tabular challenges.
Revisiting Deep Learning Models for Tabular Data
The paper "Revisiting Deep Learning Models for Tabular Data" presents a comprehensive paper on the application and evaluation of deep learning (DL) approaches to tabular data tasks. Despite the widespread success of DL in domains such as image processing and natural language processing, its effectiveness on tabular datasets has been less clear. This paper aims to address these ambiguities by systematically evaluating key DL architectures and benchmarking them against traditional Gradient Boosted Decision Trees (GBDT).
Key Contributions
- Evaluation of DL Models: The authors assess a wide variety of DL models for tabular data, highlighting two significant architectures: a ResNet-like model and the FT-Transformer. The former is an effective baseline often overlooked in previous studies, while the latter delivers superior performance on many tasks (a sketch of the ResNet-like block appears after this list).
- Comparison to GBDT: The paper rigorously compares DL models to well-established GBDT implementations such as XGBoost and CatBoost. The findings indicate that neither family is universally superior: relative performance varies significantly with the dataset and task.
- Simple and Effective Baselines: By identifying a simple ResNet-like architecture as an effective baseline, the paper provides a reference point for future DL research in tabular contexts. The FT-Transformer's strong results additionally position it as a robust solution across diverse tasks.
- Synthetic Task Analysis: Further analysis with synthetic datasets reveals that the FT-Transformer exhibits more universal adaptability to a broader class of problems, performing consistently well across varying conditions where other models might struggle.
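A minimal sketch of the ResNet-like block referenced above is shown below in PyTorch. It follows the general pattern of normalization, two linear layers with dropout, and a residual connection; the layer sizes, dropout rate, and input/output dimensions are illustrative placeholders rather than the paper's tuned configuration.

```python
import torch
import torch.nn as nn


class TabularResNetBlock(nn.Module):
    """Residual block in the spirit of the paper's ResNet-like baseline:
    x + Dropout(Linear(Dropout(ReLU(Linear(BatchNorm(x))))))."""

    def __init__(self, d: int, d_hidden: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.linear1 = nn.Linear(d, d_hidden)
        self.linear2 = nn.Linear(d_hidden, d)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.norm(x)
        z = self.dropout(torch.relu(self.linear1(z)))
        z = self.dropout(self.linear2(z))
        return x + z  # residual connection keeps the baseline easy to optimize


# Example: two blocks on top of a linear projection of 20 numerical features.
model = nn.Sequential(
    nn.Linear(20, 64),
    TabularResNetBlock(64, 128),
    TabularResNetBlock(64, 128),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 1),  # e.g. a single regression target
)
print(model(torch.randn(32, 20)).shape)  # torch.Size([32, 1])
```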
Numerical Results
The paper's evaluation across eleven public datasets shows that the FT-Transformer outperforms the other DL solutions in most cases, attesting to its robustness. While GBDT models still come out ahead on some datasets, the FT-Transformer narrows these gaps considerably, establishing it as a versatile model.
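For context, a GBDT baseline of the kind the paper benchmarks against can be set up in a few lines. The snippet below is only an illustrative sketch: the dataset (scikit-learn's California Housing) and the hyperparameters are placeholders, not the paper's benchmark setup, which tunes XGBoost and CatBoost per dataset.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Placeholder regression dataset standing in for one of the benchmark tasks.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
gbdt.fit(X_train, y_train)

rmse = mean_squared_error(y_test, gbdt.predict(X_test)) ** 0.5
print(f"XGBoost RMSE: {rmse:.3f}")
```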
Implications and Future Directions
This work underscores the necessity for consistent benchmarks in evaluating DL architectures for tabular data, akin to ImageNet or GLUE for other domains. The highlighted architectures, especially the FT-Transformer, indicate potential directions for future research, emphasizing the integration of attention mechanisms in tabular data processing.
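To make the attention-based direction concrete, the core FT-Transformer idea can be sketched as follows: each feature is mapped to its own learned embedding ("token"), a [CLS] token is prepended, a standard Transformer encoder processes the token sequence, and the prediction is read from the [CLS] position. The code below is a simplified illustration covering numerical features only (the paper also embeds categorical features via lookup tables), with arbitrary dimensions; it is not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class NumericalFeatureTokenizer(nn.Module):
    """Map each scalar feature x_i to an embedding x_i * w_i + b_i."""

    def __init__(self, n_features: int, d_token: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_features, d_token) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_features, d_token))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, n_features)
        return x.unsqueeze(-1) * self.weight + self.bias  # (batch, n_features, d_token)


class SimpleFTTransformer(nn.Module):
    def __init__(self, n_features: int, d_token: int = 64, n_layers: int = 2, d_out: int = 1):
        super().__init__()
        self.tokenizer = NumericalFeatureTokenizer(n_features, d_token)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_token))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=8, dim_feedforward=4 * d_token, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_token, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(x)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])  # predict from the [CLS] position


print(SimpleFTTransformer(n_features=20)(torch.randn(32, 20)).shape)  # torch.Size([32, 1])
```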
The continued evolution of DL models tailored to tabular data could further close the gap with GBDT, for example by adopting efficient attention mechanisms to reduce computational overhead. Better hyperparameter search spaces and more efficient training could also yield significant improvements.
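As an illustration of what such a tuning loop might look like, the sketch below runs a small Optuna search over an XGBoost baseline. The search space, dataset, and trial budget are assumptions chosen for brevity, not the paper's tuning protocol.

```python
import optuna
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)  # placeholder dataset


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; a real study would cover more parameters.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = XGBRegressor(**params)
    # Negative MSE so that higher is better; 3 folds keep the sketch fast.
    return cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```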
Conclusion
The paper methodically dissects the landscape of DL for tabular data, proposing strong baselines that can steer future research. While the ResNet-like model and the FT-Transformer deliver strong performance, the absence of a single best solution across DL and GBDT encourages continued exploration of architectural innovations and optimization strategies for tabular datasets.