Automated Model Selection for Tabular Data (2401.00961v2)

Published 1 Jan 2024 in cs.LG and cs.AI

Abstract: Structured data in the form of tabular datasets contain features that are distinct and discrete, with varying individual and relative importances to the target. Combinations of one or more features may be more predictive and meaningful than individual feature contributions alone. R's mixed-effects linear model library allows users to specify such feature interactions in the model design. However, with many features and possible interactions to choose from, the number of candidate models grows exponentially, making model selection a difficult task. We aim to automate the model selection process for predictions on tabular datasets incorporating feature interactions while keeping computational costs small. The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method. The Priority-based approach efficiently explores feature combinations using prior probabilities to guide the search. The Greedy method builds the solution iteratively by adding or removing features based on their impact. Experiments on synthetic data demonstrate the ability to effectively capture predictive feature combinations.
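The abstract does not spell out either search procedure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes candidate terms are main effects and pairwise interactions, and it scores each model with an ordinary least-squares fit and BIC, substituting for R's mixed-effects models. The Greedy method is read as stepwise add/drop selection. The Priority-based Random Grid Search is read as sampling each term independently with its prior probability and keeping the best-scoring model. All function names, the priors, and the synthetic data generator are hypothetical.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bic(rows, y, terms):
    """BIC of an OLS fit with an intercept plus the given terms (lower is better).

    A term is a tuple of feature indices: (0,) is a main effect, (1, 2) an interaction.
    OLS + BIC is an assumed stand-in for the paper's mixed-effects models.
    """
    cols = [[1.0] * len(y)] + [[math.prod(r[i] for i in t) for r in rows] for t in terms]
    k, n = len(cols), len(y)
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(gram, rhs)  # normal equations
    rss = sum((yi - sum(beta[j] * cols[j][i] for j in range(k))) ** 2 for i, yi in enumerate(y))
    return n * math.log(rss / n) + k * math.log(n)


def greedy_search(rows, y, candidates):
    """Hypothetical stepwise search: apply the single add or drop that most lowers BIC."""
    selected, best = [], bic(rows, y, [])
    while True:
        moves = [selected + [t] for t in candidates if t not in selected]
        moves += [[s for s in selected if s != t] for t in selected]
        if not moves:
            break
        score, model = min((bic(rows, y, m), m) for m in moves)
        if score >= best:  # no move improves BIC: stop
            break
        selected, best = model, score
    return sorted(selected), best


def priority_random_search(rows, y, priors, n_trials=200, seed=0):
    """Hypothetical prior-guided random search: include each term with its prior probability."""
    rng = random.Random(seed)
    best_terms, best = [], bic(rows, y, [])
    for _ in range(n_trials):
        terms = [t for t, p in priors.items() if rng.random() < p]
        score = bic(rows, y, terms)
        if score < best:
            best_terms, best = terms, score
    return sorted(best_terms), best


def make_data(n=200, seed=1):
    """Hypothetical synthetic data: y depends only on x0 and the x1*x2 interaction."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2.0 * r[0] + 3.0 * r[1] * r[2] + rng.gauss(0, 0.5) for r in rows]
    return rows, y


if __name__ == "__main__":
    rows, y = make_data()
    candidates = [(i,) for i in range(4)] + list(itertools.combinations(range(4), 2))
    print(greedy_search(rows, y, candidates))
    priors = {t: (0.5 if len(t) == 1 else 0.3) for t in candidates}  # assumed priors
    print(priority_random_search(rows, y, priors))
```

Both searches should recover the terms `(0,)` and `(1, 2)` on this data, though either may also admit a weak spurious term. The greedy sketch does not enforce the hierarchy principle, so it can select an interaction without its main effects. A fuller implementation might restrict moves to keep hierarchical models.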
