
Random Subspace with Trees for Feature Selection Under Memory Constraints (1709.01177v2)

Published 4 Sep 2017 in stat.ML and cs.LG

Abstract: Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing variables already identified as relevant by previous models with variables randomly selected from the remaining ones. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable-dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
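The sequential procedure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration using scikit-learn's extremely randomized trees, not the authors' implementation: the subspace size `q`, the importance threshold `1/q`, the number of iterations, and the function name are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def sequential_random_subspace(X, y, n_iter=50, q=5, n_trees=100, seed=0):
    """Hedged sketch of the paper's idea: at each iteration, fit a tree
    ensemble on a small subset of q features that mixes the variables
    currently flagged as relevant with fresh, randomly drawn ones, then
    re-flag as relevant the features whose importance stands out.
    Memory usage is bounded because only q feature columns are ever
    loaded into a model at once."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    selected = np.array([], dtype=int)  # features currently deemed relevant
    for _ in range(n_iter):
        keep = selected[:q]  # carry over previously flagged features
        pool = np.setdiff1d(np.arange(p), keep)
        fresh = rng.choice(pool, size=q - len(keep), replace=False)
        subset = np.concatenate([keep, fresh])
        forest = ExtraTreesClassifier(n_estimators=n_trees,
                                      random_state=0).fit(X[:, subset], y)
        # 1/q is the importance a feature would get if all were equally
        # (ir)relevant; features above it are kept for the next round.
        selected = subset[forest.feature_importances_ > 1.0 / q]
    return np.sort(selected)
```

For example, on synthetic data where the label depends only on the first feature, the procedure should eventually sample that feature into a subspace, flag it, and then retain it across subsequent iterations while spuriously flagged noise features are pruned.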

Authors (5)
  1. Antonio Sutera (8 papers)
  2. Célia Châtel (1 paper)
  3. Gilles Louppe (68 papers)
  4. Louis Wehenkel (17 papers)
  5. Pierre Geurts (21 papers)
Citations (2)
