A Detailed Exploration of Gradient Boosted Feature Selection
The paper "Gradient Boosted Feature Selection" by Zhixiang Xu and colleagues addresses a critical challenge in the field of machine learning: efficient and effective feature selection. The proposed Gradient Boosted Feature Selection (GBFS) algorithm leverages the advancements in gradient boosting to synergize feature selection with classification, thus optimizing both tasks in a unified framework.
Key Contributions and Methodology
The authors outline the deficiencies of existing feature selection techniques, particularly linear approaches like the Lasso that cannot capture nonlinear feature interactions. They highlight the need for an algorithm that is versatile, scalable, and able to incorporate domain-specific knowledge in the form of structured sparsity, and present GBFS as a solution satisfying all three criteria. The method builds on gradient boosted regression trees, a standard workhorse for classification, by integrating feature selection directly into tree construction.
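The standard Lasso objective makes the linearity limitation concrete: the model is a single linear function of the inputs, so no amount of l1 regularization can surface interactions between features:

```latex
\min_{w} \; \lVert X w - y \rVert_2^2 \;+\; \lambda \lVert w \rVert_1
```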
GBFS modifies the impurity function in gradient boosting to penalize splits on previously unused features, thereby encouraging reuse of features that have already been selected. This adjustment lets GBFS retain the efficiency of boosted trees, scaling linearly with both the number of training examples and the number of features. The algorithm also accommodates pre-specified sparsity patterns or feature groups, allowing researchers to enforce constraints that reflect real-world structure or domain knowledge, such as biological groupings of genes.
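A minimal NumPy sketch of this penalized split search may make the mechanism concrete. This is not the authors' implementation; the function name, the exhaustive threshold scan, and the variance-reduction impurity are simplifying assumptions chosen for brevity:

```python
import numpy as np

def penalized_best_split(X, residuals, selected, lam):
    """Pick the split maximizing variance reduction minus a one-time
    penalty `lam` for any feature not yet in the selected set.

    Illustrative sketch of the GBFS idea, not the authors' code;
    `penalized_best_split`, `selected`, and `lam` are names chosen here.
    """
    n, d = X.shape
    parent_sse = np.sum((residuals - residuals.mean()) ** 2)
    best = (None, None, -np.inf)  # (feature, threshold, penalized gain)
    for f in range(d):
        penalty = 0.0 if f in selected else lam  # new features pay extra
        for t in np.unique(X[:, f])[:-1]:  # candidate thresholds
            left = residuals[X[:, f] <= t]
            right = residuals[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            sse = (np.sum((left - left.mean()) ** 2)
                   + np.sum((right - right.mean()) ** 2))
            gain = parent_sse - sse - penalty
            if gain > best[2]:
                best = (f, t, gain)
    return best
```

Because the penalty is charged only the first time a feature appears, later boosting rounds find already-selected features strictly cheaper, which is what drives the ensemble toward a compact feature set.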
Experimental Evaluation
The empirical results in the paper demonstrate GBFS's competitiveness with leading feature selection methods. Across multiple datasets, including several with high feature dimensionality, GBFS matched or surpassed the classification accuracy of strong baselines such as Random Forest feature selection (RF-FS) and HSIC Lasso. Particularly noteworthy is GBFS's ability to produce parsimonious models without sacrificing performance, a significant advantage in scenarios demanding real-time or near-real-time decision-making.
A case study on the Colon dataset, where the algorithm had access to structured feature groups, highlights GBFS's strength in leveraging domain-specific constraints to produce insightful and interpretable models. This capability aligns with the broader trend in machine learning toward model transparency and interpretability alongside predictive accuracy.
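One plausible way to wire group structure into the same penalty, assuming (as a reading of the paper's structured-sparsity description, not a quote of its code) that a group's cost is charged once when any of its members is first used:

```python
def feature_penalty(f, feature_to_group, paid_groups, lam):
    """Incremental cost of splitting on feature f under shared group costs.

    Illustrative sketch: `feature_to_group` maps each feature index to a
    group id (e.g., a gene's pathway), and the penalty `lam` is charged
    only the first time any member of a group is used. All names here are
    assumptions for illustration, not the paper's API.
    """
    return 0.0 if feature_to_group[f] in paid_groups else lam

# Usage inside the split search sketched earlier: replace the per-feature
# penalty line with
#     penalty = feature_penalty(f, feature_to_group, paid_groups, lam)
# and add the chosen feature's group to `paid_groups` after each split.
```

Under this scheme, selecting one gene from a pathway makes its neighbors free, so subsequent trees naturally concentrate on a few coherent groups rather than scattering across unrelated features.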
Implications and Future Directions
GBFS introduces a paradigm shift in feature selection, seamlessly integrating it with the model training process. This integration suggests potential applications across domains such as bioinformatics, computer vision, and medical imaging, where feature relevance is often intertwined with domain knowledge and inherent data structures.
Looking ahead, the paper suggests several avenues for future research, such as enhancing the algorithm's scalability to accommodate even larger numbers of features and datasets. Furthermore, refining the feature selection criterion to potentially include stochastic elements or incorporating recent advances in tree-based models could expand the utility and flexibility of GBFS.
In conclusion, "Gradient Boosted Feature Selection" offers a comprehensive framework that effectively addresses the longstanding challenge of feature selection in non-linear contexts, underlining the continuous interplay between methodological innovation and practical applicability in machine learning research. The adaptability of GBFS to incorporate structured sparsity is particularly promising, suggesting that this approach could pave the way for more tailored and context-aware machine learning solutions across a spectrum of complex, high-dimensional datasets.