High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso (1202.0515v4)

Published 2 Feb 2012 in stat.ML, cs.AI, and stat.ME

Abstract: The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing non-linear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments with thousands of features.

Citations (292)

Summary

  • The paper introduces a novel feature-wise kernelized Lasso that leverages HSIC to capture non-linear dependencies in high-dimensional data.
  • It achieves efficient computation of the globally optimal solution using quadratic programming, ensuring scalability for complex datasets.
  • Extensive experiments on synthetic and real-world data validate its superiority over traditional Lasso methods in selecting relevant features.

High-Dimensional Feature Selection via Feature-Wise Kernelized Lasso

The paper proposes a novel approach to feature selection in high-dimensional datasets by introducing the concept of a feature-wise kernelized Lasso. This method aims to address the limitations of traditional Lasso techniques, which focus primarily on linear dependencies, by extending the framework to capture non-linear relationships through kernel methods, specifically utilizing the Hilbert-Schmidt Independence Criterion (HSIC).
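As background, the (biased) empirical HSIC estimator on $n$ samples can be written, up to the paper's exact normalization, as

$$
\widehat{\mathrm{HSIC}}(X_k, Y) = \frac{1}{(n-1)^2}\,\mathrm{tr}\bigl(\bar{K}^{(k)} \bar{L}\bigr),
\qquad
\bar{K}^{(k)} = H K^{(k)} H,\quad \bar{L} = H L H,\quad H = I_n - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^{\top},
$$

where $K^{(k)}$ and $L$ are Gram matrices computed from the $k$-th input feature and from the outputs, respectively, and $H$ is the centering matrix. Values near zero indicate approximate independence; large values indicate strong statistical dependence.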

Key Contributions

The authors present two main contributions:

  1. Feature-Wise Kernelized Lasso: The paper introduces a method that applies non-linear transformations feature-wise via kernel functions and uses HSIC to measure the statistical dependence between each input feature and the output values. This enables the selection of non-redundant features with strong non-linear dependence on the outputs, circumventing the limitation of the standard Lasso, which captures only linear relationships (the resulting optimization problem is written out after this list).
  2. Efficient Computation and Scalability: The framework proposed allows for an efficient computation of the globally optimal solution, making it scalable to high-dimensional feature selection problems. This characteristic is especially significant given the growing complexity and data dimensionality in contemporary applications such as genomic data analysis and image classification.
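Concretely, with the centered Gram matrices $\bar{K}^{(k)}$ and $\bar{L}$ defined above, the HSIC Lasso problem takes the following form (up to the paper's normalization constants):

$$
\min_{\boldsymbol{\alpha} \in \mathbb{R}^d,\; \boldsymbol{\alpha} \ge 0}\;
\frac{1}{2}\Bigl\| \bar{L} - \sum_{k=1}^{d} \alpha_k \bar{K}^{(k)} \Bigr\|_F^2
+ \lambda \|\boldsymbol{\alpha}\|_1
$$

Features with nonzero weights $\alpha_k$ are the selected ones; the non-negativity constraint is what ties this Lasso-style objective to HSIC-based dependence maximization.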

Methodology

The method keeps the optimization tractable: applying the kernel trick feature-wise turns the problem into a Lasso-style fit over centered Gram matrices, which is a convex quadratic program whose globally optimal solution can be computed efficiently. Additionally, the authors introduce the NOCCO Lasso as a variant that replaces HSIC with a normalized dependence measure, aiming to reduce sensitivity to the choice of kernel. A minimal sketch of the HSIC Lasso computation follows.
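The sketch below is illustrative, not the authors' implementation: it assumes Gaussian kernels with a shared, user-supplied width `sigma` for both inputs and (regression) outputs, and it reuses scikit-learn's coordinate-descent `Lasso` with `positive=True` as the non-negative Lasso solver, whose objective differs from the paper's by a constant scaling.

```python
import numpy as np
from sklearn.linear_model import Lasso


def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a 1-D sample vector x."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))


def hsic_lasso(X, y, lam=1e-3, sigma=1.0):
    """Sketch of HSIC Lasso feature weighting.

    X : (n, d) input matrix, y : (n,) output vector (regression).
    Returns one non-negative weight per feature; nonzero weights
    mark the selected features.
    """
    n, d = X.shape
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix

    # Centered output Gram matrix, flattened into the regression target.
    L_bar = H @ gaussian_gram(y, sigma) @ H

    # One design column per feature: vec of its centered Gram matrix.
    Phi = np.column_stack(
        [(H @ gaussian_gram(X[:, k], sigma) @ H).ravel() for k in range(d)]
    )

    # Non-negative Lasso: min_a 0.5*||vec(L_bar) - Phi a||^2 + lam*||a||_1, a >= 0.
    # (scikit-learn additionally rescales the squared-error term by 1/n_samples.)
    solver = Lasso(alpha=lam, positive=True, fit_intercept=False, max_iter=10_000)
    solver.fit(Phi, L_bar.ravel())
    return solver.coef_
```

Note that the design matrix has $n^2$ rows, so this direct form only suits modest sample sizes; public implementations such as pyHSICLasso handle scaling more carefully. For classification, the paper uses a delta kernel on the labels in place of the Gaussian output kernel.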

Experimental Validation

The effectiveness of the proposed method is validated through experiments conducted on synthetic and real-world datasets, including classification tasks involving image datasets and regression problems with biological data. In these experiments, HSIC Lasso demonstrates favorable performance compared to existing methods like mRMR, QPFS, cKTA, and SpAM, particularly in selecting non-redundant features strongly associated with output values.

Implications and Future Directions

The introduction of HSIC-based feature selection opens new avenues for dealing with the complex data structures encountered in high-dimensional spaces. The generality and flexibility of the method, evidenced by its applicability to structured outputs, point toward uses in fields such as genomics and information retrieval. The paper also serves as a foundation for future research exploring multi-task learning applications and extending the framework to more advanced non-linear models.

The paper paves the way for deeper exploration of non-linear dependencies in feature selection using kernel methods, suggesting that such approaches can be extended to improve the robustness and adaptability of feature selection in increasingly complex datasets. Future work may also integrate these methods into full learning pipelines, further optimize computational efficiency, and explore theoretical aspects of feature selection in kernel spaces.