Comparison study of variable selection procedures in high-dimensional Gaussian linear regression (2109.12006v3)
Abstract: We present an extensive simulation study comparing variable selection procedures in a high-dimensional framework. Assuming that the relationship between the active variables and the response variable is linear, high-dimensional Gaussian linear regression provides a relevant statistical framework for identifying the active variables. Many variable selection procedures exist; in this article, we focus on methods based on regularization paths. We compare them across simulation settings with various dependency structures among the variables and evaluate their performance using several metrics. As expected, no method is optimal for every metric, but we provide recommendations for the best procedures according to the metric to be controlled. Lastly, we test the importance of some model assumptions, in particular high dimensionality and Gaussianity.
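
To make the setting concrete, here is a minimal sketch of one simulation replicate. It is an illustration under stated assumptions, not the study's actual protocol. It assumes a Toeplitz correlation structure for the variables, a lasso fitted by plain coordinate descent as one example of a regularization-path method, and recall and false discovery proportion as example metrics. All function names, parameter values, and the grid of penalty levels are hypothetical choices made for this illustration.

```python
import numpy as np


def simulate(n=50, p=200, k=5, rho=0.5, sigma=1.0, seed=0):
    """Draw (X, y, beta) from y = X beta + eps, with p > n by default.

    Assumption: rows of X are Gaussian with Toeplitz covariance rho^|i-j|,
    and the k active variables all have coefficient 2 (illustrative values).
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = 2.0
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y, beta


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # remove variable j's current contribution
            z = X[:, j] @ r / n
            # Soft-thresholding update for coordinate j.
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated contribution back
    return b


def selection_metrics(beta_hat, beta):
    """Return (recall, false discovery proportion) of the selected support."""
    selected = beta_hat != 0
    active = beta != 0
    tp = np.sum(selected & active)
    recall = tp / active.sum()
    fdp = (selected.sum() - tp) / max(selected.sum(), 1)
    return recall, fdp


# Walk a small, hypothetical grid of penalties below lam_max, the smallest
# penalty for which the lasso solution is entirely zero.
X, y, beta = simulate()
lam_max = np.max(np.abs(X.T @ y)) / len(y)
for frac in (0.5, 0.2, 0.1):
    recall, fdp = selection_metrics(lasso_cd(X, y, frac * lam_max), beta)
    print(f"lambda = {frac:.1f} * lam_max: recall = {recall:.2f}, FDP = {fdp:.2f}")
```

Smaller penalties typically select more variables, which tends to raise recall at the cost of more false discoveries. This trade-off is one reason a single procedure is unlikely to be best on every metric.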