Variable selection in functional data classification: a maxima-hunting proposal (1309.6697v3)
Abstract: Variable selection is considered in the setting of supervised binary classification with functional data $\{X(t),\ t\in[0,1]\}$. By "variable selection" we mean any dimension-reduction method that replaces the whole trajectory $\{X(t),\ t\in[0,1]\}$ with a low-dimensional vector $(X(t_1),\ldots,X(t_k))$ while keeping a similar classification error. Our proposal for variable selection is based on the idea of selecting the local maxima $(t_1,\ldots,t_k)$ of the function ${\mathcal V}_X^2(t)={\mathcal V}^2(X(t),Y)$, where ${\mathcal V}$ denotes the "distance covariance" association measure for random variables due to Székely, Rizzo and Bakirov (2007). This method provides a simple, natural way to deal with the relevance vs. redundancy trade-off which typically appears in variable selection. This paper includes: (a) some theoretical motivation, namely a consistency result for the estimation of the maxima of ${\mathcal V}_X^2$, together with several theoretical models for the underlying process $X(t)$ under which the relevant information is concentrated in the maxima of ${\mathcal V}_X^2$; and (b) an extensive empirical study, including about 400 simulated models and real data examples, aimed at comparing our variable selection method with other standard proposals for dimension reduction.
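The maxima-hunting idea described in the abstract can be illustrated with a minimal sketch, assuming the trajectories are observed on a common grid $t_1,\ldots,t_p$ and stored as the rows of an array. The function names (`dcov2`, `local_maxima`, `maxima_hunting`) and the parameters `k` and `half_width` are hypothetical choices made for this illustration, not the authors' implementation. The sketch uses the standard double-centering form of the sample squared distance covariance and treats a grid point as a local maximum when it attains the largest value within a window of `half_width` neighbours on each side.

```python
import numpy as np


def dcov2(x, y):
    """Sample squared distance covariance V_n^2(x, y) for two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distances |x_k - x_l|
    b = np.abs(y - y.T)  # pairwise distances |y_k - y_l|
    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((A * B).mean())


def local_maxima(v, half_width=1):
    """Indices of local maxima of v, sorted by decreasing value.

    Assumption for this sketch: index j is a local maximum if v[j] > 0 and
    v[j] is the largest value among its neighbours within half_width steps.
    """
    v = np.asarray(v, dtype=float)
    found = []
    for j in range(len(v)):
        lo, hi = max(0, j - half_width), min(len(v), j + half_width + 1)
        if v[j] > 0 and v[j] == v[lo:hi].max():
            found.append(j)
    return sorted(found, key=lambda j: -v[j])


def maxima_hunting(X, y, k=3, half_width=1):
    """Return up to k grid indices at which V_n^2(X(t_j), Y) has local maxima.

    X is an (n, p) array of discretized trajectories; y holds n class labels.
    """
    X = np.asarray(X, dtype=float)
    v = np.array([dcov2(X[:, j], y) for j in range(X.shape[1])])
    return local_maxima(v, half_width)[:k]
```

Calling `maxima_hunting(X, y, k=3, half_width=5)` on an `(n, p)` array of discretized curves returns at most three grid indices; the corresponding columns `X[:, idx]` can then be passed to any standard multivariate classifier. Keeping only local maxima, rather than the `k` largest values overall, is what discourages selecting several neighbouring and therefore redundant points.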