Measuring Feature-Label Dependence Using Projection Correlation Statistic (2504.19180v2)
Abstract: Detecting dependence between variables is a central problem in statistical science. In this paper, we propose a novel metric called label projection correlation to measure the dependence between numerical and categorical variables. The proposed correlation imposes no conditions on the numerical variables, and it equals zero if and only if the two variables are independent. When the numerical variable is one-dimensional, we show that the computational cost of estimating the correlation can be reduced to $\mathcal{O}(n \log n)$, where $n$ is the sample size. Additionally, if the one-dimensional variable is continuous, the correlation simplifies to a concise rank-based expression. We also establish the asymptotic properties of the estimator. Two simulation experiments demonstrate the effectiveness of the proposed correlation in feature selection. Furthermore, the metric is applied to feature selection on drivers' facial images and cancer mass-spectrometric data.
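The abstract does not give the formula for label projection correlation, so the sketch below does not implement it. As a stand-in, it computes a simple Cramér-von Mises-type gap between class-conditional and pooled empirical CDFs for a one-dimensional feature, only to illustrate how sorting lets a feature-label dependence score be computed without pairwise comparisons. The function name `cdf_gap_index` and the statistic itself are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def cdf_gap_index(x, y):
    """Illustrative stand-in (NOT the paper's statistic).

    Returns sum_k p_k * mean_i (F_k(x_i) - F(x_i))^2, where F is the pooled
    empirical CDF of the feature and F_k is the empirical CDF within class k.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    # Sort once for the pooled sample and once per class.
    pooled = sorted(x)
    by_class = defaultdict(list)
    for xi, yi in zip(x, y):
        by_class[yi].append(xi)
    for vals in by_class.values():
        vals.sort()

    total = 0.0
    for vals in by_class.values():
        p_k = len(vals) / n  # empirical class proportion
        gap = 0.0
        for xi in x:
            # Each empirical CDF value is a binary search in a sorted list.
            f_all = bisect_right(pooled, xi) / n
            f_k = bisect_right(vals, xi) / len(vals)
            gap += (f_k - f_all) ** 2
        total += p_k * gap / n
    return total


# Feature distributed identically in both classes: no gap.
same = cdf_gap_index([1, 2, 1, 2], ["a", "a", "b", "b"])
# Feature perfectly separates the classes: positive gap.
separated = cdf_gap_index([1, 2, 3, 4], [0, 0, 1, 1])
```

With K classes this sketch costs O(K n log n) because every CDF evaluation is a binary search; ranking features by such a score is one way a dependence measure can drive feature selection, which is the use case the abstract describes.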