Logistic lasso regression with nearest neighbors for gradient-based dimension reduction (2407.08485v2)
Abstract: This paper investigates a new approach to estimating the gradient of the conditional probability given the covariates in the binary classification framework. The proposed approach consists of fitting a localized nearest-neighbor logistic model with an $\ell_1$-penalty in order to cope with possibly high-dimensional covariates. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild conditions. Moreover, by aggregating outer products of such gradient estimates at several points in the covariate space, we establish the rate of convergence for estimating the so-called central subspace, a well-known object that enables dimension reduction within the covariate space. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors on synthetic and real data.
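The pipeline the abstract describes (local $\ell_1$-penalized logistic fits for gradient estimates, then an averaged outer product whose top eigenvectors span the central subspace) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the neighborhood size `k`, the penalty strength `C`, the choice of anchor points, and the fallback for single-class neighborhoods are all assumptions made here for the sketch; scikit-learn's `LogisticRegression` stands in for the paper's localized logistic lasso solver.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_gradient_estimate(X, y, x0, k=100, C=1.0):
    """Estimate the gradient of P(Y=1 | X=x) at x0 via a local logistic lasso."""
    # Select the k nearest neighbors of x0 and recenter them at x0.
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    Xk, yk = X[idx] - x0, y[idx]
    if len(np.unique(yk)) < 2:
        # Degenerate neighborhood (one class only): no usable local signal.
        return np.zeros(X.shape[1])
    # l1-penalized (lasso) logistic fit on the local, recentered sample.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Xk, yk)
    beta = clf.coef_.ravel()
    # Chain rule: grad p(x0) = p(x0) * (1 - p(x0)) * beta, with p evaluated
    # at the origin of the recentered coordinates (i.e., at x0 itself).
    p = clf.predict_proba(np.zeros((1, X.shape[1])))[0, 1]
    return p * (1 - p) * beta

def central_subspace(X, y, anchors, d, **kwargs):
    """Estimate a basis of the d-dimensional central subspace."""
    # Average the outer products of gradient estimates over the anchor points.
    M = np.zeros((X.shape[1], X.shape[1]))
    for x0 in anchors:
        g = local_gradient_estimate(X, y, x0, **kwargs)
        M += np.outer(g, g)
    # The top-d eigenvectors of M span the estimated central subspace.
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]
```

In practice, the dimension `d` would be chosen by cross-validation on the misclassification rate, as the abstract describes, rather than fixed in advance.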