Nonparametric Linear Feature Learning in Regression Through Regularisation (2307.12754v4)
Abstract: Representation learning plays a crucial role in automated feature selection, particularly for high-dimensional data, where nonparametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for joint linear feature learning and nonparametric function estimation, aimed at more effectively leveraging hidden features for learning. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. Using alternating minimisation, we iteratively rotate the data to improve alignment with the leading directions. We establish that, with high probability, the expected risk of our method converges to the minimal risk under minimal assumptions and with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.
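The alternating idea described in the abstract, fitting a nonparametric function in a Hermite basis for a given direction, then updating the direction given the fitted function, can be sketched on a toy single-index model. This is an illustrative sketch only, not the paper's RegFeaL estimator: it omits the derivative penalty and the full rotation of the data, uses a plain Gauss-Newton direction update, and all variable names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

# Toy single-index model y = sin(w_true . x) + noise, Gaussian inputs.
rng = np.random.default_rng(0)
d, n, K = 5, 2000, 5                      # dimension, samples, max Hermite degree
w_true = np.eye(d)[0]                     # hidden direction e_1
X = rng.standard_normal((n, d))
y = np.sin(X @ w_true) + 0.1 * rng.standard_normal(n)

def herm_features(z, K):
    """Columns He_0(z), ..., He_K(z) (probabilists' Hermite polynomials)."""
    return np.stack([hermeval(z, np.eye(K + 1)[k]) for k in range(K + 1)], axis=1)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(30):
    # Step 1: given the direction w, fit g by least squares in the Hermite basis.
    z = X @ w
    Phi = herm_features(z, K)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    # Step 2: given g, take one Gauss-Newton step on w,
    # using He_k'(z) = k * He_{k-1}(z) to evaluate g'.
    gprime = herm_features(z, K - 1) @ (coef[1:] * np.arange(1, K + 1))
    resid = y - Phi @ coef
    J = gprime[:, None] * X               # Jacobian of g(X @ w) with respect to w
    delta, *_ = np.linalg.lstsq(J, resid, rcond=None)
    w = w + delta
    w /= np.linalg.norm(w)

print(abs(w @ w_true))                    # alignment with the hidden direction
```

After a few iterations the estimated direction aligns closely with the hidden one; the rotation-invariance of the Hermite basis under Gaussian inputs is what makes re-expanding in the rotated coordinates cheap in the actual method.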