Nonparametric Linear Feature Learning in Regression Through Regularisation (2307.12754v4)

Published 24 Jul 2023 in stat.ME, cs.AI, cs.LG, math.ST, and stat.TH

Abstract: Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. Knowing this subspace would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for joint linear feature learning and non-parametric function estimation, aimed at more effectively leveraging hidden features for learning. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. By using alternating minimisation, we iteratively rotate the data to improve alignment with leading directions. We establish that the expected risk of our method converges with high probability to the minimal risk under minimal assumptions and with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.
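
The orthogonality property of Hermite polynomials mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the RegFeaL estimator itself: it assumes the probabilists' Hermite polynomials, normalised by the square root of n! and integrated against the standard Gaussian density, a convention the abstract does not specify. The helper names `normalised_hermite` and `gram_matrix` are hypothetical and chosen for illustration only.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def normalised_hermite(n, x):
    """Evaluate He_n(x) / sqrt(n!) (assumed normalisation, not from the paper)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select the degree-n probabilists' Hermite polynomial
    return hermeval(x, coeffs) / math.sqrt(math.factorial(n))


def gram_matrix(max_degree, num_nodes=40):
    """Gram matrix of the normalised basis under the standard Gaussian measure."""
    # Gauss-Hermite quadrature for the weight exp(-x^2 / 2)
    nodes, weights = hermegauss(num_nodes)
    # Rescale so the weights integrate against the N(0, 1) density
    weights = weights / math.sqrt(2.0 * math.pi)
    basis = np.stack(
        [normalised_hermite(n, nodes) for n in range(max_degree + 1)]
    )
    return (basis * weights) @ basis.T


gram = gram_matrix(5)
# Under the assumptions above, this matrix is numerically the identity
print(np.round(gram, 6))
```

Under these assumptions the Gram matrix is the identity up to floating-point error, which is the orthonormality that makes a Hermite expansion convenient for Gaussian-distributed inputs.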
