Dimensionality Reduction and Wasserstein Stability for Kernel Regression (2203.09347v3)
Abstract: In a high-dimensional regression framework, we study the consequences of the naive two-step procedure in which the dimension of the input variables is first reduced and the reduced input variables are then used to predict the output variable with kernel regression. In order to analyze the resulting regression errors, a novel stability result for kernel regression with respect to the Wasserstein distance is derived. This allows us to bound errors that occur when perturbed input data are used to fit the regression function. We apply the general stability result to principal component analysis (PCA). Exploiting known estimates from the literature on both PCA and kernel regression, we deduce convergence rates for the two-step procedure. The two-step procedure turns out to be particularly useful in a semi-supervised setting.
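To make the two-step procedure concrete, here is a minimal sketch using scikit-learn's PCA and KernelRidge as stand-ins for the dimension-reduction and kernel-regression steps; the synthetic data, the RBF kernel, and all hyperparameter values (number of components, alpha, gamma) are illustrative assumptions, not choices taken from the paper.

```python
# Sketch of the naive two-step procedure: (1) reduce input dimension, (2) run kernel regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: n labeled samples with d input dimensions.
n, d, k = 200, 50, 5
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

# Step 1: reduce the inputs to k dimensions (here via PCA).
pca = PCA(n_components=k).fit(X)
Z = pca.transform(X)

# Step 2: fit a kernel regressor on the reduced inputs (here kernel ridge with an RBF kernel).
reg = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(Z, y)

# Prediction on new inputs goes through the same learned projection.
X_new = rng.standard_normal((10, d))
y_pred = reg.predict(pca.transform(X_new))
```

In the semi-supervised setting emphasized in the abstract, the projection in step 1 could instead be fitted on a larger pool of unlabeled inputs, while only the labeled pairs enter the kernel regression in step 2.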