Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control (2205.11956v4)

Published 24 May 2022 in stat.ML, cs.LG, and stat.ME

Abstract: Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length scale of the kernel and has to be carefully selected to obtain a model with good generalization. The default methods for bandwidth selection, cross-validation and marginal likelihood maximization, often yield good results, albeit at high computational costs. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We use this expression to propose a closed-form, computationally feather-light, bandwidth selection heuristic, based on controlling the Jacobian. In addition, the Jacobian expression illuminates how the bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
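As context for the setup the abstract describes, the sketch below shows plain Gaussian kernel ridge regression with a closed-form bandwidth choice. It is a minimal illustration, not the paper's method: the Jacobian-control expression itself is not reproduced here, and the `median_heuristic_bandwidth` stand-in, the helper names, and the toy data are all assumptions introduced for this example.

```python
# Minimal sketch of Gaussian kernel ridge regression (KRR).
# The paper's closed-form Jacobian-control bandwidth is NOT reproduced here;
# as an illustrative stand-in with the same "no CV loop" flavor,
# median_heuristic_bandwidth uses the common median-pairwise-distance rule.
import numpy as np
from scipy.spatial.distance import cdist

def median_heuristic_bandwidth(X):
    """Illustrative stand-in: median of pairwise Euclidean distances."""
    d = cdist(X, X)
    return np.median(d[np.triu_indices_from(d, k=1)])

def gaussian_kernel(A, B, sigma):
    """K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    return np.exp(-cdist(A, B, "sqeuclidean") / (2.0 * sigma**2))

def fit_krr(X, y, sigma, lam=1e-3):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, sigma, X_test):
    """f(x) = sum_j alpha_j * k(x, x_j)."""
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# Toy usage (hypothetical data)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
sigma = median_heuristic_bandwidth(X)  # one closed-form choice, no search over candidates
alpha = fit_krr(X, y, sigma)
y_hat = predict_krr(X, alpha, sigma, np.array([[0.5]]))
```

The trade-off the abstract mentions is visible even in this sketch: a small sigma drives K toward the identity (well-conditioned, but the inferred function becomes spiky and large-derivative), while a large sigma pushes K toward a nearly rank-one all-ones matrix (a smooth fit, but K + lam*I becomes poorly conditioned for small lam).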
