Kernel-based function learning in dynamic and non stationary environments (2310.02767v1)
Abstract: One central theme in machine learning is function estimation from sparse and noisy data. An example is supervised learning, where the training set consists of pairs, each containing an input location and an output response. Over the last decades, a substantial amount of work has been devoted to designing estimators for the unknown function and to studying their convergence to the optimal predictor, also characterizing the learning rate. These results typically rely on stationarity assumptions, where input locations are drawn from a probability distribution that does not change over time. In this work, we consider kernel-based ridge regression and derive convergence conditions under non-stationary distributions, also addressing cases where stochastic adaptation may happen infinitely often. This includes the important class of exploration-exploitation problems where, e.g., a set of agents/robots has to monitor an environment in order to reconstruct a sensorial field, and their movement rules are continuously updated on the basis of the knowledge acquired about the field and/or the surrounding environment.
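To make the setting concrete, here is a minimal sketch of kernel-based ridge regression in which the input-sampling distribution is re-adapted after every batch of measurements, mimicking the exploration-exploitation loop described in the abstract. This is not the paper's algorithm: the Gaussian kernel, the synthetic field `f_true`, the regularization parameter `lam`, and the gradient-based adaptation rule are all illustrative assumptions.

```python
# Sketch: kernel ridge regression under a non-stationary (adapted) sampling
# distribution. All names and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    # Unknown sensorial field to reconstruct (assumed for illustration).
    return np.sin(3 * x) + 0.5 * np.cos(7 * x)

def gauss_kernel(A, B, gamma=20.0):
    # Gaussian (RBF) kernel matrix between 1-D location sets A and B.
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

X, y = np.empty(0), np.empty(0)
lam = 1e-2                                  # ridge regularization parameter
grid = np.linspace(0.0, 1.0, 200)
weights = np.ones_like(grid) / grid.size    # current sampling distribution

for batch in range(10):
    # Draw new input locations from the CURRENT (non-stationary) distribution
    # and observe noisy responses.
    x_new = rng.choice(grid, size=20, p=weights)
    X = np.concatenate([X, x_new])
    y = np.concatenate([y, f_true(x_new) + 0.1 * rng.standard_normal(20)])

    # Kernel ridge regression estimate: c = (K + n*lam*I)^{-1} y,
    # f_hat(x) = K(x, X) c.
    K = gauss_kernel(X, X)
    c = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)
    f_hat = gauss_kernel(grid, X) @ c

    # Adapt the sampling distribution using the acquired knowledge: here,
    # oversample regions where the current field estimate varies fastest.
    score = np.abs(np.gradient(f_hat, grid)) + 1e-3
    weights = score / score.sum()

print("final RMS error:", np.sqrt(np.mean((f_hat - f_true(grid)) ** 2)))
```

Because the sampling weights depend on past estimates, the input locations are no longer i.i.d. from a fixed distribution; the paper's results concern convergence of the estimator precisely in this kind of adaptive regime.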