Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data (2404.03211v4)
Abstract: We study the convergence of recursive regularized learning algorithms in a reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. First, we study the mean-square asymptotic stability of a class of random difference equations in the RKHS whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous terms. Second, we introduce the concept of a random Tikhonov regularization path and show that if the regularization path is slowly time-varying in a suitable sense, then the output of the algorithm is consistent with the regularization path in mean square. Furthermore, if the data streams also satisfy the RKHS persistence of excitation condition, i.e., there exists a fixed-length time period such that the conditional expectation of the operators induced by the input data accumulated over each such period admits a uniformly strictly positive compact lower bound in the sense of the operator order, uniformly over time, then the output of the algorithm is consistent with the unknown function in mean square. Finally, for independent and non-identically distributed data streams, the algorithm achieves mean-square consistency provided that the marginal probability measures induced by the input data are slowly time-varying and that the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
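As a concrete illustration of the class of recursive regularized algorithms the abstract refers to, the following is a minimal Python sketch of the classical online Tikhonov-regularized update f_{t+1} = f_t - a_t[(f_t(x_t) - y_t)K_{x_t} + lambda_t f_t], with the estimate stored through its kernel expansion. The Gaussian kernel, the polynomial step-size and regularization schedules, and the slowly drifting input distribution are all illustrative assumptions, not the paper's prescriptions.

```python
# Minimal sketch of an online Tikhonov-regularized RKHS regression update.
# Kernel choice, schedules, and data stream are illustrative assumptions.
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def online_rkhs_regression(stream, step, reg, kernel=gaussian_kernel):
    """Run f_{t+1} = f_t - a_t [ (f_t(x_t) - y_t) K_{x_t} + lambda_t f_t ].

    f_t is stored via its kernel expansion f_t = sum_i c_i K(x_i, .),
    so the Tikhonov term shrinks all existing coefficients by a factor
    (1 - a_t lambda_t), and the gradient step on the pointwise squared
    loss appends one new coefficient for the current input x_t.
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        a_t, lam_t = step(t), reg(t)
        # Evaluate f_t(x_t) from the current expansion (0 when empty).
        pred = sum(c * kernel(xc, x) for xc, c in zip(centers, coeffs))
        # Tikhonov shrinkage of the whole expansion: (1 - a_t lambda_t) f_t.
        coeffs = [(1.0 - a_t * lam_t) * c for c in coeffs]
        # Gradient step adds one new kernel section centered at x_t.
        centers.append(x)
        coeffs.append(-a_t * (pred - y))
    return centers, coeffs

# Hypothetical non-stationary stream: the marginal distribution of the
# inputs drifts slowly, matching the "slowly time-varying" setting above.
rng = np.random.default_rng(0)
stream = []
for t in range(1, 501):
    x = rng.normal(loc=0.002 * t, scale=1.0, size=1)
    y = np.sin(x[0]) + 0.1 * rng.normal()
    stream.append((x, y))

centers, coeffs = online_rkhs_regression(
    stream,
    step=lambda t: 1.0 / t ** 0.6,  # a_t -> 0 with sum a_t = infinity
    reg=lambda t: 1.0 / t ** 0.2,   # lambda_t -> 0 slowly
)
f_hat = lambda x: sum(c * gaussian_kernel(xc, np.atleast_1d(x))
                      for xc, c in zip(centers, coeffs))
print(f"f_hat(0.5) = {f_hat(0.5):.3f}, target sin(0.5) = {np.sin(0.5):.3f}")
```

With a_t -> 0, sum a_t = infinity, and lambda_t -> 0 slowly, the iterate tracks the slowly time-varying regularization path; consistency with the unknown regression function additionally requires the inputs to be sufficiently exciting, which is what the RKHS persistence of excitation condition in the abstract formalizes.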