Empirical process theory for i.i.d. observations has emerged as a ubiquitous tool for understanding the generalization properties of various statistical problems. However, in many applications where the data exhibit temporal dependence (e.g., in finance, medical imaging, and weather forecasting), the corresponding empirical processes are much less understood. Motivated by this observation, we present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions. Unlike most prior work, our results cover both the long-range and the short-range regimes of dependence. Our main result shows that a non-trivial trade-off between the complexity of the underlying function class and the dependence among the observations characterizes the learning rate in a large class of nonparametric problems. This trade-off reveals a new phenomenon: even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting, provided the underlying function class is complex enough. We demonstrate the practical implications of our findings by analyzing various statistical estimators in both fixed and growing dimensions. Our main examples include a comprehensive case study of generalization error bounds in nonparametric regression over smoothness classes, in fixed as well as growing dimension, using neural nets; shape-restricted multivariate convex regression; estimation of the optimal transport (Wasserstein) distance between two probability distributions; and classification under the Mammen-Tsybakov margin condition -- all under appropriate mixing assumptions. In the process, we also develop bounds on $L_r$ ($1\le r\le 2$)-localized empirical processes with dependent observations, which we then leverage to obtain faster rates for (a) tuning-free adaptation and (b) set-structured learning problems.
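For concreteness, the display below recalls the standard objects behind the abstract; this is a generic sketch, and the paper's precise mixing framework and normalizations may differ. For a stationary sequence $(X_t)_{t\ge 1}$ and any fixed $t$ (by stationarity the choice does not matter), the $\beta$-mixing coefficient at lag $k$ is the total-variation distance between the joint law of past and future and the product of their marginal laws,
\[
\beta(k) \;=\; \bigl\| \mathcal{L}\bigl((X_s)_{s\le t},\,(X_s)_{s\ge t+k}\bigr) \;-\; \mathcal{L}\bigl((X_s)_{s\le t}\bigr)\otimes\mathcal{L}\bigl((X_s)_{s\ge t+k}\bigr) \bigr\|_{\mathrm{TV}},
\]
and the quantity bounded is the expected supremum of the empirical process indexed by a function class $\mathcal{F}$,
\[
\mathbb{E}\,\sup_{f\in\mathcal{F}} \biggl| \frac{1}{n}\sum_{i=1}^{n} f(X_i) \;-\; \mathbb{E}\, f(X_1) \biggr|.
\]
Under one common convention, the dependence is called short-range when $\sum_{k\ge 1}\beta(k)<\infty$ and long-range when the mixing coefficients are not summable.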