Randomized Empirical Processes and Confidence Bands via Virtual Resampling (1802.04380v1)

Published 12 Feb 2018 in stat.ME

Abstract: Let $X, X_1, X_2, \cdots$ be independent real valued random variables with a common distribution function $F$, and consider $\{X_1,\cdots,X_N\}$, possibly a big concrete data set, or an imaginary random sample of size $N\geq 1$ on $X$. In the latter case, or when a concrete data set in hand is too big to be entirely processed, the sample distribution function $F_N$ and the population distribution function $F$ are both to be estimated. In this paper, this is achieved by viewing $\{X_1,\cdots,X_N\}$, as above, as a finite population of real valued random variables with $N$ labeled units, and sampling its indices $\{1,\cdots,N\}$ with replacement $m_N:= \sum_{i=1}^N w_{i}^{(N)}$ times, so that for each $1\leq i \leq N$, $w_{i}^{(N)}$ is the number of times the index $i$ of $X_i$ is chosen in this virtual resampling process. This exposition extends the Doob-Donsker classical theory of weak convergence of empirical processes to that of the thus created randomly weighted empirical processes when $N, m_N \rightarrow \infty$ so that $m_N=o(N^2)$.
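
The index-resampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `virtual_resample_weights` and `weighted_edf` are hypothetical, indices run from 0 to N-1 rather than 1 to N, and the normalized weighted estimator $\frac{1}{m_N}\sum_{i=1}^N w_i^{(N)} \mathbf{1}\{X_i \leq x\}$ is assumed here as one natural randomly weighted empirical distribution function, since the abstract does not spell out the exact form.

```python
import random
from collections import Counter


def virtual_resample_weights(N, m, rng):
    """Draw m indices from {0, ..., N-1} with replacement.

    Returns a Counter mapping each drawn index i to its count w_i.
    Indices that were never drawn are simply absent (weight 0).
    """
    return Counter(rng.randrange(N) for _ in range(m))


def weighted_edf(data, weights, m, x):
    """Assumed estimator: (1/m) * sum_i w_i * 1{data[i] <= x}.

    Only the data points whose indices were drawn are ever read.
    """
    return sum(w for i, w in weights.items() if data[i] <= x) / m


rng = random.Random(0)           # fixed seed for reproducibility
N, m = 10_000, 500               # population size and number of draws
data = [rng.gauss(0.0, 1.0) for _ in range(N)]   # stand-in for X_1, ..., X_N

weights = virtual_resample_weights(N, m, rng)
estimate = weighted_edf(data, weights, m, 0.0)   # estimate of F(0)
print(f"distinct indices touched: {len(weights)}, estimate at 0: {estimate:.3f}")
```

Because the weights are stored sparsely, the estimator touches at most `m` of the `N` observations, which matches the motivation of handling a data set too big to be processed in full.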