
Projected Empirical Processes

Updated 6 January 2026
  • Projected empirical processes are a statistical method that uses one-dimensional projections to reduce high-dimensional data and facilitate effective regression model testing.
  • They construct empirical processes indexed by projected covariates, achieving computational scalability and weak convergence to Gaussian processes under regularity conditions.
  • Multiple projections and p-value combination methods, such as the Cauchy statistic, enhance test power while mitigating the curse of dimensionality in complex regression settings.

A projected empirical process is a statistical methodology that extends classical empirical process theory to high- or infinite-dimensional covariate spaces by leveraging random or data-driven one-dimensional projections. These techniques construct goodness-of-fit tests for regression models—such as the functional linear model (FLM) or sparse high-dimensional regressions—by analyzing empirical processes indexed by projected covariates. Projected empirical processes address the curse of dimensionality, improve computational scalability, and provide theoretically sound testing procedures even in settings where the dimension of the covariate space competes with or exceeds the sample size.

1. Definition and Construction of Projected Empirical Processes

Let $\{(X_i,Y_i)\}_{i=1}^n$ be independent observations, where $X_i$ lies in a high- or infinite-dimensional space (such as a separable Hilbert space $\mathcal{H}$ or $\mathbb{R}^p$ for large $p$), and $Y_i$ is a scalar response. In the functional linear model,

$$Y = \langle X, \rho \rangle + \varepsilon,$$

with $X$ in $\mathcal{H}$, a projected empirical process is constructed by first drawing a projection direction $h \in \mathcal{H}$ (often randomly from a Gaussian measure when $\mathcal{H}$ is infinite-dimensional), and computing the scalar projections $X_i^h = \langle X_i, h \rangle$.

The marked empirical process indexed by $x \in \mathbb{R}$ is then given by

$$T_{n,h}(x) = a_n \sum_{i=1}^n 1_{\{X_i^h \leq x\}} [Y_i - \hat{m}(X_i)],$$

where $\hat{m}(\cdot)$ is a fitted or candidate regression function, and $a_n$ is typically $n^{-1/2}$, ensuring the process is normalized for weak convergence analysis (Cuesta-Albertos et al., 2017).
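As a rough illustration of this construction, the sketch below evaluates $T_{n,h}$ at the sorted projections with $a_n = n^{-1/2}$. It assumes that each curve is observed on a common equispaced grid, that the inner product is approximated by a Riemann sum, that residuals have already been computed, and that the projections have no ties. The function name `projected_process` and the simulated inputs are hypothetical and not taken from the cited work.

```python
import numpy as np

def projected_process(X, residuals, h, grid_step=1.0):
    """Evaluate T_{n,h} at the sorted projected covariates (a_n = n^{-1/2}).

    X: (n, d) array, each row a curve sampled on d grid points (assumption).
    residuals: (n,) array of Y_i - m_hat(X_i).
    h: (d,) projection direction sampled on the same grid.
    grid_step: quadrature weight approximating the inner-product integral.
    """
    n = X.shape[0]
    proj = (X @ h) * grid_step              # X_i^h = <X_i, h>, Riemann-sum approximation
    order = np.argsort(proj, kind="stable")
    # At x equal to the k-th smallest projection, the indicator selects the
    # k smallest projections, so the process is a cumulative sum of residuals
    # taken in projection order (assuming no ties).
    T = np.cumsum(residuals[order]) / np.sqrt(n)
    return proj[order], T

# Hypothetical simulated inputs, for illustration only.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
residuals = rng.standard_normal(n)
h = rng.standard_normal(d)
x_sorted, T = projected_process(X, residuals, h, grid_step=1.0 / d)
```

Sorting once and taking a cumulative sum avoids evaluating the indicator for every pair of observations, so the cost per projection is dominated by the sort.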

In ultra-high-dimensional parametric regressions, a similar process is defined for a unit vector $u \in S^{p-1}$ as

$$G_n(u,t) = \frac{1}{\sqrt n} \sum_{i=1}^n \hat r_i\, 1_{\{u^\top X_i \le t\}},$$

with $\hat r_i = Y_i - m(X_i,\hat\beta)$ for a fitted parameter $\hat\beta$ (Tan et al., 2 Jan 2026).

2. Theoretical Foundations and Weak Convergence

The central theoretical advance is the reduction from high-dimensional covariates to scalar projections, allowing the application of classical empirical process tools. Under regularity assumptions—moment restrictions, estimator consistency, and suitable regularization—the projected empirical processes, under the null hypothesis, converge weakly to Gaussian processes conditionally on the projection direction.

In the FLM with estimation error controlled by Cardot–Mas–Sarda (CMS) regularization, weak convergence holds for $T_{n,h}(\cdot)$ in $D(\mathbb{R})$ to a mean-zero Gaussian process $\mathcal{G}_2(\cdot)$ with explicit covariance structure (Cuesta-Albertos et al., 2017). Similarly, for finite-dimensional sparse regressions (possibly with $p \gg n$), a martingale transformation can be applied to $G_n(u,t)$ to remove the impact of non-asymptotic linearity in $\hat\beta$, yielding a process converging in law to standard Brownian motion after proper time change (Tan et al., 2 Jan 2026).

Almost sure equivalence theorems ensure that testing the projected process for a randomly drawn $h$ is almost surely equivalent to testing the full functional covariate (Cuesta-Albertos et al., 2017):

| Statement | Reference | Details |
|---|---|---|
| $E[Y \mid X]=0 \ \mathrm{a.s.}$ iff $E[Y \mid \langle X, \beta \rangle]=0$ for all $\beta$ | (Cuesta-Albertos et al., 2017) | "No randomness" proposition |
| $E[Y \mid X]=0$ iff $E[Y \mid X^h]=0$ for $h$ drawn from a nondegenerate Gaussian measure (almost surely) | (Cuesta-Albertos et al., 2017) | "Random projection" theorem |

3. Test Statistics, Martingale Transform, and Distribution-Free Inference

Classical continuous functionals such as Kolmogorov–Smirnov (KS) and Cramér–von Mises (CvM) statistics are employed on the projected processes:

  • KS: $\|T_{n,h}\|_{KS} := \sup_{x\in\mathbb{R}} |T_{n,h}(x)|$
  • CvM: $\|T_{n,h}\|_{CvM} := \int T_{n,h}(x)^2 \,dF_{n,h}(x)$ (Cuesta-Albertos et al., 2017)
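Both functionals are cheap to compute once the process has been evaluated at the sorted projections. The sketch below assumes that $F_{n,h}$ is the empirical distribution function of the projected covariates, so the CvM integral reduces to an average over the $n$ evaluation points; this reading of $F_{n,h}$ and the function name `ks_cvm` are assumptions made for illustration.

```python
import numpy as np

def ks_cvm(T):
    """KS and CvM functionals of a process evaluated at the n sorted projections."""
    T = np.asarray(T, dtype=float)
    # The process is a step function, so the supremum over x is attained
    # at one of the jump points.
    ks = float(np.max(np.abs(T)))
    # Integrating against the empirical CDF puts mass 1/n on each projection.
    cvm = float(np.mean(T ** 2))
    return ks, cvm

# Toy values of the process at four sorted projections.
T = np.array([0.5, -1.0, 0.25, 0.0])
ks, cvm = ks_cvm(T)   # ks = 1.0, cvm = 0.328125
```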

In ultra-high-dimensional regression ($p \gg n$), nuisance parameter shifts in $G_n(u,t)$ are eliminated via a martingale transform $T$ defined in terms of Radon–Nikodym derivatives of conditional expectations and variances. The resulting martingale-transformed process $\tilde{G}_n(u,t) = T G_n(u,t)$ is asymptotically distribution-free, and the CvM-type statistic $W_n(u)$ converges to $\int_0^1 B^2(z)\,dz$ (where $B$ is standard Brownian motion), independently of $p$ and regression parameters (Tan et al., 2 Jan 2026).

These advances enable practical tests with explicit limiting null distributions, circumventing parameter estimation challenges that preclude asymptotic normality or linearization in high dimensions.

4. Multiple Projections and p-Value Combination

Because single projections can lose power against alternatives that are nearly orthogonal to the chosen direction, multiple independent projections are adopted. In functional settings, several $h_1, \dots, h_K$ are drawn (often aligned with leading principal components), with test statistics and bootstrapped $p$-values computed for each. These are aggregated using procedures controlling false discovery rate (FDR), such as the Benjamini–Hochberg rule (Cuesta-Albertos et al., 2017).
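One common way to turn a Benjamini–Hochberg step-up rule into a single global $p$-value is the Simes-type minimum $\min_k K p_{(k)} / k$ over the ordered $p$-values. The sketch below implements that rule; whether it matches the exact combination used in the cited work is an assumption, and the function name `fdr_combined_pvalue` is hypothetical.

```python
import numpy as np

def fdr_combined_pvalue(pvals):
    """Simes/BH-type global p-value: min over k of K * p_(k) / k, capped at 1."""
    p = np.sort(np.asarray(pvals, dtype=float))
    K = p.size
    ranks = np.arange(1, K + 1)
    return float(min(1.0, np.min(K * p / ranks)))

# Three hypothetical per-projection p-values.
p_comb = fdr_combined_pvalue([0.04, 0.30, 0.01])   # min(0.03, 0.06, 0.30) = 0.03
```

Rejecting the global null when this value is at most $\alpha$ is equivalent to the Benjamini–Hochberg procedure at level $\alpha$ making at least one rejection among the $K$ projections.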

In ultra-high dimensions, $p$-values $\{\hat p_k\}$ from $K$ projections are combined using the Cauchy combination statistic:

$$T_C = \frac{1}{K} \sum_{k=1}^K \tan\left\{ \left(\tfrac12 - \hat p_k\right) \pi \right\},$$

with null distributions determined asymptotically by the standard Cauchy law, enabling valid inference even under dependence among projections (Tan et al., 2 Jan 2026).
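A minimal sketch of this combination, assuming equal weights $1/K$ as in the formula above and using the standard Cauchy upper tail $P(C > t) = \tfrac12 - \arctan(t)/\pi$ to convert $T_C$ into a combined $p$-value. The function name `cauchy_combination` is hypothetical.

```python
import numpy as np

def cauchy_combination(pvals):
    """Equal-weight Cauchy combination statistic and its approximate p-value."""
    p = np.asarray(pvals, dtype=float)
    # Each p-value is mapped to a standard Cauchy quantile; small p-values
    # produce large positive terms. Values exactly 0 or 1 should be clipped
    # by the caller to avoid extreme terms.
    t_c = float(np.mean(np.tan((0.5 - p) * np.pi)))
    # Upper-tail probability of the standard Cauchy law.
    p_comb = float(0.5 - np.arctan(t_c) / np.pi)
    return t_c, p_comb

# Hypothetical p-values from K = 4 projections.
t_c, p_comb = cauchy_combination([0.01, 0.20, 0.45, 0.70])
```

With a single input the transformation is inverted exactly, so the combined $p$-value equals the original one; this gives a quick sanity check on the tail formula.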

Moreover, to address frequency sensitivity (low-frequency detection by empirical processes, high-frequency detection by local smoothing), hybrid tests are constructed by combining both types of $p$-values through the same Cauchy mechanism. This joint approach controls Type I error and improves detection across a spectrum of alternatives (Tan et al., 2 Jan 2026).

5. Implementation Strategies and Calibration

Practical implementation requires consistent estimation of projection directions and residuals, as well as calibration of critical values. In functional regression, wild bootstrap procedures—employing Rademacher multipliers to resample residuals—are used to estimate the null distribution of the projected test statistics (Cuesta-Albertos et al., 2017). The steps involve:

  1. Fit the FLM under the null with regularization.
  2. Compute test statistic from the observed data.
  3. Perform bootstrap resampling of residuals, recompute the statistic, and estimate the empirical $p$-value.
  4. For multiple projections, combine $p$-values using FDR controls.
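Steps 1 to 3 of this list can be sketched for a single projection as follows. This is a simplified illustration under stated assumptions: the covariates are finite-dimensional vectors, the null model is fitted by ordinary least squares as a stand-in for the regularized FLM estimator, and the statistic is the CvM functional. The helper names (`cvm_stat`, `ols_fit`, `wild_bootstrap_pvalue`) are hypothetical, and step 4 is not shown.

```python
import numpy as np

def cvm_stat(X, resid, h):
    """CvM functional of the residual-marked process along direction h."""
    order = np.argsort(X @ h, kind="stable")
    T = np.cumsum(resid[order]) / np.sqrt(len(resid))
    return float(np.mean(T ** 2))

def ols_fit(X, y):
    """Stand-in null fit (plain least squares), NOT the regularized FLM estimator."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def wild_bootstrap_pvalue(X, Y, h, B=200, seed=0):
    rng = np.random.default_rng(seed)
    fitted = ols_fit(X, Y)                        # step 1: fit under the null
    resid = Y - fitted
    stat = cvm_stat(X, resid, h)                  # step 2: observed statistic
    boot = np.empty(B)
    for b in range(B):                            # step 3: wild bootstrap
        v = rng.choice([-1.0, 1.0], size=len(Y))  # Rademacher multipliers
        Y_star = fitted + v * resid               # bootstrap responses
        boot[b] = cvm_stat(X, Y_star - ols_fit(X, Y_star), h)
    # Empirical p-value: share of bootstrap statistics at least as large.
    return stat, float(np.mean(boot >= stat))

# Hypothetical data generated under a linear null model.
rng = np.random.default_rng(1)
n, d = 150, 5
X = rng.standard_normal((n, d))
Y = X @ np.ones(d) + rng.standard_normal(n)
h = rng.standard_normal(d)
stat, p_value = wild_bootstrap_pvalue(X, Y, h)
```

Refitting the model on each bootstrap sample lets the resampled statistics reflect the estimation error of the fit, which is the main cost of the procedure.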

In ultra-high dimensions, the distribution-free theory eliminates the need for resampling, as the null distribution of $W_n(u)$ is universal, calibrated via the law of $\int_0^1 B^2(z)\,dz$ (Tan et al., 2 Jan 2026). This enables computational efficiency and scalability to very large $p$ and $K$.
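Because the limiting law does not depend on the data, its quantiles can be tabulated once and reused. The sketch below approximates them by Monte Carlo, simulating Brownian paths on a grid and integrating their squares with a Riemann sum; the path count, grid size, and function name `simulate_int_b2` are arbitrary illustrative choices, and the discretization introduces a small bias.

```python
import numpy as np

def simulate_int_b2(n_paths=5000, n_steps=400, seed=0):
    """Monte Carlo draws of the integral of B(z)^2 over [0, 1]."""
    rng = np.random.default_rng(seed)
    dz = 1.0 / n_steps
    # Brownian increments have variance dz; cumulative sums give the paths.
    increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dz)
    B = np.cumsum(increments, axis=1)
    # Right-endpoint Riemann sum of B^2 along each path.
    return np.sum(B ** 2, axis=1) * dz

draws = simulate_int_b2()
crit_95 = float(np.quantile(draws, 0.95))   # approximate 5%-level critical value
```

A test at the 5% level would then reject when the observed $W_n(u)$ exceeds `crit_95`. Since $E\int_0^1 B^2(z)\,dz = 1/2$, the sample mean of the draws offers a simple check on the simulation.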

6. Power Properties and Finite-Sample Performance

Simulation studies in FLM settings demonstrate that projected CvM tests consistently achieve high power, outperforming KS-based approaches and maintaining Type I error for modest numbers of projections ($K = 1$ to $5$). Increasing $K$ further leads to over-conservative behavior due to discrete $p$-values under FDR correction, with $K \approx 3$ offering a robust compromise (Cuesta-Albertos et al., 2017). Compared to previously proposed tests averaging over fixed, non-random projections, randomly projected procedures offer a significant reduction in computational complexity ($O(n)$ vs. $O(n^3)$), with only a modest sacrifice in power on small samples.

In high-dimensional regression, the martingale-transformed, projected empirical process tests maintain correct size and detect a wide class of alternatives provided that at least one projection is sensitive to the signal. The hybridization with local-smoothing statistics broadens power across oscillatory alternatives (Tan et al., 2 Jan 2026).

7. Broader Impact, Limitations, and Comparative Perspective

Projected empirical processes constitute a robust and scalable paradigm for hypothesis testing in both functional data analysis and high-dimensional statistics. They mitigate the curse of dimensionality by dimension reduction through projections, leverage theoretical properties such as almost sure equivalence of projected and full-covariate nulls, and facilitate computationally efficient inference.

However, some limitations remain. Power loss can occur for alternatives nearly orthogonal to all chosen projections, motivating aggregation across multiple projections. For FDR-type combination rules in the functional setting, large $K$ leads to over-conservative performance due to the discrete distribution of bootstrap-based $p$-values (Cuesta-Albertos et al., 2017). In the high-dimensional context, the procedure requires estimators with sufficiently fast, rate-bounded convergence; this replaces the classical asymptotic normality assumption and is notably weaker (Tan et al., 2 Jan 2026).

The theoretical and empirical advances summarized here delineate a comprehensive and flexible framework for goodness-of-fit testing in modern regression settings, with implications for a wide range of applications in statistics, econometrics, and machine learning.
