Projected Empirical Processes
- Projected empirical processes are a statistical method that uses one-dimensional projections to reduce high-dimensional data and facilitate effective regression model testing.
- They construct empirical processes indexed by projected covariates, achieving computational scalability and weak convergence to Gaussian processes under regularity conditions.
- Multiple projections and p-value combination methods, such as the Cauchy statistic, enhance test power while mitigating the curse of dimensionality in complex regression settings.
A projected empirical process is a statistical methodology that extends classical empirical process theory to high- or infinite-dimensional covariate spaces by leveraging random or data-driven one-dimensional projections. These techniques construct goodness-of-fit tests for regression models—such as the functional linear model (FLM) or sparse high-dimensional regressions—by analyzing empirical processes indexed by projected covariates. Projected empirical processes address the curse of dimensionality, improve computational scalability, and provide theoretically sound testing procedures even in settings where the dimension of the covariate space competes with or exceeds the sample size.
1. Definition and Construction of Projected Empirical Processes
Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independent observations, where $X_i$ lies in a high- or infinite-dimensional space $\mathcal{H}$ (such as a separable Hilbert space or $\mathbb{R}^p$ for large $p$), and $Y_i$ is a scalar response. In the functional linear model,

$$Y_i = \langle X_i, \beta \rangle + \varepsilon_i,$$

with $\beta$ in $\mathcal{H}$, a projected empirical process is constructed by first drawing a projection direction $\gamma$ (often randomly from a Gaussian measure when $\mathcal{H}$ is infinite-dimensional) and computing the scalar projections $\langle X_i, \gamma \rangle$.
The marked empirical process indexed by $u \in \mathbb{R}$ is then given by

$$R_n(u) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl(Y_i - \hat{m}(X_i)\bigr)\,\mathbf{1}\{\langle X_i, \gamma \rangle \le u\},$$

where $\hat{m}$ is a fitted or candidate regression function, and the normalizing factor is typically $n^{-1/2}$, ensuring the process is scaled for weak convergence analysis (Cuesta-Albertos et al., 2017).
In ultra-high-dimensional parametric regressions, a similar process is defined for a unit direction $\alpha$ as

$$V_n(u, \alpha) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl(Y_i - m(X_i, \hat{\theta})\bigr)\,\mathbf{1}\{\alpha^{\top} X_i \le u\},$$

with $\hat{\theta}$ a fitted parameter (Tan et al., 2 Jan 2026).
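As a concrete illustration, the construction above can be sketched in a few lines of NumPy for the finite-dimensional case. The names (`projected_empirical_process`, `m_hat`, `gamma`) are hypothetical, and the least-squares fit stands in for whichever null model is actually used:

```python
import numpy as np

def projected_empirical_process(X, Y, m_hat, gamma, grid):
    """Marked empirical process R_n(u) for one projection direction gamma.

    X: (n, p) covariate matrix, Y: (n,) responses, m_hat: candidate
    regression function, grid: evaluation points u for the process.
    """
    n = len(Y)
    proj = X @ gamma                    # scalar projections <X_i, gamma>
    resid = Y - m_hat(X)                # residuals under the candidate model
    # R_n(u) = n^{-1/2} * sum_i resid_i * 1{proj_i <= u}
    return np.array([resid[proj <= u].sum() for u in grid]) / np.sqrt(n)

# Toy usage: a correctly specified linear model
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.r_[1.0, np.zeros(p - 1)]
Y = X @ beta + 0.5 * rng.normal(size=n)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
gamma = rng.normal(size=p)
gamma /= np.linalg.norm(gamma)          # unit projection direction
grid = np.sort(X @ gamma)               # evaluate at the projected sample
Rn = projected_empirical_process(X, Y, lambda Z: Z @ beta_hat, gamma, grid)
```

Evaluating the process at the projected sample points, as here, is a common discretization that makes downstream KS/CvM functionals straightforward.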
2. Theoretical Foundations and Weak Convergence
The central theoretical advance is the reduction from high-dimensional covariates to scalar projections, allowing the application of classical empirical process tools. Under regularity assumptions—moment restrictions, estimator consistency, and suitable regularization—the projected empirical processes, under the null hypothesis, converge weakly to Gaussian processes conditionally on the projection direction.
In the FLM with estimation error controlled by Cardot–Mas–Sarda (CMS) regularization, the projected process converges weakly to a mean-zero Gaussian process with explicit covariance structure (Cuesta-Albertos et al., 2017). Similarly, for finite-dimensional sparse regressions (possibly with dimension exceeding the sample size), a martingale transformation can be applied to the projected process to remove the impact of the estimator's lack of asymptotic linearity, yielding a process converging in law to standard Brownian motion after a proper time change (Tan et al., 2 Jan 2026).
Almost sure equivalence theorems ensure that testing the projected process for a single randomly drawn direction $\gamma$ is almost surely equivalent to testing the full functional covariate (Cuesta-Albertos et al., 2017):
| Statement | Reference | Details |
|---|---|---|
| The null model holds iff $\mathbb{E}\bigl[(Y - m_0(X))\,\mathbf{1}\{\langle X, \gamma \rangle \le u\}\bigr] = 0$ for all directions $\gamma$ and all $u$ | (Cuesta-Albertos et al., 2017) | "No randomness" proposition |
| The null model holds iff the same moment condition is satisfied for a single $\gamma$ drawn from a nondegenerate Gaussian measure (almost surely) | (Cuesta-Albertos et al., 2017) | "Random projection" theorem |
3. Test Statistics, Martingale Transform, and Distribution-Free Inference
Classical continuous functionals such as Kolmogorov–Smirnov (KS) and Cramér–von Mises (CvM) statistics are employed on the projected processes:
- KS: $\mathrm{KS}_n = \sup_u |R_n(u)|$
- CvM: $\mathrm{CvM}_n = \int R_n(u)^2 \, dF_{n,\gamma}(u)$, where $F_{n,\gamma}$ is the empirical distribution of the projected covariates (Cuesta-Albertos et al., 2017)
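A minimal sketch of both functionals, assuming the process has been evaluated at the projected sample points so that integration against $F_{n,\gamma}$ reduces to an average over the grid (the function name is hypothetical):

```python
import numpy as np

def ks_cvm(Rn):
    """KS and CvM functionals of a projected process on a grid.

    KS = sup_u |R_n(u)|; CvM integrates R_n(u)^2 against the empirical
    law of the projected covariate, which is a plain average when the
    grid is the projected sample itself.
    """
    ks = float(np.max(np.abs(Rn)))
    cvm = float(np.mean(Rn**2))
    return ks, cvm

# e.g. for a process evaluated at three projected sample points:
ks, cvm = ks_cvm(np.array([0.1, -0.4, 0.2]))
```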
In ultra-high-dimensional regression (with dimension possibly far exceeding the sample size), nuisance-parameter shifts in the projected process are eliminated via a martingale transform defined in terms of Radon–Nikodym derivatives of conditional expectations and variances. The resulting martingale-transformed process is asymptotically distribution-free, and the CvM-type statistic converges in law to $\int_0^1 B(t)^2\,dt$ (where $B$ is standard Brownian motion), independently of the dimension and the regression parameters (Tan et al., 2 Jan 2026).
These advances enable practical tests with explicit limiting null distributions, circumventing parameter estimation challenges that preclude asymptotic normality or linearization in high dimensions.
4. Multiple Projections and p-Value Combination
Because a single projection can lose power against alternatives that are nearly orthogonal to the chosen direction, multiple independent projections are adopted. In functional settings, several directions $\gamma_1, \ldots, \gamma_K$ are drawn (often aligned with leading principal components), with test statistics and bootstrapped $p$-values computed for each. These are aggregated using procedures controlling the false discovery rate (FDR), such as the Benjamini–Hochberg rule (Cuesta-Albertos et al., 2017).
In ultra-high dimensions, $p$-values $p_1, \ldots, p_K$ from $K$ projections are combined using the Cauchy combination statistic:

$$T_{\mathrm{Cauchy}} = \frac{1}{K} \sum_{k=1}^{K} \tan\bigl\{\pi(0.5 - p_k)\bigr\},$$

with null distribution determined asymptotically by the standard Cauchy law, enabling valid inference even under dependence among projections (Tan et al., 2 Jan 2026).
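The combination rule is simple to implement; the sketch below (hypothetical function name) maps $K$ individual $p$-values to the statistic and then to a combined $p$-value via the standard Cauchy distribution function:

```python
import numpy as np

def cauchy_combination(pvals):
    """Cauchy combination of K p-values.

    T = (1/K) * sum_k tan(pi * (0.5 - p_k)); under H0, T is approximately
    standard Cauchy even when the p-values are dependent, so the combined
    p-value is the standard Cauchy survival function at T.
    """
    p = np.asarray(pvals, dtype=float)
    T = np.mean(np.tan(np.pi * (0.5 - p)))
    return 0.5 - np.arctan(T) / np.pi

# One strong signal among otherwise unremarkable projections still
# yields a small combined p-value:
combined = cauchy_combination([0.001, 0.6, 0.7, 0.4])
```

The heavy Cauchy tail is what lets one small $p_k$ dominate the average, which is exactly the behavior wanted when only a few projections align with the signal.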
Moreover, to address frequency sensitivity (empirical-process tests detect low-frequency alternatives, while local-smoothing tests detect high-frequency ones), hybrid tests are constructed by combining both types of $p$-values through the same Cauchy mechanism. This joint approach controls Type I error and improves detection across a spectrum of alternatives (Tan et al., 2 Jan 2026).
5. Implementation Strategies and Calibration
Practical implementation requires consistent estimation of projection directions and residuals, as well as calibration of critical values. In functional regression, wild bootstrap procedures—employing Rademacher multipliers to resample residuals—are used to estimate the null distribution of the projected test statistics (Cuesta-Albertos et al., 2017). The steps involve:
- Fit the FLM under the null with regularization.
- Compute test statistic from the observed data.
- Perform bootstrap resampling of the residuals, recompute the statistic, and estimate the empirical $p$-value.
- For multiple projections, combine $p$-values using FDR controls.
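The steps above can be sketched as follows; `fit` and `statistic` are hypothetical placeholders for the regularized null fit and the projected test functional:

```python
import numpy as np

def wild_bootstrap_pvalue(X, Y, fit, statistic, B=200, seed=1):
    """Wild-bootstrap calibration with Rademacher multipliers (sketch).

    `fit` returns a fitted regression function under the null model and
    `statistic` maps (X, Y, m_hat) to a scalar test statistic; both are
    placeholders for whichever null fit and functional are in use.
    """
    rng = np.random.default_rng(seed)
    m_hat = fit(X, Y)
    T_obs = statistic(X, Y, m_hat)
    resid = Y - m_hat(X)
    exceed = 0
    for _ in range(B):
        V = rng.choice([-1.0, 1.0], size=len(Y))   # Rademacher multipliers
        Y_star = m_hat(X) + V * resid              # perturbed responses
        m_star = fit(X, Y_star)                    # refit under the null
        exceed += statistic(X, Y_star, m_star) >= T_obs
    return (1 + exceed) / (B + 1)                  # finite-sample p-value
```

Multiplying residuals by random signs preserves their conditional second moments under the null, which is why this scheme reproduces the null distribution of the projected statistic.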
In ultra-high dimensions, the distribution-free theory eliminates the need for resampling, as the null distribution of the martingale-transformed statistic is universal, calibrated via the law of $\int_0^1 B(t)^2\,dt$ (Tan et al., 2 Jan 2026). This enables computational efficiency and scalability to very large dimensions and sample sizes.
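A Monte Carlo sketch of this calibration, discretizing Brownian motion on a grid to tabulate quantiles of $\int_0^1 B(t)^2\,dt$ once and for all (function name hypothetical):

```python
import numpy as np

def cvm_limit_quantile(alpha=0.05, M=10000, steps=400, seed=0):
    """Monte Carlo (1 - alpha)-quantile of integral_0^1 B(t)^2 dt,
    the universal null law of the martingale-transformed CvM statistic."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    # Brownian paths: cumulative sums of independent N(0, dt) increments
    B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(M, steps)), axis=1)
    integrals = (B**2).sum(axis=1) * dt        # Riemann sums of B(t)^2
    return float(np.quantile(integrals, 1.0 - alpha))

crit = cvm_limit_quantile(0.05)  # reject when the observed CvM exceeds crit
```

Because the limit law is free of the data-generating parameters, this table needs to be computed only once, regardless of the dimension or the model being tested.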
6. Power Properties and Finite-Sample Performance
Simulation studies in FLM settings demonstrate that projected CvM tests consistently achieve high power, outperforming KS-based approaches and maintaining Type I error for modest numbers of projections (up to $5$). Increasing the number of projections further leads to over-conservative behavior, due to the discreteness of bootstrap $p$-values under FDR correction, with a moderate number offering a robust compromise (Cuesta-Albertos et al., 2017). Compared to previously proposed tests that average over fixed, non-random projections, randomly projected procedures offer a significant reduction in computational complexity, with only a modest sacrifice in power in small samples.
In high-dimensional regression, the martingale-transformed, projected empirical process tests maintain correct size and detect a wide class of alternatives provided that at least one projection is sensitive to the signal. The hybridization with local-smoothing statistics broadens power across oscillatory alternatives (Tan et al., 2 Jan 2026).
7. Broader Impact, Limitations, and Comparative Perspective
Projected empirical processes constitute a robust and scalable paradigm for hypothesis testing in both functional data analysis and high-dimensional statistics. They mitigate the curse of dimensionality by dimension reduction through projections, leverage theoretical properties such as almost sure equivalence of projected and full-covariate nulls, and facilitate computationally efficient inference.
However, some limitations remain. Power loss can occur for alternatives nearly orthogonal to all chosen projections, motivating aggregation across multiple projections. For FDR-type combination rules in the functional setting, a large number of projections leads to over-conservative performance due to the discrete distribution of bootstrap-based $p$-values (Cuesta-Albertos et al., 2017). In the high-dimensional context, sufficiently accurate rate-bounded estimators (rather than asymptotically normal ones) are still required, though this condition is notably weaker than classical assumptions (Tan et al., 2 Jan 2026).
The theoretical and empirical advances summarized here delineate a comprehensive and flexible framework for goodness-of-fit testing in modern regression settings, with implications for a wide range of applications in statistics, econometrics, and machine learning.