Bootstrap Distribution-Free CUSUM
- The paper introduces bootstrap-based CUSUM methods that calibrate critical values through resampling, ensuring robust change-point detection without relying on strict distributional assumptions.
- It employs various bootstrap techniques—including naive, multiplier, and wild bootstrap—to handle univariate, high-dimensional, and functional data regimes effectively.
- Empirical results demonstrate that these methods maintain nominal error rates and detection power, making them well-suited for industrial process control and econometric applications.
Bootstrap-based distribution-free Cumulative Sum (CUSUM) methods constitute a rigorous class of statistical change-point detection and control tools that eschew strong parametric assumptions on data distributions. Through the use of resampling algorithms—typically variants of the bootstrap—they achieve critical value calibration and size control for CUSUM-type statistics in broad, nonparametric, and even high-dimensional settings.
1. Foundations and Scope
CUSUM statistics quantify deviations in partial sums or cumulative averages, making them well-suited for real-time change-point detection in sequential data. Classical CUSUM procedures rely on precise knowledge of underlying distributions (often Gaussian). Bootstrap-based, distribution-free CUSUM approaches address critical limitations of these classical methods—specifically, robustness to distributional misspecification, handling of unknown or heteroskedastic variances, and adaptation to high-dimensional or functional data regimes (0906.1421, Yu et al., 2017, Paparoditis et al., 2024).
Main Features:
- Distribution-freeness: No reliance on parametric forms for pre- or post-change distributions.
- Bootstrap critical value calibration: Data-driven resampling aligns test statistic distributions to observed data properties.
- Applicability: Covers univariate, multivariate, high-dimensional, panel, regression, and functional data, under minimal assumptions.
2. General Methodological Principles
Bootstrap-based, distribution-free CUSUM methods share one device: the reference distribution of the CUSUM statistic is generated by resampling rather than derived from parametric theory. This keeps error rates (e.g., type I error, in-control average run length (ARL)) at their nominal levels even when the model is uncertain or the variance is unknown.
Key Points:
- The CUSUM statistic is tailored to the change-point setting (e.g., partial sums, panel differences, residual marked processes).
- Bootstrap samples are used to estimate critical values empirically under the null hypothesis, reflecting the actual dependence and variability of the data.
- Procedures can leverage naive bootstrap (resampling data), multiplier bootstrap (resampling with random weights), wild bootstrap (for regression or heteroskedasticity), or strategy-specific resampling (conditional on process states).
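The calibration idea in the points above can be sketched for the simplest case. This is a minimal illustration, assuming i.i.d. scalar data and a naive (resample-with-replacement) bootstrap; the function names `cusum_stat` and `bootstrap_critical_value` are hypothetical and do not come from any of the cited papers.

```python
import math
import random
from itertools import accumulate


def cusum_stat(x):
    """Max absolute centered partial sum, scaled by sqrt(n) and the sample sd."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    partial = accumulate(v - mean for v in x)
    return max(abs(s) for s in partial) / (sd * math.sqrt(n))


def bootstrap_critical_value(x, n_boot=500, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the statistic over resampled series.

    Resampling with replacement scrambles the time order, so every bootstrap
    series behaves like a no-change (null) series with the data's own marginal.
    """
    rng = random.Random(seed)
    n = len(x)
    stats = sorted(cusum_stat(rng.choices(x, k=n)) for _ in range(n_boot))
    return stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Mean shifts from 0 to 3 halfway through the series.
    series = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
    print("reject no-change null:", cusum_stat(series) > bootstrap_critical_value(series))
```

A change is declared when the observed statistic exceeds the bootstrap quantile. Dependent data would need a block, multiplier, or sieve scheme instead of plain resampling.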
3. Algorithmic Details in Selected Regimes
3.1 Univariate SPC: Conditional-Bootstrap CUSUM Control Limits
For univariate sequential process control, the bootstrap CUSUM of (0906.1421) replaces the universal control limit with a sequence of limits $h_1, h_2, \dots$, each tailored (via bootstrap) to the conditional distribution of the CUSUM statistic after a "sprint" of $j$ steps since the last reset. The algorithm involves:
- Phase I estimation: Construct a smooth empirical estimate of the in-control distribution from in-control data.
- Reference value tuning: Calibrate the allowance parameter $k$ to target the average sprint length.
- Bootstrap control limit estimation: For each $j$, generate bootstrap samples of the process conditional on a "run" of length $j$, and estimate $h_j$ as the appropriate quantile.
- Calibration: Simulate to ensure the in-control ARL matches a prescribed target.
- Detection: In monitoring, signal a change if $C_n > h_{T_n}$, where $C_n$ is the CUSUM statistic and $T_n$ is the sprint length since the last reset (0906.1421).
The resulting control chart is robust to non-normal underlying distributions and shows empirical ARL matching nominal settings across a wide range of alternatives.
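A rough sketch of sprint-dependent control limits follows. It makes several simplifying assumptions that are not taken from (0906.1421): it resamples directly from the Phase I data instead of a smoothed estimate, pools all sprints longer than `j_max` into the last limit, and uses a fixed tail probability `alpha` in place of the ARL calibration step. The names `sprint_limits` and `monitor` are hypothetical.

```python
import random


def sprint_limits(x0, k, j_max, alpha, n_steps=100_000, seed=0):
    """Estimate limits h_1..h_j_max as (1 - alpha) quantiles of C_n given T_n = j."""
    rng = random.Random(seed)
    by_sprint = {j: [] for j in range(1, j_max + 1)}
    c, t = 0.0, 0
    for _ in range(n_steps):
        # One-sided CUSUM recursion with allowance k on a bootstrap draw.
        c = max(0.0, c + rng.choice(x0) - k)
        # Sprint length: steps since the statistic was last reset to zero.
        t = t + 1 if c > 0 else 0
        if t > 0:
            by_sprint[min(t, j_max)].append(c)
    limits = []
    for j in range(1, j_max + 1):
        vals = sorted(by_sprint[j])
        if vals:
            limits.append(vals[int((1 - alpha) * (len(vals) - 1))])
        else:
            # No simulated sprint reached this length: reuse the previous limit.
            limits.append(limits[-1] if limits else float("inf"))
    return limits


def monitor(stream, k, limits):
    """Return the 1-based index of the first signal, or None if none occurs."""
    c, t = 0.0, 0
    for n, x in enumerate(stream, start=1):
        c = max(0.0, c + x - k)
        t = t + 1 if c > 0 else 0
        if t > 0 and c > limits[min(t, len(limits)) - 1]:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(2)
    phase1 = [rng.gauss(0, 1) for _ in range(500)]
    h = sprint_limits(phase1, k=0.5, j_max=10, alpha=0.01)
    print("signal at step:", monitor((rng.gauss(3, 1) for _ in range(50)), 0.5, h))
```

In a fuller version, `alpha` would be tuned by simulation until the in-control ARL hits its target, as the calibration step describes.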
3.2 High-dimensional Mean Shift: Gaussian Multiplier Bootstrap CUSUM
In the setting of high-dimensional vectors, (Yu et al., 2017) deploys a CUSUM-type statistic with Gaussian multiplier bootstrap calibration:
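- CUSUM process: For each candidate location $s = 1, \dots, n-1$, the CUSUM process $Z_n(s)$ is a normalized contrast between the sample means of the observations before and after $s$.
- Bootstrap: With i.i.d. multipliers $e_1, \dots, e_n \sim N(0, 1)$, a bootstrap process $Z_n^*(s)$ is formed by randomly weighting sample deviations, generating a bootstrap distribution for the maximum CUSUM norm.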
- Size control and minimax power: The Kolmogorov distance between the true and bootstrap sampling distributions is bounded by an explicit rate that vanishes under the paper's conditions. Size validity and minimax separation for sparse alternatives are both established; performance is uniform across a wide array of dependence structures (Yu et al., 2017).
- Extensions: Recursive, bootstrap-assisted binary segmentation (BABS) enables consistent multiple change-point identification.
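The multiplier scheme can be illustrated with a short sketch. It assumes one common form of the mean-difference CUSUM, scaled by $\sqrt{s(n-s)/n}$ and measured in the sup norm, with deviations centered separately on each side of the split and a boundary trim `trim`. These choices and all function names are assumptions made for illustration, so the exact statistic in (Yu et al., 2017) may differ in detail.

```python
import math
import random


def prefix_sums(rows):
    """Running column-wise sums; entry s holds the sum of the first s rows."""
    out = [[0.0] * len(rows[0])]
    for row in rows:
        out.append([a + b for a, b in zip(out[-1], row)])
    return out


def cusum_sup_norm(x, trim):
    """Max over splits of the scaled sup-norm difference of segment means."""
    n, sx, best = len(x), prefix_sums(x), 0.0
    for s in range(trim, n - trim + 1):
        scale = math.sqrt(s * (n - s) / n)
        diff = max(abs(a / s - (tot - a) / (n - s)) for a, tot in zip(sx[s], sx[n]))
        best = max(best, scale * diff)
    return best


def multiplier_cusum_sup_norm(x, e, trim):
    """Same statistic with Gaussian-weighted, segment-centered deviations."""
    n, p = len(x), len(x[0])
    sx = prefix_sums(x)
    sex = prefix_sums([[ei * v for v in row] for ei, row in zip(e, x)])
    se = prefix_sums([[ei] for ei in e])
    best = 0.0
    for s in range(trim, n - trim + 1):
        scale = math.sqrt(s * (n - s) / n)
        e_left, e_right = se[s][0], se[n][0] - se[s][0]
        worst = 0.0
        for j in range(p):
            mean_left = sx[s][j] / s
            mean_right = (sx[n][j] - sx[s][j]) / (n - s)
            left = (sex[s][j] - mean_left * e_left) / s
            right = (sex[n][j] - sex[s][j] - mean_right * e_right) / (n - s)
            worst = max(worst, abs(left - right))
        best = max(best, scale * worst)
    return best


def multiplier_bootstrap_test(x, n_boot=200, alpha=0.05, trim=5, seed=0):
    """Return (statistic, bootstrap critical value, reject flag)."""
    rng = random.Random(seed)
    n = len(x)
    stat = cusum_sup_norm(x, trim)
    boot = sorted(
        multiplier_cusum_sup_norm(x, [rng.gauss(0, 1) for _ in range(n)], trim)
        for _ in range(n_boot)
    )
    crit = boot[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    return stat, crit, stat > crit


if __name__ == "__main__":
    rng = random.Random(3)
    data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(80)]
    for row in data[40:]:
        row[0] += 3.0  # sparse shift: only the first coordinate changes
    print(multiplier_bootstrap_test(data))
```

Because the multipliers only reweight the observed deviations, the data are never reshuffled and no covariance matrix has to be estimated, which is what makes the approach attractive when the dimension is large.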
3.3 Panel and Nonparametric Settings
For panel data and distributional change, nonparametric CUSUM-type statistics based on empirical distribution function differences are deployed, with bootstrap resampling (over units within time points) to calibrate test thresholds (Pommeret et al., 2011). Consistency and distribution-freeness are maintained under monotone linking and mild regularity conditions.
3.4 Regression and Weak Dependence: Wild Bootstrap
CUSUM statistics formed from sequential, marked residual processes under nonparametric or weakly dependent regression models use wild bootstrap (resampling residuals with random multipliers) to capture null limiting behavior, even under heteroskedasticity or time-varying variance. Strong consistency and asymptotic exactness of bootstrap-calibrated tests are established (Mohr et al., 2019).
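The wild bootstrap step can be sketched as follows. As a deliberate simplification, a straight-line least-squares fit stands in for the nonparametric regression estimator, the multipliers are Rademacher signs, and the observations are treated as independent. None of these choices, nor the function names, are taken from (Mohr et al., 2019).

```python
import math
import random
from itertools import accumulate


def ols_residuals(x, y):
    """Residuals from a simple least-squares line (stand-in for a smoother)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def residual_cusum(res):
    """Max absolute partial sum of residuals in time order, scaled by sqrt(n)."""
    return max(abs(s) for s in accumulate(res)) / math.sqrt(len(res))


def wild_bootstrap_test(x, y, n_boot=300, alpha=0.05, seed=0):
    """Return (statistic, bootstrap critical value, reject flag)."""
    rng = random.Random(seed)
    res = ols_residuals(x, y)
    fitted = [b - r for b, r in zip(y, res)]
    stat = residual_cusum(res)
    boot = []
    for _ in range(n_boot):
        # Sign flips keep each residual's own magnitude, so the bootstrap
        # series inherits any heteroskedasticity present in the data.
        y_star = [f + r * rng.choice((-1.0, 1.0)) for f, r in zip(fitted, res)]
        boot.append(residual_cusum(ols_residuals(x, y_star)))
    boot.sort()
    crit = boot[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    return stat, crit, stat > crit


if __name__ == "__main__":
    rng = random.Random(4)
    x = [rng.random() for _ in range(200)]
    # Noise scale grows with x; the intercept jumps by 2 at observation 100.
    y = [1 + 2 * a + (0.5 + a) * rng.gauss(0, 1) + (2.0 if i >= 100 else 0.0)
         for i, a in enumerate(x)]
    print(wild_bootstrap_test(x, y))
```

Refitting the model on every bootstrap sample mirrors the estimation error in the original residuals, which is why the refit is kept even though it costs extra computation.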
3.5 Functional Data
In functional time series, the Functional Sieve Bootstrap (FSB) combines FPCA, VAR modeling of functional principal component scores, and bootstrap resampling of score and remainder processes. The FSB replicates the law of the partial-sum process and facilitates accurate, data-driven critical value computation for fully functional CUSUM statistics under very general weak dependence (Paparoditis et al., 2024).
4. Theoretical Properties and Regularity Assumptions
Each major variant comes with theoretical guarantees under its own regularity conditions:
| Method | Key Theoretical Guarantees | Regularity Conditions |
|---|---|---|
| Conditional Bootstrap CUSUM (0906.1421) | ARL consistency for any in-control distribution | Continuity condition on the in-control distribution; reliable Phase I estimate |
| High-d CUSUM (Yu et al., 2017) | Uniform size validity, minimax detection boundary | Mild moments, possible dependence |
| Panel CUSUM (Pommeret et al., 2011) | Asymptotically distribution-free; size control | Bounded densities; i.i.d. panels |
| Regression CUSUM (Mohr et al., 2019) | Size control, consistency, adapts to heteroskedasticity | α-mixing, moment and smoothness |
| Functional FSB-CUSUM (Paparoditis et al., 2024) | Consistent, distribution-free limit for partial sums | Weak stationarity, $L^4$-$m$-approximability |
A core outcome is that under the respective nulls (and prescribed regularity), the bootstrap-generated reference distribution aligns with the true sampling law—often a variant of a Brownian bridge or tied-down Gaussian process—thereby preserving target error rates.
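As a reference point, consider the simplest case. The setting is an assumption made here for illustration: i.i.d. scalar observations with finite variance, partial sums $S_k = X_1 + \dots + X_k$, and a consistent scale estimate $\hat\sigma$. The classical Brownian bridge limit then reads:

$$
\max_{1 \le k \le n} \frac{\left| S_k - \tfrac{k}{n} S_n \right|}{\hat\sigma \sqrt{n}}
\;\xrightarrow{d}\; \sup_{0 \le t \le 1} |B(t)|,
\qquad
P\Big( \sup_{0 \le t \le 1} |B(t)| > x \Big) = 2 \sum_{k=1}^{\infty} (-1)^{k+1} e^{-2 k^2 x^2},
$$

where $B$ is a standard Brownian bridge. In this textbook case the quantiles are available in closed form (the 95% point is about 1.358). The bootstrap becomes valuable when the limit depends on unknown nuisance quantities or has no tractable form.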
5. Empirical Performance and Applications
Simulation studies in the cited papers report that:
- Bootstrap CUSUMs provide nominal size and accurate average run lengths under a variety of in-control distributions, far outperforming classical parametric or plug-in critical value methods when model assumptions fail (0906.1421, Yu et al., 2017).
- Detection power is maintained against a variety of alternatives (abrupt jumps, smooth drifts, variance shifts), often matching or exceeding that of plug-in or nonparametric competitors, especially in moderate to large samples (Pommeret et al., 2011, Mohr et al., 2019, Paparoditis et al., 2024).
- The procedures remain feasible computationally in high dimensions and for functional data by leveraging efficient resampling and data reduction strategies (e.g., FPCA, kernel-based statistics).
- Recommended practice pairs an adequately large sample with enough bootstrap replications for finite-sample reliability.
6. Extensions and Practical Recommendations
Bootstrap-based, distribution-free CUSUM variants extend naturally to multiple change-point settings (via binary segmentation or related recursive schemes), robust U-statistic frameworks for location parameters (Yu et al., 2019), nonparametric regression, and beyond.
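The recursive idea behind binary segmentation can be sketched on scalar data. This is a simplified illustration, assuming i.i.d. observations within each regime and a naive bootstrap test on every segment. It is not the BABS procedure of (Yu et al., 2017), it does not correct for testing many segments, and the names `cusum_split`, `segment`, and `binary_segmentation` are hypothetical.

```python
import math
import random
from itertools import accumulate


def cusum_split(x):
    """Return (statistic, split index) for the max centered partial sum."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    partial = list(accumulate(v - mean for v in x))
    k = max(range(n - 1), key=lambda i: abs(partial[i]))
    return abs(partial[k]) / (sd * math.sqrt(n)), k + 1


def segment(x, lo, hi, rng, found, min_len=20, n_boot=200, alpha=0.05):
    """Test x[lo:hi]; on rejection record the split and recurse on both halves."""
    seg = x[lo:hi]
    if len(seg) < 2 * min_len:
        return
    stat, k = cusum_split(seg)
    boot = sorted(cusum_split(rng.choices(seg, k=len(seg)))[0] for _ in range(n_boot))
    crit = boot[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    if stat > crit:
        found.append(lo + k)
        segment(x, lo, lo + k, rng, found, min_len, n_boot, alpha)
        segment(x, lo + k, hi, rng, found, min_len, n_boot, alpha)


def binary_segmentation(x, seed=0, **kwargs):
    """Return the sorted list of estimated change-point indices."""
    found = []
    segment(x, 0, len(x), random.Random(seed), found, **kwargs)
    return sorted(found)


if __name__ == "__main__":
    rng = random.Random(5)
    # Three regimes of length 100 with means 0, 4, 0.
    series = [rng.gauss(m, 1) for m in (0, 4, 0) for _ in range(100)]
    print(binary_segmentation(series))
```

Each recursive call recalibrates its threshold on the segment at hand, so the limits adapt as segments shrink. Spurious splits remain possible because every test is run at level `alpha`.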
Practical considerations include:
- Calibration of tuning parameters (e.g., number of bootstrap repetitions, maximum sprint length, FPCA dimension, and VAR order), with cross-validation or pilot studies commonly used.
- Selection of marking or weighting functions to match anticipated forms of change.
- Caution with extreme parameter settings (e.g., reference values in CUSUM) and adequacy of initial sample sizes for density estimation or regression fit.
- Pre-whitening for autocorrelation in process control charts, and careful grid or truncation settings in high dimensions.
7. Significance and Impact
By fusing CUSUM with bootstrap resampling, these methods realize uniformly valid, distribution-free testing and monitoring protocols that are robust to the structure and complexity of modern data. They have become foundational in industrial process control, econometrics, high-dimensional inference, and functional data analysis—offering principled tools for uncertainty quantification and adaptive change detection (0906.1421, Yu et al., 2017, Mohr et al., 2019, Paparoditis et al., 2024).