One-Covariate-at-a-Time Testing
- OCMT is a framework for testing each variable individually using auxiliary information while rigorously controlling error rates.
- It employs techniques like linear regression, nonparametric splines, and covariate weighting to enhance signal discovery in high-dimensional data.
- OCMT offers robust theoretical guarantees and computational scalability, making it effective across fields like genomics, economics, and time series analysis.
One-Covariate-at-a-Time Multiple Testing (OCMT) refers to a broad class of methodologies for variable selection, signal discovery, and error-controlled inference in high-dimensional settings. The central principle is to evaluate each candidate variable or hypothesis individually, often leveraging auxiliary covariate information or model structures, while controlling false discovery or family-wise error across the entire set of tests. OCMT has been developed and implemented across diverse domains, including high-dimensional time series, biology, nonparametric regression, and statistical genetics, with rigorous theoretical guarantees, extensive simulation validation, and demonstrable empirical utility.
1. Model Setups and Key Problem Formulations
OCMT addresses multivariate testing problems where, in addition to the main outcome or response $y$, there exists a high-dimensional covariate set or a large collection of hypothesis tests with associated side-information. The following formulations typify OCMT approaches:
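- Linear regression with sparse signal: The target is modeled as
$$y_t = a' z_t + \sum_{i=1}^{k} \beta_i x_{it} + u_t,$$
where $z_t$ are pre-selected controls and $\{x_{1t}, \dots, x_{Nt}\}$ is a large active set. Each $x_{it}$ is tested individually, seeking covariates with $\beta_i \neq 0$ (Pesaran et al., 2024, Chudik et al., 2023).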
- Nonparametric additive models: The outcome is expressed as
$$y = \sum_{j=1}^{N} f_j(x_j) + \varepsilon,$$
with each $f_j$ subject to a null hypothesis $H_{0j}: f_j = 0$, tested one at a time by projecting onto a spline or sieve basis (Su et al., 2022).
- Large-scale simultaneous testing with covariate weighting: For $m$ hypotheses with test statistics (summarized here as p-values $P_i$) and a side covariate $X_i$, the goal is to construct test-specific or group-specific weights $w_i$ that maximize power while preserving FDR or FWER control (Hasan et al., 2022, Ignatiadis et al., 2017).
2. One-at-a-Time Testing Algorithms
OCMT typically employs $N$ (or $m$) separate regression or testing problems, each including only a single hypothesis or covariate of interest, potentially adjusted for select controls:
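- Linear regression variable selection: For each $i = 1, \dots, N$, fit
$$y_t = a_i' z_t + \phi_i x_{it} + e_{it}$$
and compute the associated t-statistic $t_{\hat{\phi}_i}$. A Bonferroni-style threshold is adjusted to maintain the FWER at the target level, via
$$c_p(N, \delta) = \Phi^{-1}\!\left(1 - \frac{p}{2 f(N, \delta)}\right),$$
with $p$ the nominal per-comparison size and $f(N, \delta) = c N^{\delta}$ for constants $c, \delta > 0$ (Pesaran et al., 2024, Chudik et al., 2023).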
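- Nonparametric OCMT: For each $j$, regress $y$ on a B-spline basis in $x_j$, compute a quadratic-form statistic in the estimated spline coefficients, and reject $H_{0j}$ if the statistic exceeds a BIC-minimized cutoff (Su et al., 2022).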
- Covariate-weighted testing: With side-information $X_i$, assign a rank or bin, compute weights $w_i$ via optimality conditions or data-driven procedures, normalize so that $\sum_{i=1}^{m} w_i = m$, and reject $H_i$ if $P_i \le \alpha w_i / m$ (for FWER) or via a weighted-BH procedure for FDR (Hasan et al., 2022, Ignatiadis et al., 2017).
- One-at-a-Time Knockoff Construction: For linear regression, for each covariate $x_j$ construct a knockoff $\tilde{x}_j$ with matching Gram constraints, calculate an anti-symmetric statistic comparing the variable and its knockoff, and threshold to control FDR (Guan et al., 26 Feb 2025).
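The linear one-at-a-time step can be illustrated with a minimal, single-stage sketch. It assumes the critical value takes the form $c_p(N, \delta) = \Phi^{-1}(1 - p / (2 c N^{\delta}))$ used in the OCMT literature, and, as a simplification of this example only, that the intercept is the sole pre-selected control. The function names (`critical_value`, `t_stat`, `ocmt_one_stage`) and the simulated data are hypothetical and not taken from the cited papers.

```python
import math
import random
from statistics import NormalDist, fmean


def critical_value(n_cov, p=0.05, delta=1.0, c=1.0):
    """Assumed form: c_p(N, delta) = Phi^{-1}(1 - p / (2 c N^delta))."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * c * n_cov ** delta))


def t_stat(y, x):
    """t-ratio of x in an OLS regression of y on an intercept and x."""
    n = len(y)
    ybar, xbar = fmean(y), fmean(x)
    yd = [v - ybar for v in y]          # partial out the intercept
    xd = [v - xbar for v in x]
    sxx = sum(v * v for v in xd)
    slope = sum(a * b for a, b in zip(xd, yd)) / sxx
    rss = sum((a - slope * b) ** 2 for a, b in zip(yd, xd))
    sigma2 = rss / (n - 2)              # two estimated parameters
    return slope / math.sqrt(sigma2 / sxx)


def ocmt_one_stage(y, covariates, p=0.05, delta=1.0):
    """Indices of covariates whose individual |t| exceeds the threshold."""
    cv = critical_value(len(covariates), p, delta)
    return [i for i, x in enumerate(covariates) if abs(t_stat(y, x)) > cv]


# Simulated illustration: 50 candidates, only the first two are signals.
rng = random.Random(0)
T, N = 200, 50
X = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(N)]
y = [2.0 * X[0][t] - 1.5 * X[1][t] + rng.gauss(0, 1) for t in range(T)]
selected = ocmt_one_stage(y, X)
print(selected)
```

Each covariate enters its own regression, so the loop is trivially parallel; the only coupling across tests is through the shared threshold, which grows with the number of candidates.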
| OCMT Variant | Error Type Controlled | Core Statistic |
|---|---|---|
| Linear OCMT | FWER | t-stat with Bonferroni-type threshold |
| Nonparametric OCMT | FDR/FWER | B-spline quadratic form |
| Covariate-weighted BH | FDR | Weighted p-values, cross-weighted BH |
| Knockoff OCMT (OATK) | FDR | Knockoff-vs-original coefficient statistic |
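For the covariate-weighted row of the table, the following sketch shows the standard weighted Bonferroni rule and a weighted Benjamini-Hochberg step-up on fixed weights. It is an illustration under stated assumptions, not the IHW or CRW procedures themselves: the weights here are supplied by hand rather than learned from the side covariate, and all function names and p-values are hypothetical.

```python
def normalize_weights(raw):
    """Rescale nonnegative weights so they sum to m (average weight one)."""
    m = len(raw)
    total = sum(raw)
    return [w * m / total for w in raw]


def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Reject H_i when P_i <= alpha * w_i / m (FWER control)."""
    m = len(pvals)
    return [i for i, (p, w) in enumerate(zip(pvals, weights))
            if p <= alpha * w / m]


def weighted_bh(pvals, weights, alpha=0.05):
    """Benjamini-Hochberg step-up applied to weighted p-values P_i / w_i."""
    m = len(pvals)
    q = sorted((p / w if w > 0 else float("inf"), i)
               for i, (p, w) in enumerate(zip(pvals, weights)))
    k = 0
    for rank, (val, _) in enumerate(q, start=1):
        if val <= alpha * rank / m:
            k = rank                    # largest rank passing the step-up line
    return sorted(i for _, i in q[:k])


# Hypothetical example: side information favors the first four hypotheses.
pvals = [0.001, 0.015, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9]
weights = normalize_weights([2, 2, 2, 2, 1, 1, 1, 1])

print(weighted_bonferroni(pvals, weights))   # weighted FWER rule
print(weighted_bh(pvals, [1.0] * 8))         # unweighted BH
print(weighted_bh(pvals, weights))           # weighted BH
```

In this toy input the unweighted BH procedure rejects only the first hypothesis, while up-weighting the first four lets the weighted version reject all of them. In a data-driven scheme the weights must be chosen independently of the p-values they are applied to (for example by cross-fold fitting) for the error guarantee to carry over.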
3. Theoretical Guarantees and Assumptions
Theoretical properties of OCMT algorithms hinge on specific model regularity and effect size conditions:
- Error rate control: If the nonzero coefficients are strong enough (min-signal), and the number of tests grows slowly (e.g., ), OCMT ensures asymptotic FWER or FDR control. For example,
and
(Pesaran et al., 2024, Chudik et al., 2023).
- Type I/II error tradeoff: One-stage OCMT achieves high true-positive rate and low false-positive rate if all signals are strong. Multi-stage methods recover "hidden signals" (weak marginals) by partialling out previously selected features (Su et al., 2022).
- Post-selection guarantees: Oracle properties for estimation and fit hold if the post-selection regression is performed on the set of selected variables. For linear OCMT, in-sample residual sum of squares converges to that of the oracle model (Chudik et al., 2023).
- Assumptions: Standard conditions include independent or weakly dependent observations, bounded fourth moments, and effect-size lower bounds. For knockoff and covariate-weighted variants, extra conditions like Gram-matrix invertibility and null-covariate independence are required (Guan et al., 26 Feb 2025, Ignatiadis et al., 2017).
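The multi-stage idea mentioned above, recovering hidden signals by partialling out previously selected features, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the intercept is the only initial control, residualization uses Gram-Schmidt projections (Frisch-Waugh-Lovell), and the stricter exponent for later stages (`delta_later`) is an assumption of this sketch. The simulated design makes the second covariate a hidden signal whose marginal covariance with the outcome is zero.

```python
import math
import random
from statistics import NormalDist


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(v, basis):
    """Remove from v its projection on mutually orthogonal basis vectors."""
    r = list(v)
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def net_t_stat(y, x, basis):
    """t-ratio of x in a regression of y on x and the span of `basis`."""
    yr, xr = residualize(y, basis), residualize(x, basis)
    sxx = dot(xr, xr)
    coef = dot(xr, yr) / sxx
    rss = sum((a - coef * b) ** 2 for a, b in zip(yr, xr))
    dof = len(y) - len(basis) - 1
    return coef / math.sqrt(rss / dof / sxx)


def multi_stage_ocmt(y, covariates, p=0.05, delta=1.0, delta_later=2.0,
                     max_stages=10):
    n_cov, n_obs = len(covariates), len(y)
    basis = [[1.0] * n_obs]             # intercept as the only initial control
    selected = []
    for stage in range(max_stages):
        d = delta if stage == 0 else delta_later
        cv = NormalDist().inv_cdf(1.0 - p / (2.0 * n_cov ** d))
        new = [i for i in range(n_cov) if i not in selected
               and abs(net_t_stat(y, covariates[i], basis)) > cv]
        if not new:
            break                       # no further discoveries: stop
        for i in new:
            # Keep the basis orthogonal so later projections stay valid
            # (assumes no selected covariate is collinear with the basis).
            basis.append(residualize(covariates[i], basis))
            selected.append(i)
    return sorted(selected)


# Simulated hidden signal: cov(y, x2) = 1 * 0.5 - 0.5 * 1 = 0 marginally.
rng = random.Random(1)
T, N = 500, 20
x1 = [rng.gauss(0, 1) for _ in range(T)]
x2 = [0.5 * a + math.sqrt(0.75) * rng.gauss(0, 1) for a in x1]
noise = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(N - 2)]
X = [x1, x2] + noise
y = [x1[t] - 0.5 * x2[t] + rng.gauss(0, 1) for t in range(T)]
selected = multi_stage_ocmt(y, X)
print(selected)
```

The second covariate has essentially no marginal relationship with the outcome, so a single pass is unlikely to flag it; once the first covariate is partialled out, its net effect becomes visible and it is picked up in a later stage.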
4. Implementation Issues and Computational Considerations
OCMT lends itself to scalable computation owing to its sample-splitting and coordinate-wise structure:
- No cross-validation: Once significance level and tuning parameters are chosen, no cross-validation or hyperparameter search is needed (unlike Lasso) (Pesaran et al., 2024).
- Complexity: For covariates and sample size , the cost is naively or with updating. For nonparametric OCMT with B-spline basis dimension , the leading complexity is . Advanced methods (cross-weighted, covariate-powered) require per fold for group/BH variants (Pesaran et al., 2024, Su et al., 2022, Ignatiadis et al., 2017).
- Practical guidelines: Known strong predictors should always be included in the control vector. For time series or models with correlation, increasing the Bonferroni correction parameter provides robustness (Pesaran et al., 2024).
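To see how a stricter Bonferroni-type correction translates into a rejection threshold, the short sketch below tabulates the critical value for several values of the exponent $\delta$. It assumes the form $c_p(N, \delta) = \Phi^{-1}(1 - p / (2 N^{\delta}))$ from the OCMT literature; the chosen values of $N$, $p$, and $\delta$ are illustrative only.

```python
from statistics import NormalDist


def critical_value(n_cov, p=0.05, delta=1.0):
    """Assumed form: c_p(N, delta) = Phi^{-1}(1 - p / (2 N^delta))."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * n_cov ** delta))


# Thresholds for N = 100 candidate covariates at increasing delta.
thresholds = {d: critical_value(100, delta=d) for d in (1.0, 1.5, 2.0)}
for d, cv in thresholds.items():
    print(f"delta={d}: reject when |t| > {cv:.2f}")
```

A larger exponent raises the bar every t-statistic must clear, trading some power for protection against spurious selections when the test statistics are correlated.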
5. Connections to Related Multiple Testing Frameworks
OCMT unifies and extends several influential approaches:
- Lasso and penalized regression: Both OCMT and Lasso operate under sparsity. OCMT uses inferential thresholds, does not require the irrepresentable condition, and avoids tuning instability associated with cross-validation (Pesaran et al., 2024).
- Covariate and weight-based FDR control: Covariate weighting, which uses side-information to increase power, is formalized via cross-weighted procedures (e.g., IHW), optimal rank-based weighting (CRW), and rank-adaptive thresholds. Tenfold increases in empirical power are reported in sparse, weak-signal scenarios typical of high-throughput biology (Hasan et al., 2022, Ignatiadis et al., 2017).
- Knockoff filters: One-at-a-time knockoff procedures (OATK) reduce the overhead of constructing joint knockoff matrices and offer substantial power and efficiency gains with theoretical FDR control under weak dependence, generalizing the conditional randomization test for fixed-design regression (Guan et al., 26 Feb 2025).
6. Empirical Results and Domain-Specific Applications
OCMT's efficacy is evidenced by comparisons in macroeconomic forecasting, molecular biology, and social science modeling:
- UK inflation forecasting (2020q1–2023q1): OCMT with ARX controls produced RMSFEs matching or improving over ARX and outperforming Lasso, particularly at longer horizons (Pesaran et al., 2024).
- Nonparametric regression (Chinese migration): Multi-stage OCMT plus post-adaptive group Lasso selected parsimonious predictors with lowest RMSFE in 76–77% of simulations (Su et al., 2022).
- High-throughput biology: In RNA-Seq data, covariate rank weighting (CRW) increased discoveries by >50% over group-based or unweighted procedures, with empirical FDR controlled at nominal levels (Hasan et al., 2022).
- Variable selection with parameter instability: OCMT maintained minimal mean squared forecast error and noise elimination, outperforming Lasso, A-Lasso, and boosting in both simulation and empirical studies (Chudik et al., 2023).
- Knockoff power comparisons: OATK achieved higher or comparable FDR and up to 30 percentage points higher power than the standard knockoff filter and Benjamini–Hochberg in simulation benchmarks (Guan et al., 26 Feb 2025).
7. Limitations and Extensions
While OCMT offers attractive properties, several limitations and frontiers merit note:
- Sensitivity to weak signals: OCMT power may degrade if all signals are near the noise threshold; multi-stage or joint inference approaches may be needed (Su et al., 2022, Pesaran et al., 2024).
- Hidden/joint effects: Sequential one-at-a-time testing can miss variables that are only influential jointly; augmenting OCMT with a second-stage penalized or group-based selection is advised in such scenarios (Pesaran et al., 2024, Su et al., 2022).
- Structural model assumptions: Extensions to dependent data (time series, clustering), multivariate or interactive effects, and alternate null distributional forms are ongoing areas of research (Su et al., 2022, Hasan et al., 2022).
- Practical tuning: Empirical error control is sensitive to effect-size estimation, rank-modeling robustness, and, for adaptive weighting, the integrity of fold-independence (Hasan et al., 2022, Ignatiadis et al., 2017).
OCMT, as a flexible paradigm, bridges inferential transparency, rigorous error control, and broad applicability, making it a core component in the contemporary high-dimensional inference toolkit across econometrics, machine learning, and genomics.