
LOCO CRT: Conditional Randomization Test

Updated 8 March 2026
  • LOCO–CRT is a method for assessing conditional independence in the model-X framework by evaluating the impact of omitting individual covariates.
  • It computes predictive loss differences between full and leave-one-out models to yield robust, finite-sample valid p-values for each variable.
  • The approach reduces computational cost through model reuse and supports analytic solutions in Gaussian covariate cases, enhancing its practical applicability.

The leave-one-covariate-out conditional randomization test (LOCO–CRT) is a computationally efficient methodology for assessing conditional independence within the model-X framework. It tests the null hypothesis that a response variable $Y$ is independent of a given covariate $X_j$ conditional on all other covariates $X_{-j}$, under the assumption that the marginal distribution of the covariate vector $X$ is known or can be accurately sampled. LOCO–CRT yields valid $p$-values useful for error-rate control in variable selection, minimizing algorithmic randomness and enabling practical application even in high-dimensional settings (Katsevich et al., 2020).

1. Statistical Formulation and Model-X Framework

The setup assumes i.i.d. samples $(X_i, Y_i)$, $i=1,\ldots,n$, with $X_i = (X_{i1}, \dots, X_{ip}) \in \mathbb{R}^p$ and arbitrary $Y_i$. The model-X assumption requires that the joint distribution $P_X$ of $X$ is fully known or directly sampleable, while the conditional distribution of $Y$ given $X$ remains unrestricted. For each variable $X_j$, the test targets

$$H_{0,j}: Y \perp\!\!\!\perp X_j \mid X_{-j},$$

where $X_{-j}$ denotes all features except $X_j$. This hypothesis asserts that, conditional on $X_{-j}$, $X_j$ carries no information regarding $Y$.

2. LOCO Test Statistic

Given a loss function $\ell(y, \hat{y})$, e.g., squared loss for regression or logistic loss for classification, define:

  • $\hat{f}$: any fitted predictor trained on $(X, Y)$,
  • $\hat{f}_{-j}$: the same model class fitted using $(X_{-j}, Y)$.

The observed LOCO statistic for coordinate $j$ is:

$$T_j = \frac{1}{n}\sum_{i=1}^{n}\left[\ell\big(Y_i, \hat{f}_{-j}(X_{i,-j})\big) - \ell\big(Y_i, \hat{f}(X_i)\big)\right].$$

$T_j$ quantifies the change in predictive loss incurred by omitting $X_j$. Under the null $H_{0,j}$, $T_j$ is expected to be small; under the alternative, it should be large.
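As a minimal sketch of this statistic, assuming ordinary least squares as the model class and squared loss (both illustrative choices), the loss difference for one coordinate can be computed as follows. The helper names `fit_ols` and `loco_statistic` and the simulated data are hypothetical and not taken from the source.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (no intercept, to keep the sketch short)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def loco_statistic(X, y, j):
    """Mean increase in squared loss when column j is left out (in-sample)."""
    X_minus = np.delete(X, j, axis=1)                  # covariates without X_j
    full_loss = (y - X @ fit_ols(X, y)) ** 2           # loss of the full model
    reduced_loss = (y - X_minus @ fit_ols(X_minus, y)) ** 2
    return float(np.mean(reduced_loss - full_loss))

# Simulated example: only column 0 influences the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + rng.normal(size=200)

T_signal = loco_statistic(X, y, 0)   # dropping the relevant covariate
T_null = loco_statistic(X, y, 3)     # dropping an irrelevant covariate
```

In this setup `T_signal` should be clearly larger than `T_null`, matching the intuition that the statistic is small under the null and large under the alternative.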

3. Algorithmic Procedure and P-value Calculation

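The LOCO–CRT algorithm uses null randomization to obtain a valid $p$-value for each variable. For each $j$:

  1. Fit both $\hat{f}$ (on $(X, Y)$) and $\hat{f}_{-j}$ (on $(X_{-j}, Y)$).
  2. Compute $T_j$ as the average loss difference.
  3. For $b = 1, \ldots, B$:
    • For each $i$, sample $\tilde{X}_{ij}^{(b)} \sim P_{X_j \mid X_{-j}}(\cdot \mid X_{i,-j})$.
    • Construct $\tilde{X}^{(b)} = (\tilde{X}_j^{(b)}, X_{-j})$.
    • Compute $\tilde{T}_j^{(b)}$, the analogous test statistic using the null-resampled $\tilde{X}^{(b)}$.
  4. Calculate

$$p_j = \frac{1 + \sum_{b=1}^{B} \mathbf{1}\{\tilde{T}_j^{(b)} \ge T_j\}}{B + 1}.$$

Because $T_j, \tilde{T}_j^{(1)}, \ldots, \tilde{T}_j^{(B)}$ are exchangeable under $H_{0,j}$, $p_j$ is valid in finite samples. For simultaneous inference across coordinates, conventional multiplicity corrections (Bonferroni, Holm, Benjamini–Hochberg) can be applied.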
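The procedure above can be sketched in Python under several assumptions that are not in the source: the covariates are Gaussian with known mean `mu` and covariance `Sigma`, the model class is OLS with squared loss, and the predictors are fitted on a separate split so that they can be held fixed while the tested covariate is resampled on the evaluation split. The function names and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_conditional(mu, Sigma, j):
    """Weights and variance of X_j given X_{-j} when X ~ N(mu, Sigma)."""
    idx = np.delete(np.arange(len(mu)), j)
    w = np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, j])
    var = Sigma[j, j] - Sigma[idx, j] @ w
    return idx, w, var

def loco_crt_pvalue(X_fit, y_fit, X_eval, y_eval, j, mu, Sigma, B=200, seed=0):
    rng = np.random.default_rng(seed)
    idx, w, var = gaussian_conditional(mu, Sigma, j)

    # Fit the full and leave-one-out models once, on the fitting split only.
    beta_full, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    beta_red, *_ = np.linalg.lstsq(X_fit[:, idx], y_fit, rcond=None)
    reduced_loss = np.mean((y_eval - X_eval[:, idx] @ beta_red) ** 2)

    def statistic(xj):
        """Loss difference on the evaluation split with column j set to xj."""
        Xb = X_eval.copy()
        Xb[:, j] = xj
        return reduced_loss - np.mean((y_eval - Xb @ beta_full) ** 2)

    T_obs = statistic(X_eval[:, j])

    # Resample X_j from its conditional law given X_{-j}; models stay fixed.
    cond_mean = mu[j] + (X_eval[:, idx] - mu[idx]) @ w
    sd = np.sqrt(var)
    T_null = np.array([
        statistic(cond_mean + sd * rng.standard_normal(len(y_eval)))
        for _ in range(B)
    ])
    return float((1 + np.sum(T_null >= T_obs)) / (B + 1))

# Simulated example: only column 0 influences the response.
rng = np.random.default_rng(1)
p = 5
mu, Sigma = np.zeros(p), np.eye(p)
X = rng.multivariate_normal(mu, Sigma, size=400)
y = 2.0 * X[:, 0] + rng.normal(size=400)
halves = (X[:200], y[:200], X[200:], y[200:])

p_signal = loco_crt_pvalue(*halves, j=0, mu=mu, Sigma=Sigma)
p_null = loco_crt_pvalue(*halves, j=3, mu=mu, Sigma=Sigma)
```

With a strong signal in column 0, no resampled statistic is expected to reach the observed one, so `p_signal` should sit at the floor $1/(B+1)$, while `p_null` can land anywhere in $(0, 1]$.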

4. Theoretical Guarantees

The principal theoretical result is finite-sample validity of LOCO–CRT $p$-values: $\mathbb{P}_{H_{0,j}}(p_j \le \alpha) \le \alpha$ for all $\alpha \in [0,1]$, given correct randomization from $P_{X_j \mid X_{-j}}$ and i.i.d. sampling. Under the null, $p_j$ is a super-uniform $p$-value. Familywise error rate (FWER) can be controlled at level $\alpha$ by rejecting all $H_{0,j}$ with $p_j \le \alpha/p$. The computational efficiency is achieved by fitting $\hat{f}$ and $\hat{f}_{-j}$ only once per variable; null sampling changes only $X_j$, not the fitted models.
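As a small illustration of FWER control, the Bonferroni rule rejects the hypothesis for coordinate $j$ whenever its $p$-value is at most $\alpha$ divided by the number of tests. The function name and the $p$-values below are made up for the example.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Indices j whose p-value is at most alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [j for j, pj in enumerate(p_values) if pj <= threshold]

# Hypothetical p-values for five covariates; the threshold is 0.05 / 5 = 0.01.
p_values = [0.001, 0.20, 0.012, 0.004, 0.65]
rejected = bonferroni_reject(p_values, alpha=0.05)   # -> [0, 3]
```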

5. L1ME–CRT Variant for L1-regularized M-Estimators

For L1-regularized estimators (e.g., Lasso, elastic net), refitting after each variable exclusion is computationally intensive. The L1ME–CRT modification capitalizes on the empirical observation that, for coordinates $j$ with $\hat{\beta}_j = 0$ in the full-data fit, the cross-validated penalty parameter $\hat{\lambda}$ is typically stable after exclusion, under restricted eigenvalue and Lipschitz loss conditions: $$\hat{\lambda}_{-j} \approx \hat{\lambda}.$$ Hence, for the “inactive” set $\{j : \hat{\beta}_j = 0\}$, the same $\hat{\lambda}$ can be reused for $\hat{f}_{-j}$, obviating additional cross-validations. This reduces computational overhead to near the number of “active” variables, $|\{j : \hat{\beta}_j \neq 0\}| \ll p$ in sparse regimes.
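The bookkeeping behind this reuse can be sketched as follows: given the coefficients of the full-data L1-regularized fit, coordinates are split into those whose leave-one-out fit may reuse the existing penalty and those that need a fresh cross-validation. This shows only the partitioning step; the function name, the dictionary keys, and the coefficient vector are hypothetical.

```python
import numpy as np

def plan_penalty_reuse(beta_hat, tol=0.0):
    """Split coordinates by whether the full-fit coefficient is (numerically) zero."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    inactive = np.flatnonzero(np.abs(beta_hat) <= tol)   # reuse the penalty
    active = np.flatnonzero(np.abs(beta_hat) > tol)      # re-run cross-validation
    return {
        "reuse_penalty": inactive.tolist(),
        "re_cross_validate": active.tolist(),
    }

# Hypothetical Lasso coefficients from a full-data fit with six covariates.
beta_hat = [0.0, 1.3, 0.0, 0.0, -0.7, 0.0]
plan = plan_penalty_reuse(beta_hat)
```

Here only two of the six coordinates would trigger a new cross-validation, which is the source of the savings in sparse regimes.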

6. Closed-form Solution in the Multivariate Gaussian Covariate Case

Assuming $X \sim N(\mu, \Sigma)$, the conditional law

$$X_j \mid X_{-j} \sim N\big(\mu_j + \Sigma_{j,-j}\Sigma_{-j,-j}^{-1}(X_{-j} - \mu_{-j}),\ \sigma_j^2\big)$$

with $\sigma_j^2 = \Sigma_{jj} - \Sigma_{j,-j}\Sigma_{-j,-j}^{-1}\Sigma_{-j,j}$, enables analytic computation. With squared error loss and ordinary least squares,

$$T_j = \frac{1}{n}\left(\|Y - \hat{Y}_{-j}\|_2^2 - \|Y - \hat{Y}\|_2^2\right),$$

where $\hat{Y}_{-j}$ and $\hat{Y}$ are fitted values excluding and including $X_j$. Under $H_{0,j}$, the normalized statistic $F_j = n\,T_j \big/ \big(\|Y - \hat{Y}\|_2^2/(n-p)\big)$ follows an $F_{1,\,n-p}$ distribution, and the $p$-value is given by

$$p_j = \mathbb{P}\big(F_{1,\,n-p} \ge F_j\big),$$

recovering the classical partial $F$-test or $t$-test for a single coefficient in normal linear regression, with no need for Monte Carlo.
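The link between the partial $F$-test and the single-coefficient $t$-test can be checked numerically. The sketch below assumes OLS without an intercept, so the residual degrees of freedom are $n - p$. It computes the $F$ statistic from the difference in residual sums of squares and the $t$ statistic from the coefficient and its standard error. The function names and the simulated data are hypothetical.

```python
import numpy as np

def partial_f_statistic(X, y, j):
    """F statistic for dropping column j from an OLS fit without intercept."""
    n, p = X.shape

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.sum((y - A @ coef) ** 2))

    rss_full = rss(X)
    rss_reduced = rss(np.delete(X, j, axis=1))
    return (rss_reduced - rss_full) / (rss_full / (n - p))

def t_statistic(X, y, j):
    """Classical t statistic for coefficient j in the same OLS fit."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    sigma2 = np.sum((y - X @ beta) ** 2) / (n - p)    # residual variance
    return float(beta[j] / np.sqrt(sigma2 * XtX_inv[j, j]))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 0.0, -0.5, 0.0]) + rng.normal(size=100)

F = partial_f_statistic(X, y, 1)
t = t_statistic(X, y, 1)    # F equals t**2 up to rounding error
```

The tail probability of the $F$ distribution with $1$ and $n - p$ degrees of freedom could then be evaluated with a statistics library, for instance `scipy.stats.f.sf(F, 1, n - p)`.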

7. Computational and Practical Considerations

Let $C_{\text{fit}}$ denote the cost of fitting a model and $C_{\text{score}}$ the cost of scoring $n$ samples:

  • A single LOCO–CRT test requires $O(C_{\text{fit}} + B\,C_{\text{score}})$.
  • Testing all $p$ features costs $O\big(p\,(C_{\text{fit}} + B\,C_{\text{score}})\big)$, but with L1ME–CRT, the actual model refits may be far fewer than $p$ due to reuse for inactive features.

Typically, $C_{\text{fit}}$ is $O(np^2)$ for OLS or $O(np)$ per coordinate-descent pass for Lasso; $C_{\text{score}}$ is $O(np)$. Monte Carlo sample sizes $B$ of up to 2000 are common, balancing $p$-value granularity and computational burden.
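The granularity trade-off can be made concrete. Assuming the usual Monte Carlo $p$-value of the form $(1 + \#\{\text{null} \ge \text{observed}\})/(B+1)$, the smallest attainable value is $1/(B+1)$, so a rejection at a Bonferroni threshold $\alpha/m$ for $m$ tests is possible only if $B \ge m/\alpha - 1$. The helper below is a hypothetical illustration that uses exact fractions to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction

def min_resamples(alpha, n_tests=1):
    """Smallest B such that 1 / (B + 1) <= alpha / n_tests."""
    return math.ceil(Fraction(n_tests) / Fraction(alpha)) - 1

alpha = Fraction(1, 20)                        # significance level 0.05
B_single = min_resamples(alpha)                # -> 19
B_bonferroni = min_resamples(alpha, 100)       # -> 1999
```

Under these assumptions, a single test at level 0.05 needs at least 19 resamples, while a Bonferroni correction over 100 covariates pushes the requirement to 1999.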

Implementation suggestions include precomputing random seeds for reproducibility, vectorizing null-feature sampling when feasible, and using persistent model objects in languages like R or Python to avoid redundant refitting. These practicalities further enhance runtime efficiency.
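Two of these suggestions, reproducible seeding and vectorized null sampling, can be sketched with NumPy. The example assumes a Gaussian conditional law with known conditional means and standard deviation. The function name `sample_null_features`, the root seed, and the placeholder inputs are hypothetical.

```python
import numpy as np

def sample_null_features(cond_mean, cond_sd, B, seed):
    """Draw B null copies of one covariate at once; returns shape (B, n)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((B, len(cond_mean)))
    return cond_mean + cond_sd * noise      # broadcasts over the B rows

# One independent, reproducible random stream per tested coordinate.
root = np.random.SeedSequence(2020)
child_seeds = root.spawn(3)

cond_mean = np.zeros(50)    # placeholder conditional means for n = 50 rows
draws = sample_null_features(cond_mean, 1.0, B=200, seed=child_seeds[0])
```

Spawning child seeds from a single root keeps the streams for different coordinates statistically independent while letting the whole analysis be rerun from one recorded integer.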


For foundational and related methodologies, see “Panning for gold: ‘model-X’ knockoffs for high-dimensional controlled variable selection” (Candès, Fan, Janson & Lv), “Gene hunting with hidden Markov model knockoffs” (Sesia, Sabatti & Candès), and “Multiple testing with the conditional randomization test” (Li & Barber), in addition to the primary development of LOCO–CRT (Katsevich et al., 2020).
