
Leave-One-Covariate-Out (LOCO)

Updated 3 October 2025
  • LOCO is a variable importance measure defined by the change in model error when a covariate is omitted, linking predictive behavior to statistical inference.
  • It underpins rigorous hypothesis testing and variable screening in regression and machine learning by comparing full versus reduced models.
  • Extensions like normalized and decorrelated LOCO address collinearity and enable applications in high-dimensional, nonparametric, and model-agnostic frameworks.

The Leave-One-Covariate-Out (LOCO) methodology refers to a class of variable importance measures and inferential procedures in which the role of an individual covariate (or group of covariates) is assessed by comparing the predictive behavior or statistical fit of a full model with that of a reduced model in which the target covariate(s) are omitted, marginalized, or otherwise ablated. This approach, widely used in regression analysis and model interpretation, has formal connections to classical hypothesis testing, modern model-agnostic variable importance methods, and functional ANOVA decomposition. LOCO-type statistics feature prominently in high-dimensional inference, model-X conditional independence testing, and distribution-free measures of both main and interaction effects, with rigorous studies elucidating their properties, limitations, and computational aspects (Verdinelli et al., 2023, Cao et al., 2020, Bladen et al., 1 Oct 2025, Little et al., 10 Feb 2025).

1. Foundational Formulation and Mathematical Properties

In the canonical regression setting, for a model $\mu(x) = \mathbb{E}[Y \mid X = x]$ and covariates $X = (X_1, \ldots, X_d)$, the LOCO importance of feature $j$ is formally defined as the increase in mean squared prediction error when $X_j$ is omitted:

$$\psi_{\mathrm{loco}}(j) = \mathbb{E}\big[(Y - \mu_{-j}(X_{-j}))^2\big] - \mathbb{E}\big[(Y - \mu(X))^2\big] = \mathbb{E}\big[(\mu(X) - \mu_{-j}(X_{-j}))^2\big],$$

where $\mu_{-j}(X_{-j}) = \mathbb{E}[Y \mid X_{-j}]$ and $X_{-j}$ denotes all predictors except $X_j$ (Verdinelli et al., 2023). For a set $S \subset \{1, \ldots, d\}$, the generalization is

$$\psi_{\mathrm{loco}}(S) = \mathbb{E}\big[(\mu(X) - \mu_{-S}(X_{-S}))^2\big].$$

At the sample level, plug-in LOCO estimators are typically computed by retraining or re-estimating the model without the covariate(s) of interest and measuring the change in predictive error on a held-out test set (Zheng et al., 19 Aug 2025).
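
As a concrete illustration, here is a minimal sketch of this plug-in recipe, assuming a scikit-learn regressor and simulated data (the random-forest model, the split, and all constants are illustrative choices, not specifics from the cited papers):

```python
# Minimal plug-in LOCO sketch: retrain without feature j, compare held-out MSE.
# Model choice (random forest) and the simulated data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def loco_importance(X, y, j, make_model, seed=0):
    """Estimated increase in held-out MSE when column j is dropped."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    full = make_model().fit(X_tr, y_tr)
    reduced = make_model().fit(np.delete(X_tr, j, axis=1), y_tr)
    mse_full = mean_squared_error(y_te, full.predict(X_te))
    mse_reduced = mean_squared_error(y_te, reduced.predict(np.delete(X_te, j, axis=1)))
    return mse_reduced - mse_full  # plug-in estimate of psi_loco(j)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=2000)  # only X_0 and X_1 matter
scores = [loco_importance(X, y, j, RandomForestRegressor) for j in range(5)]
print(np.round(scores, 3))  # largest for j=0, near zero for j=2..4
```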

In linear regression, with a model $Y = X\beta + \epsilon$:

$$\psi_{\mathrm{loco}}(j) = \beta_j^2\, \mathbb{E}\big[(X_j - \nu_j(X))^2\big],$$

where $\nu_j(X) = \mathbb{E}[X_j \mid X_{-j}]$ (Verdinelli et al., 2023, Verdinelli et al., 2021). When $X_j$ is highly collinear (i.e., nearly determined by $X_{-j}$), $\mathbb{E}[(X_j - \nu_j(X))^2] \approx 0$, and thus the LOCO value is near zero even when $\beta_j$ is large.
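
The collinearity attenuation implied by this identity can be checked with a short Monte Carlo sketch, assuming bivariate Gaussian covariates so that $\nu_2(X) = \rho X_1$ in closed form (all constants illustrative):

```python
# Verify psi_loco(j) = beta_j^2 * E[(X_j - nu_j(X))^2] for bivariate Gaussian
# covariates with correlation rho; here nu_2(X) = rho * X_1, so the target
# value is beta_2^2 * (1 - rho^2). All constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, rho, b1, b2 = 500_000, 0.95, 1.0, 2.0
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Corr(x1, x2) = rho

mu_full = b1 * x1 + b2 * x2        # mu(X) = E[Y | X]
mu_reduced = (b1 + b2 * rho) * x1  # E[Y | X_1] under joint Gaussianity
psi_mc = np.mean((mu_full - mu_reduced) ** 2)
psi_theory = b2**2 * (1 - rho**2)  # beta_j^2 * E[(X_j - nu_j(X))^2]
print(psi_mc, psi_theory)  # both ~0.39: beta_2 = 2 is large, yet LOCO is small
```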

2. LOCO for Inference: Testing, p-values, and Confidence Intervals

In high-dimensional regression and modern conditional independence testing frameworks, LOCO-type statistics facilitate both variable screening and statistical hypothesis testing. In the LOCO Conditional Randomization Test (LOCO CRT) (Katsevich et al., 2020), the significance of a covariate is assessed by comparing the loss (or risk) of the full model with that of the LOCO-reduced model. The sampling distribution of this loss difference is approximated, either exactly (for Gaussian covariates, yielding closed forms) or via resampling, allowing the construction of valid $p$-values for familywise error control.
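
To convey the general resampling logic, the toy sketch below pairs a LOCO-style statistic with conditional randomization; it assumes the model-X conditional law of $X_j$ given $X_{-j}$ is known exactly (here standard normal and independent of $X_{-j}$ by construction) and is not the specific LOCO CRT algorithm of Katsevich et al.:

```python
# Toy conditional randomization test with a LOCO-style statistic. Assumes the
# conditional law X_j | X_{-j} is known (here N(0,1), independent of X_{-j}
# by construction), which is the idealized model-X setting.
import numpy as np
from sklearn.linear_model import LinearRegression

def loco_stat(X, y, j):
    """In-sample error reduction from including X_j (illustrative statistic)."""
    full = LinearRegression().fit(X, y)
    red = LinearRegression().fit(np.delete(X, j, axis=1), y)
    return (np.mean((y - red.predict(np.delete(X, j, axis=1))) ** 2)
            - np.mean((y - full.predict(X)) ** 2))

rng = np.random.default_rng(6)
n, j, B = 500, 0, 199
X = rng.normal(size=(n, 3))
y = 0.2 * X[:, 0] + X[:, 1] + rng.normal(size=n)  # H0 is false for j = 0

t_obs = loco_stat(X, y, j)
t_null = []
for _ in range(B):
    Xb = X.copy()
    Xb[:, j] = rng.normal(size=n)  # resample X_j from its conditional law
    t_null.append(loco_stat(Xb, y, j))
p_value = (1 + sum(t >= t_obs for t in t_null)) / (B + 1)  # valid CRT p-value
print(f"CRT p-value for feature {j}: {p_value:.3f}")
```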

LOCO-based inference has also been formalized for L1-regularized M-estimators, with variants such as L1ME CRT providing computational efficiency in high-dimensional settings by leveraging the stability properties of cross-validated lasso (Katsevich et al., 2020).

Recent advances extend LOCO to confidence intervals for feature importance and even for higher-order interactions. The iLOCO metric for feature interactions, defined as

$$\mathrm{iLOCO}_{j,k} = \Delta_j + \Delta_k - \Delta_{j,k}$$

with $\Delta_j$ the main-effect error change and $\Delta_{j,k}$ the error change for joint removal, supports distribution-free confidence intervals via either a central limit theorem for split-sample estimators or minipatch ensembles (Little et al., 10 Feb 2025).
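
A minimal sketch of the inclusion–exclusion computation (the gradient-boosting learner and the simulated interaction are illustrative assumptions; the minipatch and sample-splitting machinery of the paper is omitted):

```python
# Sketch of iLOCO_{j,k} = Delta_j + Delta_k - Delta_{j,k} via retraining.
# Gradient boosting and the simulated interaction are illustrative choices.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(4000, 4))
y = X[:, 0] + X[:, 1] + 3.0 * X[:, 0] * X[:, 1] + 0.5 * rng.normal(size=4000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mse_full = mean_squared_error(
    y_te, GradientBoostingRegressor().fit(X_tr, y_tr).predict(X_te))

def delta(drop):
    """Excess test MSE when the columns in `drop` are omitted before refitting."""
    m = GradientBoostingRegressor().fit(np.delete(X_tr, drop, axis=1), y_tr)
    return mean_squared_error(y_te, m.predict(np.delete(X_te, drop, axis=1))) - mse_full

j, k = 0, 1
iloco_jk = delta([j]) + delta([k]) - delta([j, k])  # inclusion-exclusion
print(f"iLOCO_{{{j},{k}}} ~ {iloco_jk:.2f}")  # roughly 9: the interaction variance
```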

3. LOCO in High-Dimensional and Machine Learning Contexts

LOCO-based approaches are prominent in high-dimensional regression, where standard inferential tools fail. For example, in the LASSO context, the change in the solution path when one covariate is dropped, quantified by norms such as $T_j(s,t) = \|\hat\beta(\cdot) - \hat\beta^{(-j)}(\cdot)\|_{s,t}$ over the path in the penalty $\lambda$, enables both variable importance ranking and hypothesis testing, often attaining higher power than traditional methods such as the $t$-test, and maintaining type-I error control across correlation regimes (Cao et al., 2020).
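
A discretized version of such a statistic can be sketched with scikit-learn's `lasso_path`; the shared $\lambda$ grid and the entrywise 2-norm stand in for the $(s,t)$-norm and are illustrative assumptions rather than the exact construction of Cao et al.:

```python
# Sketch of a discretized LASSO-path LOCO statistic: fit the solution path
# with and without covariate j on a shared (descending) lambda grid and take
# an entrywise 2-norm of the difference over the shared coefficients.
import numpy as np
from sklearn.linear_model import lasso_path

def T_j(X, y, j, alphas):
    _, coefs_full, _ = lasso_path(X, y, alphas=alphas)  # shape (p, n_alphas)
    _, coefs_red, _ = lasso_path(np.delete(X, j, axis=1), y, alphas=alphas)
    keep = [i for i in range(X.shape[1]) if i != j]     # shared coordinates
    return np.linalg.norm(coefs_full[keep, :] - coefs_red)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
beta = np.r_[2.0, 1.0, np.zeros(8)]
y = X @ beta + rng.normal(size=300)
alphas = np.logspace(0, -2, 50)  # descending grid over the penalty lambda
print([round(T_j(X, y, j, alphas), 2) for j in range(10)])  # largest for j = 0, 1
```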

Model-agnostic wrappers for feature selection using LOCO are computationally intensive but broadly applicable, especially when coupled with minipatch ensembles or out-of-bag approaches (which partially obviate retraining) (Little et al., 10 Feb 2025). In modern machine learning methods such as neural networks or gradient boosting trees, LOCO provides a means to assess feature necessity, though computational efficiency concerns often require approximate strategies (Zheng et al., 19 Aug 2025).

A common plug-in estimator of the population LOCO importance is

$$\hat{\psi}_{0,j}^{\mathrm{(loco)}} = \frac{1}{n}\sum_{i=1}^n \Big[ (Y_i - f_{n,-j}(X_{i,-j}))^2 - (Y_i - f_n(X_i))^2 \Big],$$

with asymptotic normality achieved under mild smoothness and moment conditions (Zheng et al., 19 Aug 2025).
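
A split-sample sketch of this estimator with a Wald-type 95% interval based on the asymptotic normality (the model, data, and two-fold split are illustrative assumptions):

```python
# Split-sample LOCO with a CLT-based 95% Wald interval: fit full and reduced
# models on one half, average per-observation loss differences on the other.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, j = 4000, 0
X = rng.normal(size=(n, 5))
y = 1.5 * X[:, 0] + X[:, 1] + rng.normal(size=n)
half = n // 2

f_full = RandomForestRegressor(random_state=0).fit(X[:half], y[:half])
f_red = RandomForestRegressor(random_state=0).fit(
    np.delete(X[:half], j, axis=1), y[:half])

# Per-observation loss differences on the held-out half (reduced minus full)
d = ((y[half:] - f_red.predict(np.delete(X[half:], j, axis=1))) ** 2
     - (y[half:] - f_full.predict(X[half:])) ** 2)
psi_hat, se = d.mean(), d.std(ddof=1) / np.sqrt(d.size)
print(f"psi_hat = {psi_hat:.3f}, "
      f"95% CI = ({psi_hat - 1.96 * se:.3f}, {psi_hat + 1.96 * se:.3f})")
```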

4. Influence of Collinearity and Correlation on LOCO

LOCO measures are inherently sensitive to collinearity. The effect of correlation is captured explicitly in theoretical work (Bladen et al., 1 Oct 2025). In a linear model, if $\mathbf{X} = \mathbf{Z}\mathbf{A}$ for latent $\mathbf{Z}$ and mixing matrix $\mathbf{A} = \Delta \mathbf{J} + (1-\Delta)\mathbf{I}$, the closed-form LOCO importance is

$$\mathrm{LOCO}_i = \beta_i (1-\Delta) \sqrt{1+c},$$

where $c$ is a function of $\Delta$ and $p$ (the number of predictors). Increasing $\Delta$ (stronger collinearity) results in substantial attenuation of LOCO scores, reflecting the redundancy of information among predictors: as $\Delta \to 1$, $\mathrm{LOCO}_i \to 0$ even for nonzero $\beta_i$.

This contrasts with permutation measures ("Permute-and-Predict"), which remain proportional to $\beta_i$ and the predictor variance and are largely unaffected by collinearity. LOCO therefore measures the necessity of a feature in the context of the rest, penalizing shared information (Bladen et al., 1 Oct 2025, Verdinelli et al., 2021, Verdinelli et al., 2023).
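
The contrast shows up directly in a small simulation, sketched below under illustrative assumptions (a near-duplicate feature, a random-forest learner, and `permutation_importance` scores reported in the model's default $R^2$ units rather than MSE):

```python
# LOCO vs. permutation importance under collinearity: feature 1 is a near-copy
# of feature 0, so LOCO for feature 0 is attenuated, while permutation
# importance keeps the correlated pair visibly important.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n, rho = 3000, 0.99
x0 = rng.normal(size=n)
X = np.c_[x0, rho * x0 + np.sqrt(1 - rho**2) * rng.normal(size=n),
          rng.normal(size=n)]
y = 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
mse_full = np.mean((y_te - full.predict(X_te)) ** 2)
loco = [np.mean((y_te - RandomForestRegressor(random_state=0)
                 .fit(np.delete(X_tr, j, axis=1), y_tr)
                 .predict(np.delete(X_te, j, axis=1))) ** 2) - mse_full
        for j in range(3)]
perm = permutation_importance(full, X_te, y_te, random_state=0).importances_mean
print("LOCO:", np.round(loco, 3))  # feature 0 attenuated by its near-copy
print("Perm:", np.round(perm, 3))  # importance spread across the correlated pair
```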

Strategies to mitigate or interpret LOCO's susceptibility to correlation include normalization (dividing by $\mathbb{E}[(X_j - \nu_j(X))^2]$ to recover $\beta_j^2$ in the linear case), decorrelation (recomputing the functional under an altered joint distribution with independent covariates), or semiparametric "decorrelated" variable-importance functionals built from influence functions (Verdinelli et al., 2021, Verdinelli et al., 2023).
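
A sketch of the normalization strategy in the linear case, with $\nu_j$ estimated by an auxiliary regression of $X_j$ on $X_{-j}$ (data and constants illustrative):

```python
# Normalized LOCO: divide the plug-in LOCO value by an estimate of
# E[(X_j - nu_j(X))^2] from an auxiliary regression of X_j on X_{-j};
# in the linear model this rescaling recovers beta_j^2 (here beta_1 = 1).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n, rho, beta = 100_000, 0.95, np.array([2.0, 1.0, 0.0])
x0 = rng.normal(size=n)
X = np.c_[x0, rho * x0 + np.sqrt(1 - rho**2) * rng.normal(size=n),
          rng.normal(size=n)]
y = X @ beta + rng.normal(size=n)

j = 1
X_red = np.delete(X, j, axis=1)
full = LinearRegression().fit(X, y)
red = LinearRegression().fit(X_red, y)
loco = (np.mean((y - red.predict(X_red)) ** 2)
        - np.mean((y - full.predict(X)) ** 2))

aux = LinearRegression().fit(X_red, X[:, j])  # estimates nu_j(X_{-j})
resid_var = np.mean((X[:, j] - aux.predict(X_red)) ** 2)
print(loco, loco / resid_var)  # raw LOCO ~0.10 (attenuated); normalized ~1.0
```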

5. Model-Agnostic and Nonparametric Extensions

LOCO is implementable for black-box predictors, including random forests, neural networks, and ensemble methods. The core principle remains the same: retrain the predictive function with and without the covariate and compare target error metrics (MSE, classification error, or others), often using out-of-bag predictions to economize computation (Zheng et al., 19 Aug 2025, Little et al., 10 Feb 2025).

Comprehensive LOCO-type measures encompass main-effect, normalized, decorrelated, and interaction variants, summarized in the table at the end of this article.

Nonparametric estimation is feasible but challenging due to the complexity of marginalization and the curse of dimensionality; semiparametric and cross-fitting approaches with efficient influence function corrections have been developed to offset such obstacles (Verdinelli et al., 2021).

6. LOCO Versus Shapley Values and Other Measures

LOCO and Shapley values both operationalize feature importance via comparisons over model subspaces. The Shapley value aggregates marginal contributions over all possible subsets, whereas basic LOCO measures the marginal impact from omitting the variable from the full set (Verdinelli et al., 2023). Shapley values are theoretically appealing due to properties such as efficiency and symmetry but are computationally demanding and, notably, do not eliminate correlation effects. Both measures may assign small or distorted importance to highly correlated features. Modified LOCO methods (e.g., normalized or decorrelated LOCO) more directly address correlation distortion in linear models and satisfy tailored axioms such as "functional dependence," "correlation-free," and "linear agreement" (Verdinelli et al., 2023).
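
The definitional difference can be made concrete with a brute-force sketch over a tiny feature set: exact Shapley values average the error reduction from adding feature $j$ over all subsets of the remaining features, whereas LOCO uses only the full set (exhaustive retraining over $2^p$ subsets is feasible only for toy $p$; all modeling choices are illustrative):

```python
# Exact Shapley importance vs. basic LOCO, by exhaustive retraining over all
# 2^p subsets (p = 3 here). Shapley credits sum to the total explained
# variance, spreading shared signal across the correlated pair; LOCO charges
# each feature only for its unique contribution given all the others.
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 5000
x0 = rng.normal(size=n)
X = np.c_[x0, 0.9 * x0 + np.sqrt(1 - 0.81) * rng.normal(size=n),
          rng.normal(size=n)]
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)
p = X.shape[1]

def mse_using(cols):
    """Training MSE using only the given columns (mean-only model if empty)."""
    if not cols:
        return np.mean((y - y.mean()) ** 2)
    m = LinearRegression().fit(X[:, cols], y)
    return np.mean((y - m.predict(X[:, cols])) ** 2)

def shapley(j):
    others = [i for i in range(p) if i != j]
    val = 0.0
    for r in range(p):
        for S in combinations(others, r):
            w = factorial(r) * factorial(p - r - 1) / factorial(p)
            val += w * (mse_using(list(S)) - mse_using(list(S) + [j]))
    return val

loco = [mse_using([i for i in range(p) if i != j]) - mse_using(list(range(p)))
        for j in range(p)]
print("Shapley:", [round(shapley(j), 3) for j in range(p)])
print("LOCO:   ", [round(v, 3) for v in loco])  # feature 0 attenuated vs Shapley
```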

A trade-off emerges: stricter corrections for correlation bias may introduce first-order bias due to extrapolation in regions of low data density. The consensus from recent surveys is to prefer a normalized LOCO for its computational ease, direct interpretability, and agreement with classical regression coefficients, with the caveat that residual correlation effects persist in nonlinear or highly collinear settings (Verdinelli et al., 2023, Verdinelli et al., 2021).

7. Extensions: Interactions, Efficiency, and Future Directions

Recent research generalizes LOCO to:

  • Interaction importance (iLOCO), quantifying the incremental loss from jointly removing multiple covariates, coherently isolating pairwise or higher-order effects through inclusion–exclusion, and furnishing distribution-free confidence intervals via subsampling ensembles or sample splitting (Little et al., 10 Feb 2025).
  • Relative efficiency studies, where GCM-based selectors typically outperform LOCO in terms of standard error to effect-size ratio under regularity, except in cases where GCM's signal is nullified (e.g., by symmetry conditions) and LOCO then becomes advantageous (Zheng et al., 19 Aug 2025).
  • Theoretical connections between LOCO, $t$-statistics, and influence measures; in linear regression, $\mathrm{LOCO}_i$ can be proportional to $|\beta_i / \mathrm{se}(\hat{\beta}_i)|$ (Bladen et al., 1 Oct 2025).

Continued research is directed at scalable algorithms for LOCO in large-scale black-box models, further decorrelation strategies, robust inference in distribution-shifted environments, and the development of principled alternatives balancing interpretability, computational tractability, and resistance to confounding.


Summary Table: Core LOCO Formulations and Targets

| LOCO Variant | Population Parameter | Interpretation |
| --- | --- | --- |
| Main effect (standard LOCO) | $\psi_{\mathrm{loco}}(j) = \mathbb{E}[(\mu(X) - \mu_{-j}(X_{-j}))^2]$ | Increase in MSE from dropping $X_j$ |
| Normalized LOCO (linear model; correlation-corrected) | $\psi_{\mathrm{loco}}(j) / \mathbb{E}[(X_j - \nu_j(X))^2]$ | Recovers $\beta_j^2$ |
| iLOCO (pairwise interaction effect) | $\psi_j + \psi_k - \psi_{j,k}$ | Isolates the pairwise interaction |
| Decorrelated LOCO | $\psi_D(j) = \mathbb{E}[(\mu(X) - \mu_{-j}(X_{-j}))^2]$ under an independence-modified covariate distribution | Removes correlation bias |
| LOCO LASSO path statistic | $T_j(s,t) = \|\hat\beta(\cdot) - \hat\beta^{(-j)}(\cdot)\|_{s,t}$ | Change in solution path from dropping covariate $j$ |