
Leave-One-Covariate-Out (LOCO)

Updated 3 October 2025
  • LOCO is a variable importance measure defined by the change in model error when a covariate is omitted, linking predictive behavior to statistical inference.
  • It underpins rigorous hypothesis testing and variable screening in regression and machine learning by comparing full versus reduced models.
  • Extensions like normalized and decorrelated LOCO address collinearity and enable applications in high-dimensional, nonparametric, and model-agnostic frameworks.

The Leave-One-Covariate-Out (LOCO) methodology refers to a class of variable importance measures and inferential procedures in which the role of an individual covariate (or group of covariates) is assessed by comparing the predictive behavior or statistical fit of a full model with that of a reduced model in which the target covariate(s) are omitted, marginalized, or otherwise ablated. This approach, widely used in regression analysis and model interpretation, has formal connections to classical hypothesis testing, modern model-agnostic variable importance methods, and functional ANOVA decomposition. LOCO-type statistics feature prominently in high-dimensional inference, model-X conditional independence testing, and distribution-free measures of both main and interaction effects, with rigorous studies elucidating their properties, limitations, and computational aspects (Verdinelli et al., 2023, Cao et al., 2020, Bladen et al., 1 Oct 2025, Little et al., 10 Feb 2025).

1. Foundational Formulation and Mathematical Properties

In the canonical regression setting, for a model $\mu(x) = \mathbb{E}[Y \mid X = x]$ and covariates $X = (X_1, \ldots, X_d)$, the LOCO importance of feature $j$ is formally defined as the increase in mean squared prediction error when $X_j$ is omitted:

$$\psi_{\mathrm{loco}}(j) = \mathbb{E}\big[(Y - \mu_{-j}(X_{-j}))^2\big] - \mathbb{E}\big[(Y - \mu(X))^2\big] = \mathbb{E}\big[(\mu(X) - \mu_{-j}(X_{-j}))^2\big],$$

where $\mu_{-j}(X_{-j}) = \mathbb{E}[Y \mid X_{-j}]$ and $X_{-j}$ denotes all predictors except $X_j$ (Verdinelli et al., 2023). For a set $S \subset \{1, \ldots, d\}$, the generalization is

$$\psi_{\mathrm{loco}}(S) = \mathbb{E}\big[(\mu(X) - \mu_{-S}(X_{-S}))^2\big].$$

At the sample level, plug-in LOCO estimators are typically computed by retraining or re-estimating the model without the covariate(s) of interest and measuring the change in predictive error on a held-out test set (Zheng et al., 19 Aug 2025).
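
As a concrete illustration, here is a minimal sketch of this plug-in recipe, assuming a scikit-learn regressor and simulated data (the random-forest model, the split, and all constants are illustrative choices, not specifics from the cited papers):

```python
# Minimal plug-in LOCO sketch: retrain without feature j, compare held-out MSE.
# Model choice (random forest) and the simulated data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def loco_importance(X, y, j, make_model, seed=0):
    """Estimated increase in held-out MSE when column j is dropped."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    full = make_model().fit(X_tr, y_tr)
    reduced = make_model().fit(np.delete(X_tr, j, axis=1), y_tr)
    mse_full = mean_squared_error(y_te, full.predict(X_te))
    mse_reduced = mean_squared_error(y_te, reduced.predict(np.delete(X_te, j, axis=1)))
    return mse_reduced - mse_full  # plug-in estimate of psi_loco(j)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=2000)  # only X_0 and X_1 matter
scores = [loco_importance(X, y, j, RandomForestRegressor) for j in range(5)]
print(np.round(scores, 3))  # largest for j=0, near zero for j=2..4
```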

In linear regression, with a model $Y = X\beta + \epsilon$:

$$\psi_{\mathrm{loco}}(j) = \beta_j^2\, \mathbb{E}\big[(X_j - \nu_j(X))^2\big],$$

where $\nu_j(X) = \mathbb{E}[X_j \mid X_{-j}]$ (Verdinelli et al., 2023, Verdinelli et al., 2021). When $X_j$ is highly collinear (i.e., nearly determined by $X_{-j}$), $\mathbb{E}[(X_j - \nu_j(X))^2] \approx 0$, and thus the LOCO value is near zero even when $\beta_j$ is large.
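
The collinearity attenuation implied by this identity can be checked with a short Monte Carlo sketch, assuming bivariate Gaussian covariates so that $\nu_2(X) = \rho X_1$ in closed form (all constants illustrative):

```python
# Verify psi_loco(j) = beta_j^2 * E[(X_j - nu_j(X))^2] for bivariate Gaussian
# covariates with correlation rho; here nu_2(X) = rho * X_1, so the target
# value is beta_2^2 * (1 - rho^2). All constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, rho, b1, b2 = 500_000, 0.95, 1.0, 2.0
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Corr(x1, x2) = rho

mu_full = b1 * x1 + b2 * x2        # mu(X) = E[Y | X]
mu_reduced = (b1 + b2 * rho) * x1  # E[Y | X_1] under joint Gaussianity
psi_mc = np.mean((mu_full - mu_reduced) ** 2)
psi_theory = b2**2 * (1 - rho**2)  # beta_j^2 * E[(X_j - nu_j(X))^2]
print(psi_mc, psi_theory)  # both ~0.39: beta_2 = 2 is large, yet LOCO is small
```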

2. LOCO for Inference: Testing, p-values, and Confidence Intervals

In high-dimensional regression and modern conditional independence testing frameworks, LOCO-type statistics facilitate both variable screening and statistical hypothesis testing. In the LOCO Conditional Randomization Test (LOCO CRT) (Katsevich et al., 2020), the significance of a covariate is assessed by comparing the loss (or risk) of the full model with that of the LOCO-reduced model. The sampling distribution of this loss difference is approximated, either exactly (for Gaussian covariates, yielding closed forms) or via resampling, allowing the construction of valid $p$-values for familywise error control.
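
To convey the general resampling logic, the toy sketch below pairs a LOCO-style statistic with conditional randomization; it assumes the model-X conditional law of $X_j$ given $X_{-j}$ is known exactly (here standard normal and independent of $X_{-j}$ by construction) and is not the specific LOCO CRT algorithm of Katsevich et al.:

```python
# Toy conditional randomization test with a LOCO-style statistic. Assumes the
# conditional law X_j | X_{-j} is known (here N(0,1), independent of X_{-j}
# by construction), which is the idealized model-X setting.
import numpy as np
from sklearn.linear_model import LinearRegression

def loco_stat(X, y, j):
    """In-sample error reduction from including X_j (illustrative statistic)."""
    full = LinearRegression().fit(X, y)
    red = LinearRegression().fit(np.delete(X, j, axis=1), y)
    return (np.mean((y - red.predict(np.delete(X, j, axis=1))) ** 2)
            - np.mean((y - full.predict(X)) ** 2))

rng = np.random.default_rng(6)
n, j, B = 500, 0, 199
X = rng.normal(size=(n, 3))
y = 0.2 * X[:, 0] + X[:, 1] + rng.normal(size=n)  # H0 is false for j = 0

t_obs = loco_stat(X, y, j)
t_null = []
for _ in range(B):
    Xb = X.copy()
    Xb[:, j] = rng.normal(size=n)  # resample X_j from its conditional law
    t_null.append(loco_stat(Xb, y, j))
p_value = (1 + sum(t >= t_obs for t in t_null)) / (B + 1)  # valid CRT p-value
print(f"CRT p-value for feature {j}: {p_value:.3f}")
```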

LOCO-based inference has also been formalized for L1-regularized M-estimators, with variants such as L1ME CRT providing computational efficiency in high-dimensional settings by leveraging the stability properties of cross-validated lasso (Katsevich et al., 2020).

Recent advances extend LOCO to confidence intervals for feature importance and even for higher-order interactions. The iLOCO metric for feature interactions, defined as

$$\mathrm{iLOCO}_{j,k} = \Delta_j + \Delta_k - \Delta_{j,k}$$

with $\Delta_j$ the main-effect error change and $\Delta_{j,k}$ the error change for joint removal, supports distribution-free confidence intervals via either a central limit theorem for split-sample estimators or minipatch ensembles (Little et al., 10 Feb 2025).
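
A minimal sketch of the inclusion–exclusion computation (the gradient-boosting learner and the simulated interaction are illustrative assumptions; the minipatch and sample-splitting machinery of the paper is omitted):

```python
# Sketch of iLOCO_{j,k} = Delta_j + Delta_k - Delta_{j,k} via retraining.
# Gradient boosting and the simulated interaction are illustrative choices.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(4000, 4))
y = X[:, 0] + X[:, 1] + 3.0 * X[:, 0] * X[:, 1] + 0.5 * rng.normal(size=4000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mse_full = mean_squared_error(
    y_te, GradientBoostingRegressor().fit(X_tr, y_tr).predict(X_te))

def delta(drop):
    """Excess test MSE when the columns in `drop` are omitted before refitting."""
    m = GradientBoostingRegressor().fit(np.delete(X_tr, drop, axis=1), y_tr)
    return mean_squared_error(y_te, m.predict(np.delete(X_te, drop, axis=1))) - mse_full

j, k = 0, 1
iloco_jk = delta([j]) + delta([k]) - delta([j, k])  # inclusion-exclusion
print(f"iLOCO_{{{j},{k}}} ~ {iloco_jk:.2f}")  # roughly 9: the interaction variance
```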

3. LOCO in High-Dimensional and Machine Learning Contexts

LOCO-based approaches are prominent in high-dimensional regression, where standard inferential tools fail. For example, in the LASSO context, the change in the solution path when one covariate is dropped, quantified by norms such as $T_j(s,t) = \|\hat\beta(\cdot) - \hat\beta^{(-j)}(\cdot)\|_{s,t}$ over the path in the penalty $\lambda$, enables both variable importance ranking and hypothesis testing, often attaining higher power than traditional methods such as the $t$-test, and maintaining type-I error control across correlation regimes (Cao et al., 2020).
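
A discretized version of such a statistic can be sketched with scikit-learn's `lasso_path`; the shared $\lambda$ grid and the entrywise 2-norm stand in for the $(s,t)$-norm and are illustrative assumptions rather than the exact construction of Cao et al.:

```python
# Sketch of a discretized LASSO-path LOCO statistic: fit the solution path
# with and without covariate j on a shared (descending) lambda grid and take
# an entrywise 2-norm of the difference over the shared coefficients.
import numpy as np
from sklearn.linear_model import lasso_path

def T_j(X, y, j, alphas):
    _, coefs_full, _ = lasso_path(X, y, alphas=alphas)  # shape (p, n_alphas)
    _, coefs_red, _ = lasso_path(np.delete(X, j, axis=1), y, alphas=alphas)
    keep = [i for i in range(X.shape[1]) if i != j]     # shared coordinates
    return np.linalg.norm(coefs_full[keep, :] - coefs_red)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
beta = np.r_[2.0, 1.0, np.zeros(8)]
y = X @ beta + rng.normal(size=300)
alphas = np.logspace(0, -2, 50)  # descending grid over the penalty lambda
print([round(T_j(X, y, j, alphas), 2) for j in range(10)])  # largest for j = 0, 1
```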

Model-agnostic wrappers for feature selection using LOCO are computationally intensive but broadly applicable, especially when coupled with minipatch ensembles or out-of-bag approaches (which partially obviate retraining) (Little et al., 10 Feb 2025). In modern machine learning methods such as neural networks or gradient boosting trees, LOCO provides a means to assess feature necessity, though computational efficiency concerns often require approximate strategies (Zheng et al., 19 Aug 2025).

A common plug-in estimator of the population LOCO importance is

$$\hat{\psi}_{0,j}^{\mathrm{(loco)}} = \frac{1}{n}\sum_{i=1}^n \Big[ (Y_i - f_{n,-j}(X_{i,-j}))^2 - (Y_i - f_n(X_i))^2 \Big],$$

with asymptotic normality achieved under mild smoothness and moment conditions (Zheng et al., 19 Aug 2025).
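
A split-sample sketch of this estimator with a Wald-type 95% interval based on the asymptotic normality (the model, data, and two-fold split are illustrative assumptions):

```python
# Split-sample LOCO with a CLT-based 95% Wald interval: fit full and reduced
# models on one half, average per-observation loss differences on the other.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, j = 4000, 0
X = rng.normal(size=(n, 5))
y = 1.5 * X[:, 0] + X[:, 1] + rng.normal(size=n)
half = n // 2

f_full = RandomForestRegressor(random_state=0).fit(X[:half], y[:half])
f_red = RandomForestRegressor(random_state=0).fit(
    np.delete(X[:half], j, axis=1), y[:half])

# Per-observation loss differences on the held-out half (reduced minus full)
d = ((y[half:] - f_red.predict(np.delete(X[half:], j, axis=1))) ** 2
     - (y[half:] - f_full.predict(X[half:])) ** 2)
psi_hat, se = d.mean(), d.std(ddof=1) / np.sqrt(d.size)
print(f"psi_hat = {psi_hat:.3f}, "
      f"95% CI = ({psi_hat - 1.96 * se:.3f}, {psi_hat + 1.96 * se:.3f})")
```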

4. Influence of Collinearity and Correlation on LOCO

LOCO measures are inherently sensitive to collinearity. The effect of correlation is captured explicitly in theoretical work (Bladen et al., 1 Oct 2025). In a linear model, if $\mathbf{X} = \mathbf{Z}\mathbf{A}$ for latent $\mathbf{Z}$ and mixing matrix $\mathbf{A} = \Delta \mathbf{J} + (1-\Delta)\mathbf{I}$, the closed-form LOCO importance is

$$\mathrm{LOCO}_i = \beta_i (1-\Delta) \sqrt{1+c},$$

where $c$ is a function of $\Delta$ and $p$ (the number of predictors). Increasing $\Delta$ (stronger collinearity) results in substantial attenuation of LOCO scores, reflecting the redundancy of information among predictors: as $\Delta \to 1$, $\mathrm{LOCO}_i \to 0$ even for nonzero $\beta_i$.

This contrasts with permutation measures ("Permute-and-Predict"), which remain proportional to $\beta_i$ and the predictor variance and are largely unaffected by collinearity. LOCO therefore measures the necessity of a feature in the context of the rest, penalizing shared information (Bladen et al., 1 Oct 2025, Verdinelli et al., 2021, Verdinelli et al., 2023).
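
The contrast shows up directly in a small simulation, sketched below under illustrative assumptions (a near-duplicate feature, a random-forest learner, and `permutation_importance` scores reported in the model's default $R^2$ units rather than MSE):

```python
# LOCO vs. permutation importance under collinearity: feature 1 is a near-copy
# of feature 0, so LOCO for feature 0 is attenuated, while permutation
# importance keeps the correlated pair visibly important.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n, rho = 3000, 0.99
x0 = rng.normal(size=n)
X = np.c_[x0, rho * x0 + np.sqrt(1 - rho**2) * rng.normal(size=n),
          rng.normal(size=n)]
y = 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
mse_full = np.mean((y_te - full.predict(X_te)) ** 2)
loco = [np.mean((y_te - RandomForestRegressor(random_state=0)
                 .fit(np.delete(X_tr, j, axis=1), y_tr)
                 .predict(np.delete(X_te, j, axis=1))) ** 2) - mse_full
        for j in range(3)]
perm = permutation_importance(full, X_te, y_te, random_state=0).importances_mean
print("LOCO:", np.round(loco, 3))  # feature 0 attenuated by its near-copy
print("Perm:", np.round(perm, 3))  # importance spread across the correlated pair
```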

Strategies to mitigate or interpret LOCO's susceptibility to correlation include normalization (dividing by $\mathbb{E}[(X_j - \nu_j(X))^2]$ to recover $\beta_j^2$ in the linear case), decorrelation (recomputing the functional under an altered joint distribution with independent covariates), or semiparametric "decorrelated" variable-importance functionals built from influence functions (Verdinelli et al., 2021, Verdinelli et al., 2023).
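
A sketch of the normalization strategy in the linear case, with $\nu_j$ estimated by an auxiliary regression of $X_j$ on $X_{-j}$ (data and constants illustrative):

```python
# Normalized LOCO: divide the plug-in LOCO value by an estimate of
# E[(X_j - nu_j(X))^2] from an auxiliary regression of X_j on X_{-j};
# in the linear model this rescaling recovers beta_j^2 (here beta_1 = 1).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n, rho, beta = 100_000, 0.95, np.array([2.0, 1.0, 0.0])
x0 = rng.normal(size=n)
X = np.c_[x0, rho * x0 + np.sqrt(1 - rho**2) * rng.normal(size=n),
          rng.normal(size=n)]
y = X @ beta + rng.normal(size=n)

j = 1
X_red = np.delete(X, j, axis=1)
full = LinearRegression().fit(X, y)
red = LinearRegression().fit(X_red, y)
loco = (np.mean((y - red.predict(X_red)) ** 2)
        - np.mean((y - full.predict(X)) ** 2))

aux = LinearRegression().fit(X_red, X[:, j])  # estimates nu_j(X_{-j})
resid_var = np.mean((X[:, j] - aux.predict(X_red)) ** 2)
print(loco, loco / resid_var)  # raw LOCO ~0.10 (attenuated); normalized ~1.0
```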

5. Model-Agnostic and Nonparametric Extensions

LOCO is implementable for black-box predictors, including random forests, neural networks, and ensemble methods. The core principle remains the same: retrain the predictive function with and without the covariate and compare target error metrics (MSE, classification error, or others), often using out-of-bag predictions to economize computation (Zheng et al., 19 Aug 2025, Little et al., 10 Feb 2025).

Comprehensive LOCO-type measures encompass main-effect, normalized, decorrelated, and interaction variants, summarized in the table at the end of this article.

Nonparametric estimation is feasible but challenging due to the complexity of marginalization and the curse of dimensionality; semiparametric and cross-fitting approaches with efficient influence function corrections have been developed to offset such obstacles (Verdinelli et al., 2021).

6. LOCO Versus Shapley Values and Other Measures

LOCO and Shapley values both operationalize feature importance via comparisons over model subspaces. The Shapley value aggregates marginal contributions over all possible subsets, whereas basic LOCO measures the marginal impact from omitting the variable from the full set (Verdinelli et al., 2023). Shapley values are theoretically appealing due to properties such as efficiency and symmetry but are computationally demanding and, notably, do not eliminate correlation effects. Both measures may assign small or distorted importance to highly correlated features. Modified LOCO methods (e.g., normalized or decorrelated LOCO) more directly address correlation distortion in linear models and satisfy tailored axioms such as "functional dependence," "correlation-free," and "linear agreement" (Verdinelli et al., 2023).
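
The definitional difference can be made concrete with a brute-force sketch over a tiny feature set: exact Shapley values average the error reduction from adding feature $j$ over all subsets of the remaining features, whereas LOCO uses only the full set (exhaustive retraining over $2^p$ subsets is feasible only for toy $p$; all modeling choices are illustrative):

```python
# Exact Shapley importance vs. basic LOCO, by exhaustive retraining over all
# 2^p subsets (p = 3 here). Shapley credits sum to the total explained
# variance, spreading shared signal across the correlated pair; LOCO charges
# each feature only for its unique contribution given all the others.
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 5000
x0 = rng.normal(size=n)
X = np.c_[x0, 0.9 * x0 + np.sqrt(1 - 0.81) * rng.normal(size=n),
          rng.normal(size=n)]
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)
p = X.shape[1]

def mse_using(cols):
    """Training MSE using only the given columns (mean-only model if empty)."""
    if not cols:
        return np.mean((y - y.mean()) ** 2)
    m = LinearRegression().fit(X[:, cols], y)
    return np.mean((y - m.predict(X[:, cols])) ** 2)

def shapley(j):
    others = [i for i in range(p) if i != j]
    val = 0.0
    for r in range(p):
        for S in combinations(others, r):
            w = factorial(r) * factorial(p - r - 1) / factorial(p)
            val += w * (mse_using(list(S)) - mse_using(list(S) + [j]))
    return val

loco = [mse_using([i for i in range(p) if i != j]) - mse_using(list(range(p)))
        for j in range(p)]
print("Shapley:", [round(shapley(j), 3) for j in range(p)])
print("LOCO:   ", [round(v, 3) for v in loco])  # feature 0 attenuated vs Shapley
```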

A trade-off emerges: stricter corrections for correlation bias may introduce first-order bias due to extrapolation in regions of low data density. The consensus from recent surveys is to prefer a normalized LOCO for its computational ease, direct interpretability, and agreement with classical regression coefficients, with the caveat that residual correlation effects persist in nonlinear or highly collinear settings (Verdinelli et al., 2023, Verdinelli et al., 2021).

7. Extensions: Interactions, Efficiency, and Future Directions

Recent research generalizes LOCO to:

  • Interaction importance (iLOCO), quantifying the incremental loss from jointly removing multiple covariates, coherently isolating pairwise or higher-order effects through inclusion–exclusion, and furnishing distribution-free confidence intervals via subsampling ensembles or sample splitting (Little et al., 10 Feb 2025).
  • Relative efficiency studies, where GCM-based selectors typically outperform LOCO in terms of standard error to effect-size ratio under regularity, except in cases where GCM's signal is nullified (e.g., by symmetry conditions) and LOCO then becomes advantageous (Zheng et al., 19 Aug 2025).
  • Theoretical connections between LOCO, $t$-statistics, and influence measures; in linear regression, $\mathrm{LOCO}_i$ can be proportional to $|\beta_i / \mathrm{se}(\hat{\beta}_i)|$ (Bladen et al., 1 Oct 2025).

Continued research is directed at scalable algorithms for LOCO in large-scale black-box models, further decorrelation strategies, robust inference in distribution-shifted environments, and the development of principled alternatives balancing interpretability, computational tractability, and resistance to confounding.


Summary Table: Core LOCO Formulations and Targets

| LOCO Variant | Population Parameter | Interpretation |
| --- | --- | --- |
| Main effect (standard LOCO) | $\psi_{\mathrm{loco}}(j) = \mathbb{E}[(\mu(X) - \mu_{-j}(X_{-j}))^2]$ | Increase in MSE from dropping $X_j$ |
| Normalized LOCO (linear model; correlation-corrected) | $\psi_{\mathrm{loco}}(j) / \mathbb{E}[(X_j - \nu_j(X))^2]$ | Recovers $\beta_j^2$ |
| iLOCO (pairwise interaction effect) | $\psi_j + \psi_k - \psi_{j,k}$ | Isolates the pairwise interaction |
| Decorrelated LOCO | $\psi_D(j) = \mathbb{E}[(\mu(X) - \mu_{-j}(X_{-j}))^2]$ under an independence-modified covariate distribution | Removes correlation bias |
| LOCO LASSO path statistic | $T_j(s,t) = \|\hat\beta(\cdot) - \hat\beta^{(-j)}(\cdot)\|_{s,t}$ | Change in solution path from dropping covariate $j$ |