
LOCO Estimation: Feature Importance & Inference

Updated 21 August 2025
  • LOCO Estimation is a model-agnostic method for quantifying feature importance by measuring the change in predictive performance when a feature is removed.
  • It provides robust predictive inference in high-dimensional settings through techniques like leave-one-out residuals and penalized estimator comparisons.
  • Extensions such as interaction LOCO and conditional tests enhance variable selection and control error rates in complex modeling scenarios.

Leave-One-Covariate-Out (LOCO) Estimation is a statistical and machine learning methodology that quantifies the importance of a covariate (feature) by assessing the change in a predictive model’s performance when that covariate is removed. This wrapper-type approach is notable for its model-agnostic nature, robust inferential properties in high-dimensional contexts, and extensibility to interaction and conditional inference. The LOCO principle underlies recent developments in predictive interval construction, causal inference, randomized experiments, conditional independence testing, and variable selection in both low- and high-dimensional regimes.

1. Mathematical Definition and Core Principles

LOCO estimation procedures compare predictive performance metrics between a model fit on the full covariate set and models fit after excluding one or more covariates. For regression problems, a canonical LOCO importance parameter for feature $X_j$ is defined as

$$\psi_{0,j}^{\text{loco}} = V(f_{0,-j}, P_{0,-j}) - V(f_0, P_0)$$

where $f_0$ is the full-model predictor, $f_{0,-j}$ is the model learned without $X_j$, and $V(\cdot, \cdot)$ denotes a prediction-error metric (e.g., mean squared error) under the distributions $P_0$ and $P_{0,-j}$, so that important features receive positive importance (Zheng et al., 19 Aug 2025). The empirical estimator is

$$\hat{\psi}_{0,j}^{\text{loco}} = \frac{1}{n} \sum_{i=1}^n \left\{ \left[Y_i - f_{n,-j}(X_{i,-j})\right]^2 - \left[Y_i - f_n(X_i)\right]^2 \right\}$$

where $f_n$ and $f_{n,-j}$ are the models fitted on all features and on the subset excluding $X_j$, respectively.
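
As a concrete illustration, the following minimal sketch computes the empirical estimator above; the learner (a random forest) and the held-out evaluation split are illustrative choices, not part of the definition.

```python
# Minimal sketch of the empirical LOCO estimator: refit the learner
# without feature j and compare squared-error losses. The random forest
# and the held-out evaluation split are illustrative choices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def loco_importance(X, y, j, seed=0):
    """err(model without X_j) - err(full model) on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    full = RandomForestRegressor(random_state=seed).fit(X_tr, y_tr)
    reduced = RandomForestRegressor(random_state=seed).fit(
        np.delete(X_tr, j, axis=1), y_tr)
    err_full = np.mean((y_te - full.predict(X_te)) ** 2)
    err_red = np.mean(
        (y_te - reduced.predict(np.delete(X_te, j, axis=1))) ** 2)
    return err_red - err_full  # positive when X_j helps prediction

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(size=500)  # last 3 features are noise
print([round(loco_importance(X, y, j), 3) for j in range(5)])
```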

In high-dimensional regularization contexts (e.g., the LASSO), LOCO estimation quantifies the difference in penalized coefficient paths when a feature is omitted. If $\hat{\beta}(\lambda)$ is the LASSO solution at regularization parameter $\lambda$ and $\hat{\beta}_{-j}(\lambda)$ the solution computed with $X_j$ removed, the LOCO path statistic for the $j$th feature takes the form

$$T_j = \sup_{\lambda > 0} \left\| \hat{\beta}(\lambda) - \hat{\beta}_{-j}(\lambda) \right\|$$

summarizing the discrepancy between the two coefficient paths across the entire range of $\lambda$ (Cao et al., 2020).
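
A rough sketch of this idea: refit the LASSO path with and without feature $j$ on a shared grid of regularization values and summarize the discrepancy. The sup-norm aggregation and the grid below are illustrative choices, not the specific statistic of Cao et al.

```python
# Sketch of a LOCO path statistic: compare the lasso coefficient path fit
# on all features with the path fit after removing feature j, over a
# shared lambda grid. The sup-norm summary is one illustrative way to
# aggregate the discrepancy across the path.
import numpy as np
from sklearn.linear_model import lasso_path

def loco_path_stat(X, y, j, alphas):
    _, coefs_full, _ = lasso_path(X, y, alphas=alphas)   # shape (p, L)
    _, coefs_red, _ = lasso_path(np.delete(X, j, axis=1), y,
                                 alphas=alphas)          # shape (p-1, L)
    others = np.delete(coefs_full, j, axis=0)  # drop row j for comparison
    return np.max(np.abs(others - coefs_red))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200)
grid = np.logspace(-3, 0, 50)
print([round(loco_path_stat(X, y, j, grid), 2) for j in range(10)])
```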

2. LOCO in High-Dimensional Regression and Prediction Interval Construction

Prediction intervals with uniform asymptotic validity in high-dimensional settings are constructed via LOCO leave-one-out residuals. For observations $(x_i, y_i)$, $i = 1, \dots, n$, a point prediction at a new covariate vector $x_0$ is $x_0^\top \hat{\beta}$. The conditional distribution of the prediction error $y_0 - x_0^\top \hat{\beta}$ is approximated using leave-one-out residuals:

$$\hat{u}_i = y_i - x_i^\top \hat{\beta}_{-i}, \qquad i = 1, \dots, n,$$

with $\hat{\beta}_{-i}$ the estimator computed after excluding the $i$th data point. Empirical quantiles $\hat{q}_\alpha$ of $\{\hat{u}_i\}_{i=1}^n$ define the prediction interval:

$$PI_{1-\alpha}(x_0) = \left[\, x_0^\top \hat{\beta} + \hat{q}_{\alpha/2},\; x_0^\top \hat{\beta} + \hat{q}_{1-\alpha/2} \,\right].$$

This interval is shown to achieve asymptotic nominal coverage for a broad class of estimators, including least squares, robust M-estimators, shrinkage methods, and penalized procedures like the LASSO, even when the number of covariates grows proportionally with, or exceeds, the sample size (Steinberger et al., 2016).

Key conditions:

  • Invariance to data ordering, ensuring residual exchangeability
  • Scaled estimation error converges to a constant limit
  • Influence of any single observation on the estimator is asymptotically negligible

The LOCO interval adapts its length to the estimator's performance via a parameter quantifying the magnitude of the estimation error; more accurate predictors yield shorter intervals.
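
For ordinary least squares, the construction reduces to a few lines. The sketch below uses explicit refits rather than the closed-form leverage shortcut available for OLS, and assumes equal tail splits.

```python
# Sketch of the leave-one-out residual prediction interval for OLS:
# refit without each observation, collect the residuals
# u_i = y_i - x_i' beta_{-i}, and center their empirical quantiles at
# the full-sample point prediction. Equal tail splits are assumed.
import numpy as np

def loo_prediction_interval(X, y, x0, alpha=0.1):
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # full-sample fit
    loo_res = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        loo_res[i] = y[i] - X[i] @ beta_i            # leave-one-out residual
    lo, hi = np.quantile(loo_res, [alpha / 2, 1 - alpha / 2])
    pred = x0 @ beta
    return pred + lo, pred + hi

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                       # many covariates
y = X @ rng.normal(size=50) + rng.normal(size=300)
print(loo_prediction_interval(X, y, X[0]))
```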

3. LOCO in Covariate Adjustment for Experiments

LOCO motivates estimator designs for randomized trials aiming to adjust for baseline covariate imbalances:

  • LOOP Estimator (Leave-One-Out Potential Outcomes): For each unit, exclude it from the imputation dataset and predict its treated and control outcomes using flexible algorithms (e.g., random forests). The individual treatment effect estimator is

$$\hat{\tau}_i = \left( \frac{T_i}{p_i} - \frac{1 - T_i}{1 - p_i} \right) \left( Y_i - \hat{m}_i \right), \qquad \hat{m}_i = (1 - p_i)\, \hat{t}_i + p_i\, \hat{c}_i,$$

with $\hat{t}_i$ and $\hat{c}_i$ the treated and control outcomes imputed for unit $i$ by leave-one-out prediction, and $T_i/p_i - (1 - T_i)/(1 - p_i)$ the signed inverse probability weight; the average effect is $\hat{\tau} = \frac{1}{n} \sum_{i=1}^n \hat{\tau}_i$. This design-based estimator is exactly unbiased under the Neyman–Rubin potential outcomes framework and gains precision through automatic variable selection when machine learning imputation is used. Variance formulas quantify the gains over standard difference-in-means estimators (Wu et al., 2017). A code sketch of this construction follows the list below.

  • P-LOOP Estimator: Extends LOCO to paired experiments using leave-one-pair-out imputations, balancing precision with respect to pair assignments while avoiding overadjustment and guarding against variance increases from unnecessarily modeling the pair structure (Wu et al., 2019).
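
The sketch below illustrates the LOOP construction for a Bernoulli($p$) design. Refitting the imputation model once per unit is the transparent but slow version; in practice, out-of-bag predictions from a single random forest serve as the shortcut.

```python
# Transparent (slow) sketch of the LOOP estimator under Bernoulli(p)
# assignment: impute each unit's treated and control outcomes from the
# other n-1 units, then apply the signed inverse-probability weight.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def loop_estimate(X, T, Y, p=0.5, seed=0):
    """X: covariates; T: 0/1 assignments; Y: observed outcomes."""
    n = len(Y)
    tau = np.empty(n)
    idx = np.arange(n)
    for i in range(n):
        treated = (idx != i) & (T == 1)
        control = (idx != i) & (T == 0)
        t_hat = RandomForestRegressor(random_state=seed).fit(
            X[treated], Y[treated]).predict(X[i:i + 1])[0]
        c_hat = RandomForestRegressor(random_state=seed).fit(
            X[control], Y[control]).predict(X[i:i + 1])[0]
        m_hat = (1 - p) * t_hat + p * c_hat       # LOOP combination
        weight = T[i] / p - (1 - T[i]) / (1 - p)  # signed IPW
        tau[i] = weight * (Y[i] - m_hat)
    return tau.mean()
```

Because $\hat{m}_i$ is built without unit $i$, it is independent of $T_i$, and the signed weight then makes each $\hat{\tau}_i$ unbiased for $t_i - c_i$ regardless of imputation quality.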

4. LOCO in Conditional Independence and Hypothesis Testing

Conditional randomization tests and variable importance assessment are generalized via leave-one-covariate-out approaches:

  • LOCO Conditional Randomization Test (LOCO CRT): For each variable, construct a test statistic by comparing predictive performance with and without the variable, generating reference distributions by randomizing the left-out covariate. Valid p-values for individual features can be computed using the proportion of randomized statistics exceeding the observed one:

$$p_j = \frac{1 + \sum_{b=1}^{B} \mathbf{1}\left\{ T_j^{(b)} \geq T_j \right\}}{B + 1}$$

where $T_j$ is the observed statistic and $T_j^{(b)}$ its value under the $b$th randomization of the left-out covariate (a minimal sketch follows this list).

Familywise error rate is controlled directly. For L1-regularized M-estimators, the L1ME CRT variant leverages the stability of cross-validated lasso selections for computational speed. In multivariate Gaussian designs, closed-form p-values eliminate resampling (Katsevich et al., 2020).

  • LOCO in High-Dimensional Penalized Inference: LOCO measures the impact of individual variables on the whole regularization path, allowing for simultaneous hypothesis testing and variable screening with robust bootstrap calibrations (Cao et al., 2020).
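
A minimal sketch of the LOCO CRT for a single covariate appears below. It assumes, in model-X fashion, that the conditional law of $X_j$ given $X_{-j}$ is known and supplied as a sampler `sample_xj` (a hypothetical helper); the test statistic is the in-sample MSE drop from adding $X_j$ to an OLS fit.

```python
# Sketch of a LOCO-style conditional randomization test for covariate j.
# Model-X assumption: the conditional law X_j | X_{-j} is known and is
# supplied as `sample_xj` (a hypothetical user-provided sampler).
import numpy as np

def mse_drop(X, y, j):
    """LOCO-style statistic: in-sample MSE drop from adding X_j."""
    X_red = np.delete(X, j, axis=1)
    r_full = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r_red = y - X_red @ np.linalg.lstsq(X_red, y, rcond=None)[0]
    return np.mean(r_red ** 2) - np.mean(r_full ** 2)

def loco_crt_pvalue(X, y, j, sample_xj, B=500, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = mse_drop(X, y, j)
    X_b, exceed = X.copy(), 0
    for _ in range(B):
        # Redraw X_j from its conditional law, holding X_{-j} and y fixed
        X_b[:, j] = sample_xj(np.delete(X, j, axis=1), rng)
        exceed += mse_drop(X_b, y, j) >= t_obs
    return (1 + exceed) / (1 + B)  # valid p-value under the null
```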

5. LOCO for Feature Importance, Interaction, and Efficiency Comparisons

Feature importance estimation in model-agnostic and black-box models relies on LOCO metrics:

  • LOCO is defined as the performance change (e.g., the increase in MSE) when a feature is omitted, a nonparametric analog of the squared regression coefficient $\beta_j^2$ (Verdinelli et al., 2021). Decorrelated LOCO variants address the interpretation difficulties caused by covariate correlation by targeting parameters under which the covariates are rendered independent, either by reweighting or by semiparametric projection.
  • Interaction LOCO (iLOCO): Quantifies the effect of pairwise (or higher-order) feature interactions via the difference in LOCO statistics:

$$\mathrm{iLOCO}_{j,k} = \Delta_j + \Delta_k - \Delta_{j,k}$$

where $\Delta_j$ measures the error increase when $X_j$ is removed and $\Delta_{j,k}$ the increase when both $X_j$ and $X_k$ are removed. Ensemble minipatch methods efficiently compute iLOCO and corresponding confidence intervals in large datasets (Little et al., 10 Feb 2025).

  • Comparisons with Regression-Based Measures (Generalized Covariance Measure, GCM): LOCO requires retraining the model for each omitted feature and produces easily interpretable importance metrics. GCM instead uses the covariance between the residuals of the feature and of the outcome, each regressed on the remaining covariates, and enjoys efficiency advantages (lower coefficient of variation) in linear, additive, and single-index models. For linear regression:

$$\psi_{0,j}^{\text{loco}} = \beta_j^2\, \mathbb{E}\!\left[\varepsilon_j^2\right], \qquad \psi_{0,j}^{\text{gcm}} = \beta_j\, \mathbb{E}\!\left[\varepsilon_j^2\right]$$

where $\varepsilon_j = X_j - \mathbb{E}[X_j \mid X_{-j}]$ is the residual of $X_j$ regressed on the remaining covariates (Zheng et al., 19 Aug 2025).
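
These linear-model identities can be checked numerically. The sketch below simulates a two-covariate Gaussian design, forms plug-in versions of both targets by residualizing on the other covariate, and compares them with the closed forms $\beta_1^2(1-\rho^2)$ and $\beta_1(1-\rho^2)$; the bivariate Gaussian design is an illustrative assumption.

```python
# Numerical check of the linear-model identities: with correlated
# Gaussian covariates and y = b1*X_1 - X_2 + noise, the plug-in LOCO
# for X_1 should approach b1^2 * Var(X_1 | X_2) and the plug-in GCM
# should approach b1 * Var(X_1 | X_2) = b1 * (1 - rho^2).
import numpy as np

def resid(target, X):
    """Residual of target after OLS regression on X (with intercept)."""
    Z = np.column_stack([np.ones(len(target)), X])
    return target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]

rng = np.random.default_rng(0)
n, rho, b1 = 200_000, 0.6, 1.5
x = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
y = b1 * x[:, 0] - x[:, 1] + rng.normal(size=n)

eps1 = resid(x[:, 0], x[:, 1])     # X_1 minus its regression on X_2
r_red = resid(y, x[:, 1])          # reduced-model residuals
r_full = resid(y, x)               # full-model residuals

gcm = np.mean(eps1 * r_red)                         # ~ b1 * E[eps1^2]
loco = np.mean(r_red ** 2) - np.mean(r_full ** 2)   # ~ b1^2 * E[eps1^2]
print(f"loco={loco:.3f} (theory {b1**2 * (1 - rho**2):.3f}), "
      f"gcm={gcm:.3f} (theory {b1 * (1 - rho**2):.3f})")
```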

6. LOCO in Information-Theoretic Generalization Bounds

LOCO estimation links to generalization theory via mutual information measures. The leave-one-out conditional mutual information (CMI) between the loss vector $L$ and the held-out sample index $U$, given the full collection of $n + 1$ samples,

$$\mathrm{CMI}^{\mathrm{loo}}(A) = I\left( L;\, U \mid Z_{1:n+1} \right),$$

controls the generalization error of the learning algorithm $A$.

For interpolating algorithms under 0–1 loss, the expected risk is bounded by an explicit function of this leave-one-out CMI, and connections are established between leave-one-out error, conditional entropy, and risk (Haghifam et al., 2022). Applications include optimal generalization bounds for VC classes using the one-inclusion graph algorithm.

7. Practical Considerations, Limitations, and Extensions

LOCO methodologies are widespread in modern statistical analysis, applicable to regression, causal inference, experimental design, machine learning model interpretation, and feature selection. They are computationally intensive when retraining is required for every omitted feature, motivating efficient variants such as ensemble minipatch approaches, out-of-bag predictions, dropout approximations, and Lazy-VI in neural networks. Limitations include sensitivity to the covariate correlation structure (e.g., masking of correlated features) and efficiency losses relative to regression-based measures under certain regularity conditions.

Extensions include distribution-free inference for feature interactions (iLOCO), decorrelation techniques for variable importance, familywise error rate control in conditional independence testing, and robust predictive interval estimation in high-dimensional settings.

LOCO remains a central approach for model-agnostic assessment of variable contribution, predictive uncertainty quantification, and interpretable machine learning—continually advanced by theoretical results and scalable algorithms in contemporary research.
