Conditional Cross-Fitting
- Conditional cross-fitting is an estimation technique that splits data into folds to independently fit nuisance functions, reducing bias in treatment effect estimators.
- It applies K-fold or three-way splits to decouple model fitting from evaluation, thereby improving finite-sample performance and attaining semiparametric efficiency.
- This method ensures unbiased estimation and valid variance computation in randomized designs, making it crucial for robust causal inference.
Conditional cross-fitting is a statistical methodology for constructing estimators of conditional and average treatment effects, as well as more general linear functionals, using machine learning or flexible nonparametric techniques. It combines careful sample-splitting with out-of-sample nuisance function estimation to break the dependence between fitting and evaluation stages, thereby improving both bias and variance properties—often achieving semiparametric efficiency bounds and robust performance in finite samples. Conditional cross-fitting has become central in modern causal inference, semiparametric estimation, and machine-learning-assisted inference under both model-based and design-based frameworks (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025, Fisher et al., 2023).
1. Definitions and Conceptual Foundations
Conditional cross-fitting refers to a class of estimators in which the dataset is partitioned into mutually exclusive subsets (folds) and nuisance functions (such as propensity scores, outcome regressions, or other conditional expectations) required for estimation are fitted on one fold and evaluated on a different, non-overlapping fold. This ensures that, for each observation, the prediction or nuisance function is constructed independently of that observation—removing the “own-observation” bias that arises from double use of data. In doubly robust or influence function-based estimators, cross-fitting also breaks dependence between multiple nonparametric nuisance estimators, facilitating stochastic equicontinuity decompositions and controlling higher-order bias (Newey et al., 2018, Fisher et al., 2023).
Conditional cross-fitting is distinct from classical sample splitting in its iterative use of K-fold or multi-way splits (achieving higher efficiency and reduced finite-sample variance), as well as its explicit conditioning on the data-generating design in randomized experiments, under which only the assignment is random and all covariates/potential outcomes are fixed (Lu et al., 21 Aug 2025).
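The core mechanic — evaluating each observation only with a model fitted on the other folds — can be illustrated with a minimal sketch. The function name `crossfit_predict` and the least-squares learner are illustrative stand-ins for whatever nuisance learner is actually used:

```python
import numpy as np

def crossfit_predict(x, y, n_folds=5, seed=0):
    """Out-of-fold predictions: each observation is scored by a model
    fitted on the other folds only, so no observation influences its
    own prediction. (Least squares stands in for any ML learner.)"""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds           # random fold labels
    preds = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        D = np.column_stack([np.ones(train.sum()), x[train]])
        beta, *_ = np.linalg.lstsq(D, y[train], rcond=None)
        preds[test] = beta[0] + beta[1] * x[test]  # evaluated out-of-fold
    return preds
```

Because the model behind `preds[test]` never saw the held-out fold, the "own-observation" term in the estimator's expansion vanishes by construction.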
2. Algorithms and Practical Workflows
K-fold cross-fitting is the prototypical workflow for constructing cross-fitted estimators of treatment effect and semiparametric functionals (Jacob, 2020). The algorithm proceeds as follows:
- Randomly split the dataset of size $n$ into $K$ disjoint folds $I_1, \dots, I_K$.
- For each fold $k = 1, \dots, K$:
  - Define the auxiliary/training data $I_k^c = \{1, \dots, n\} \setminus I_k$ and the main/estimation sample $I_k$.
  - Fit all required nuisance functions (e.g., propensity score, outcome regression) on $I_k^c$; denote the estimates collectively by $\hat{\eta}_{-k}$.
  - Compute pseudo-outcomes $\hat{\psi}_i$ for $i \in I_k$ using $\hat{\eta}_{-k}$.
  - Regress $\hat{\psi}_i$ on covariates $X_i$ within $I_k$ to obtain the fold-specific conditional average treatment effect estimate $\hat{\tau}_k(x)$.
- Aggregate across folds: $\hat{\tau}(x) = \frac{1}{K} \sum_{k=1}^{K} \hat{\tau}_k(x)$.
This framework is general and applies to several meta-learners: T-learner, R-learner, DR-learner, and X-learner. Three-way cross-fitting further generalizes this by partitioning the sample into three folds, with separate estimation of each nuisance component and bias correction on held-out portions, optimally reducing first-order bias in doubly robust settings (Fisher et al., 2023, Newey et al., 2018).
In design-based inference typical of randomized experiments, conditional cross-fitting explicitly ensures that sample splits and estimator construction respect the randomization mechanism. Sample-splitting algorithms are tailored to Bernoulli, completely randomized, stratified, and matched-pairs designs, guaranteeing finite-sample unbiasedness of estimator contributions and correct variance estimation (Lu et al., 21 Aug 2025).
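As a sketch of a design-respecting split (the function name is hypothetical; the source describes tailored algorithms for several designs), one can assign folds separately within each treatment arm of a completely randomized experiment, so that every fold reproduces the design's treated/control proportions:

```python
import numpy as np

def design_respecting_folds(A, K=2, seed=0):
    """Assign fold labels separately within each treatment arm, so each
    fold preserves the treated/control split of a completely randomized
    design. A is a 0/1 treatment-assignment vector."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(A), dtype=int)
    for a in (0, 1):
        idx = rng.permutation(np.flatnonzero(A == a))
        folds[idx] = np.arange(len(idx)) % K  # balanced assignment within arm
    return folds
```

Analogous constructions (splitting within strata, or keeping matched pairs together) extend the idea to stratified and matched-pairs designs.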
3. Theoretical Guarantees and Remainder Rates
Conditional cross-fitting achieves several key statistical guarantees:
- Removal of own-observation bias: Plug-in estimators using cross-fitted nuisance estimators evaluated out-of-fold achieve bias rates nearly as fast as possible under classical smoothness and complexity conditions (Newey et al., 2018).
- Minimization of higher-order (nonlinearity) bias: Double or three-way cross-fitting can render leading bias terms second-order, which is crucial for root-$n$ consistency and valid asymptotic normality.
- Achieving semiparametric efficiency: Under standard regularity (unconfoundedness, overlap, bounded complexity, smoothness), cross-fit doubly robust estimators attain the semiparametric efficiency bound for variance (Newey et al., 2018, Fisher et al., 2023).
- Unbiasedness under design-based inference: For randomized experiments viewed as permutations of fixed potential outcomes/covariates, conditional cross-fitting yields exactly unbiased estimators even in finite samples (Lu et al., 21 Aug 2025).
The theoretical rates, for spline-based series estimators of the nuisance functions, depend on the smoothness and dimension of the nuisance components; under appropriate conditions, the remainder terms in the asymptotic expansion are $o_p(n^{-1/2})$, so they do not disturb first-order asymptotic normality (Newey et al., 2018).
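When the remainders are negligible at the root-$n$ scale, inference reduces to treating the out-of-fold pseudo-outcomes as approximately i.i.d. influence values. A minimal sketch (the `ate_with_se` name and the 1.96 normal critical value are illustrative conventions):

```python
import numpy as np

def ate_with_se(psi):
    """Given out-of-fold pseudo-outcomes (estimated influence values),
    the ATE estimate is their mean; its standard error follows from the
    sample variance, reflecting root-n asymptotic normality."""
    psi = np.asarray(psi, dtype=float)
    n = len(psi)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, se, (est - 1.96 * se, est + 1.96 * se)
```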
4. Monte Carlo and Empirical Performance
Empirical studies demonstrate the practical benefits of conditional cross-fitting in heterogeneous treatment effect estimation:
- For DR- and R-learners across multiple simulated data generating processes, the combination of 5-fold cross-fitting with median aggregation over 20 or more random splits consistently yields the lowest mean squared error, with reductions by 30–50% compared to naive or 2-fold estimators—even more pronounced under covariate-dependent treatment assignment (Jacob, 2020).
- Exclusion of “sharp” learners such as Lasso in small samples further stabilizes estimation, reducing the prevalence of outliers.
- The X-learner is an exception; in many scenarios its naive estimator matches or surpasses cross-fit alternatives.
- For randomized experimental designs, cross-fitted covariate adjustment retains unbiasedness while delivering valid variance estimation and valid inference, regardless of misspecification or highly complex ML predictors (Lu et al., 21 Aug 2025).
A summary table of empirical findings appears in the referenced work:
| Meta-learner | Naive MSE ($N=2000$, RCT, linear CATE) | 5-fold Cross-Fit Median MSE |
|---|---|---|
| DR-learner | 0.72 | 0.36 |
| R-learner | 0.94 | 0.34 |
5. Extensions and Specialized Contexts
Conditional cross-fitting extends naturally to:
- Three-way cross-fitting: Splits the sample into three disjoint parts, each used exclusively for estimation of a single nuisance component or bias correction, thereby removing dependence and first-order bias in the estimation of CATE, with fast rates under smoothness and overlap assumptions (Fisher et al., 2023).
- Design-based inference: Conditional cross-fitting generalizes cross-fitting to frameworks where only treatment assignment is random. Sample splits are constructed to respect the assignment mechanism, enabling unbiased machine-learning adjustment even when the i.i.d. assumption fails (Lu et al., 21 Aug 2025).
- General semiparametric functionals: The approach is not limited to treatment effect estimation but encompasses a wide class of linear functionals, such as expected conditional covariance, mean estimation with missing data, and weighted average derivatives (Newey et al., 2018).
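One way to organize the fold roles in three-way cross-fitting is as a rotation: each fold serves once as evaluation data while the other two folds each fit a different nuisance component, keeping the two nuisance fits independent. This bookkeeping sketch (the function name and role ordering are illustrative assumptions, not the referenced papers' notation) makes that concrete:

```python
import numpy as np

def threeway_roles(n, seed=0):
    """Partition n observations into three disjoint folds and list the
    role rotations: (evaluation fold, fold fitting nuisance component 1,
    fold fitting nuisance component 2)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % 3
    rotations = [(k, (k + 1) % 3, (k + 2) % 3) for k in range(3)]
    return folds, rotations
```

Because the two nuisance components are never fitted on the same fold, their estimation errors enter the bias expansion as a product of independent terms, which is what removes the first-order bias.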
6. Practical Recommendations and Implementation
- Choice of $K$: Use $K = 5$ for the split; this balances the need for large training and evaluation folds.
- Aggregation: Repeat the entire procedure $20$–$50$ times with independent splits, aggregating results with the median rather than the mean to protect against outliers (Jacob, 2020).
- ML choices: Flexible machine learning regressors can be used for nuisance estimation (random forests, boosting, neural nets, spline-based models); the method is robust to mis-specification as the splitting removes estimator-induced bias (Lu et al., 21 Aug 2025).
- Variance estimation: Use out-of-fold residuals and design-aware formulas for variance, ensuring conservative, valid inference (Lu et al., 21 Aug 2025).
- Diagnostics and tuning: Vary $K$ and the number of repeated splits in pilot runs; exclude highly variable learners if small-sample outliers are present; cross-validate within each split (Jacob, 2020, Fisher et al., 2023).
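The median-aggregation recommendation is simple to implement as a wrapper around any cross-fitting routine. A sketch under the assumption that `estimate_fn(data, seed=...)` reruns the full procedure on a fresh random split:

```python
import numpy as np

def median_aggregate(estimate_fn, data, n_splits=20):
    """Rerun the full cross-fitting procedure over independent random
    splits and take the median of the resulting estimates, which is
    more robust to occasional outlier splits than the mean."""
    estimates = [estimate_fn(data, seed=s) for s in range(n_splits)]
    return np.median(estimates, axis=0)
```

A single pathological split (e.g., one where an unstable learner overfits badly) then shifts the aggregate far less than mean aggregation would.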
7. Assumptions, Limitations, and Open Considerations
Conditional cross-fitting relies on key assumptions for its theoretical guarantees:
- Proper sample-splitting, with each observation evaluated only on estimates fitted without itself.
- Independence or conditional independence of sample splits (especially in randomized design-based settings).
- Boundedness and overlap for regression and causal inference.
- Sufficiently fast convergence of nuisance function estimators and “stability” of ML procedures.
Violations—including massive overfitting with small folds or “sharp” learners, degenerate randomization schemes, or insufficient variation in splits—can compromise performance. The efficiency bounds and remainder rates are sharp only under these regularity assumptions, and rare pathological behavior can persist, particularly in high-dimensional or highly adaptive ML regimes (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025).
Conditional cross-fitting constitutes a core methodological component in contemporary machine-learning-assisted causal inference and semiparametric statistics, enabling robust and theoretically optimal estimation in the presence of complex nuisance structures and under both classic and design-based inference paradigms (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025, Fisher et al., 2023).