Conditional Cross-Fitting
- Conditional cross-fitting is an estimation technique that splits data into folds to independently fit nuisance functions, reducing bias in treatment effect estimators.
- It applies K-fold or three-way splits to decouple model fitting from evaluation, thereby improving finite-sample performance and attaining semiparametric efficiency.
- This method ensures unbiased estimation and valid variance computation in randomized designs, making it crucial for robust causal inference.
Conditional cross-fitting is a statistical methodology for constructing estimators of conditional and average treatment effects, as well as more general linear functionals, using machine learning or flexible nonparametric techniques. It combines careful sample-splitting with out-of-sample nuisance function estimation to break the dependence between fitting and evaluation stages, thereby improving both bias and variance properties—often achieving semiparametric efficiency bounds and robust performance in finite samples. Conditional cross-fitting has become central in modern causal inference, semiparametric estimation, and machine-learning-assisted inference under both model-based and design-based frameworks (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025, Fisher et al., 2023).
1. Definitions and Conceptual Foundations
Conditional cross-fitting refers to a class of estimators in which the dataset is partitioned into mutually exclusive subsets (folds) and nuisance functions (such as propensity scores, outcome regressions, or other conditional expectations) required for estimation are fitted on one fold and evaluated on a different, non-overlapping fold. This ensures that, for each observation, the prediction or nuisance function is constructed independently of that observation—removing the “own-observation” bias that arises from double use of data. In doubly robust or influence function-based estimators, cross-fitting also breaks dependence between multiple nonparametric nuisance estimators, facilitating stochastic equicontinuity decompositions and controlling higher-order bias (Newey et al., 2018, Fisher et al., 2023).
Conditional cross-fitting is distinct from classical sample splitting in its iterative use of K-fold or multi-way splits (achieving higher efficiency and reduced finite-sample variance), as well as its explicit conditioning on the data-generating design in randomized experiments, under which only the assignment is random and all covariates/potential outcomes are fixed (Lu et al., 21 Aug 2025).
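The core mechanic — evaluating each observation only with a model fitted on the other folds — can be illustrated with a minimal sketch. The function name `crossfit_predict` and the least-squares learner are illustrative stand-ins for whatever nuisance learner is actually used:

```python
import numpy as np

def crossfit_predict(x, y, n_folds=5, seed=0):
    """Out-of-fold predictions: each observation is scored by a model
    fitted on the other folds only, so no observation influences its
    own prediction. (Least squares stands in for any ML learner.)"""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds           # random fold labels
    preds = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        D = np.column_stack([np.ones(train.sum()), x[train]])
        beta, *_ = np.linalg.lstsq(D, y[train], rcond=None)
        preds[test] = beta[0] + beta[1] * x[test]  # evaluated out-of-fold
    return preds
```

Because the model behind `preds[test]` never saw the held-out fold, the "own-observation" term in the estimator's expansion vanishes by construction.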
2. Algorithms and Practical Workflows
K-fold cross-fitting is the prototypical workflow for constructing cross-fitted estimators of treatment effect and semiparametric functionals (Jacob, 2020). The algorithm proceeds as follows:
- Randomly split the dataset of size $n$ into $K$ disjoint folds $I_1, \dots, I_K$.
- For each fold $k = 1, \dots, K$:
  - Define the auxiliary/training data $I_k^c = \{1, \dots, n\} \setminus I_k$ and the main/estimation sample $I_k$.
  - Fit all required nuisance functions (e.g., propensity score, outcome regression) on $I_k^c$; denote the estimates collectively by $\hat{\eta}_{-k}$.
  - Compute pseudo-outcomes $\hat{\psi}_i$ for $i \in I_k$ using $\hat{\eta}_{-k}$.
  - Regress $\hat{\psi}_i$ on covariates $X_i$ within $I_k$ to obtain the fold-specific conditional average treatment effect estimate $\hat{\tau}_k(x)$.
- Aggregate across folds: $\hat{\tau}(x) = \frac{1}{K} \sum_{k=1}^{K} \hat{\tau}_k(x)$.
This framework is general and applies to several meta-learners: T-learner, R-learner, DR-learner, and X-learner. Three-way cross-fitting further generalizes this by partitioning the sample into three folds, with separate estimation of each nuisance component and bias correction on held-out portions, optimally reducing first-order bias in doubly robust settings (Fisher et al., 2023, Newey et al., 2018).
In design-based inference typical of randomized experiments, conditional cross-fitting explicitly ensures that sample splits and estimator construction respect the randomization mechanism. Sample-splitting algorithms are tailored to Bernoulli, completely randomized, stratified, and matched-pairs designs, guaranteeing finite-sample unbiasedness of estimator contributions and correct variance estimation (Lu et al., 21 Aug 2025).
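As a sketch of a design-respecting split (the function name is hypothetical; the source describes tailored algorithms for several designs), one can assign folds separately within each treatment arm of a completely randomized experiment, so that every fold reproduces the design's treated/control proportions:

```python
import numpy as np

def design_respecting_folds(A, K=2, seed=0):
    """Assign fold labels separately within each treatment arm, so each
    fold preserves the treated/control split of a completely randomized
    design. A is a 0/1 treatment-assignment vector."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(A), dtype=int)
    for a in (0, 1):
        idx = rng.permutation(np.flatnonzero(A == a))
        folds[idx] = np.arange(len(idx)) % K  # balanced assignment within arm
    return folds
```

Analogous constructions (splitting within strata, or keeping matched pairs together) extend the idea to stratified and matched-pairs designs.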
3. Theoretical Guarantees and Remainder Rates
Conditional cross-fitting achieves several key statistical guarantees:
- Removal of own-observation bias: Plug-in estimators using cross-fitted nuisance estimators evaluated out-of-fold achieve bias rates nearly as fast as possible under classical smoothness and complexity conditions (Newey et al., 2018).
- Minimization of higher-order (nonlinearity) bias: Double or three-way cross-fitting can render leading bias terms second-order, which is crucial for root-$n$ consistency and valid asymptotic normality.
- Achieving semiparametric efficiency: Under standard regularity (unconfoundedness, overlap, bounded complexity, smoothness), cross-fit doubly robust estimators attain the semiparametric efficiency bound for variance (Newey et al., 2018, Fisher et al., 2023).
- Unbiasedness under design-based inference: For randomized experiments viewed as permutations of fixed potential outcomes/covariates, conditional cross-fitting yields exactly unbiased estimators even in finite samples (Lu et al., 21 Aug 2025).
The theoretical rates, for spline-based series estimators of the nuisance functions, depend on the smoothness and dimension of the nuisance components; under appropriate conditions, the remainder terms in the asymptotic expansion are $o_p(n^{-1/2})$, so they do not disturb first-order asymptotic normality (Newey et al., 2018).
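When the remainders are negligible at the root-$n$ scale, inference reduces to treating the out-of-fold pseudo-outcomes as approximately i.i.d. influence values. A minimal sketch (the `ate_with_se` name and the 1.96 normal critical value are illustrative conventions):

```python
import numpy as np

def ate_with_se(psi):
    """Given out-of-fold pseudo-outcomes (estimated influence values),
    the ATE estimate is their mean; its standard error follows from the
    sample variance, reflecting root-n asymptotic normality."""
    psi = np.asarray(psi, dtype=float)
    n = len(psi)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, se, (est - 1.96 * se, est + 1.96 * se)
```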
4. Monte Carlo and Empirical Performance
Empirical studies demonstrate the practical benefits of conditional cross-fitting in heterogeneous treatment effect estimation:
- For DR- and R-learners across multiple simulated data generating processes, the combination of 5-fold cross-fitting with median aggregation over 20 or more random splits consistently yields the lowest mean squared error, with reductions by 30–50% compared to naive or 2-fold estimators—even more pronounced under covariate-dependent treatment assignment (Jacob, 2020).
- Exclusion of “sharp” learners such as Lasso in small samples further stabilizes estimation, reducing the prevalence of outliers.
- The X-learner is an exception; in many scenarios its naive estimator matches or surpasses cross-fit alternatives.
- For randomized experimental designs, cross-fitted covariate adjustment retains unbiasedness while delivering valid variance estimation and valid inference, regardless of misspecification or highly complex ML predictors (Lu et al., 21 Aug 2025).
A summary table of empirical findings appears in the referenced work:
| Meta-learner | Naive MSE ($N=2000$, RCT, linear CATE) | 5-fold Cross-Fit Median MSE |
|---|---|---|
| DR-learner | 0.72 | 0.36 |
| R-learner | 0.94 | 0.34 |
5. Extensions and Specialized Contexts
Conditional cross-fitting extends naturally to:
- Three-way cross-fitting: Splits the sample into three disjoint parts, each used exclusively for estimation of a single nuisance component or bias correction, thereby removing dependence and first-order bias in the estimation of CATE, with fast rates under smoothness and overlap assumptions (Fisher et al., 2023).
- Design-based inference: Conditional cross-fitting generalizes cross-fitting to frameworks where only treatment assignment is random. Sample splits are constructed to respect the assignment mechanism, enabling unbiased machine-learning adjustment even when the i.i.d. assumption fails (Lu et al., 21 Aug 2025).
- General semiparametric functionals: The approach is not limited to treatment effect estimation but encompasses a wide class of linear functionals, such as expected conditional covariance, mean estimation with missing data, and weighted average derivatives (Newey et al., 2018).
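One way to organize the fold roles in three-way cross-fitting is as a rotation: each fold serves once as evaluation data while the other two folds each fit a different nuisance component, keeping the two nuisance fits independent. This bookkeeping sketch (the function name and role ordering are illustrative assumptions, not the referenced papers' notation) makes that concrete:

```python
import numpy as np

def threeway_roles(n, seed=0):
    """Partition n observations into three disjoint folds and list the
    role rotations: (evaluation fold, fold fitting nuisance component 1,
    fold fitting nuisance component 2)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % 3
    rotations = [(k, (k + 1) % 3, (k + 2) % 3) for k in range(3)]
    return folds, rotations
```

Because the two nuisance components are never fitted on the same fold, their estimation errors enter the bias expansion as a product of independent terms, which is what removes the first-order bias.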
6. Practical Recommendations and Implementation
- Choice of $K$: Use $K = 5$ for the split; this balances the need for large training and evaluation folds.
- Aggregation: Repeat the entire procedure $20$–$50$ times with independent splits, aggregating results with the median rather than the mean to protect against outliers (Jacob, 2020).
- ML choices: Flexible machine learning regressors can be used for nuisance estimation (random forests, boosting, neural nets, spline-based models); the method is robust to mis-specification as the splitting removes estimator-induced bias (Lu et al., 21 Aug 2025).
- Variance estimation: Use out-of-fold residuals and design-aware formulas for variance, ensuring conservative, valid inference (Lu et al., 21 Aug 2025).
- Diagnostics and tuning: Vary $K$ and the number of repeated splits in pilot runs; exclude highly variable learners if small-sample outliers are present; cross-validate within each split (Jacob, 2020, Fisher et al., 2023).
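The median-aggregation recommendation is simple to implement as a wrapper around any cross-fitting routine. A sketch under the assumption that `estimate_fn(data, seed=...)` reruns the full procedure on a fresh random split:

```python
import numpy as np

def median_aggregate(estimate_fn, data, n_splits=20):
    """Rerun the full cross-fitting procedure over independent random
    splits and take the median of the resulting estimates, which is
    more robust to occasional outlier splits than the mean."""
    estimates = [estimate_fn(data, seed=s) for s in range(n_splits)]
    return np.median(estimates, axis=0)
```

A single pathological split (e.g., one where an unstable learner overfits badly) then shifts the aggregate far less than mean aggregation would.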
7. Assumptions, Limitations, and Open Considerations
Conditional cross-fitting relies on key assumptions for its theoretical guarantees:
- Proper sample-splitting, with each observation evaluated only on estimates fitted without itself.
- Independence or conditional independence of sample splits (especially in randomized design-based settings).
- Boundedness and overlap for regression and causal inference.
- Sufficiently fast convergence of nuisance function estimators and “stability” of ML procedures.
Violations—including massive overfitting with small folds or “sharp” learners, degenerate randomization schemes, or insufficient variation in splits—can compromise performance. The efficiency bounds and remainder rates are sharp only under these regularity assumptions, and rare pathological behavior can persist, particularly in high-dimensional or highly adaptive ML regimes (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025).
Conditional cross-fitting constitutes a core methodological component in contemporary machine-learning-assisted causal inference and semiparametric statistics, enabling robust and theoretically optimal estimation in the presence of complex nuisance structures and under both classic and design-based inference paradigms (Jacob, 2020, Newey et al., 2018, Lu et al., 21 Aug 2025, Fisher et al., 2023).