Predict-Then-Debias Techniques

Updated 4 July 2026

Predict-Then-Debias is a two-stage framework that separates initial biased prediction from an explicit debiasing step to correct spurious correlations.
It employs strategies such as failure-based weighting, residualization, adversarial adjustments, and moment corrections across varied applications.
Empirical results demonstrate improved unbiased accuracy and valid inference in tasks like classifier debiasing and high-dimensional estimation.

Predict-Then-Debias denotes a family of two-stage workflows in which a first-stage predictor, proxy, or intentionally biased model is constructed, and a second stage removes or attenuates the bias induced by spurious correlations, nuisance structure, measurement error, or prediction error. The label has been used across several research programs rather than for a single canonical algorithm. In representation learning and dataset-bias mitigation, it refers to training a biased classifier and then using its failures to train a debiased classifier (Nam et al., 2020). In semiparametric and high-dimensional inference, it refers to learning nuisance functions or external predictions first and then debiasing the downstream estimator by orthogonal moments, residualization, bootstrap correction, or debiased regularization (Xu et al., 2020, Sanford et al., 17 Feb 2025, Salerno et al., 12 Jul 2025, Kluger et al., 30 Jan 2025, Zhang et al., 14 Jun 2026). This suggests a unifying template: prediction is treated as an intermediate object, not as the final estimand.

1. Conceptual scope and recurring architecture

Across the literature, Predict-Then-Debias is organized around a division between a prediction stage and a correction stage. The prediction stage may produce a biased classifier, a nuisance-function approximation, a machine-learned outcome proxy, imputed covariates, or an external labeler. The debiasing stage then modifies optimization, moment conditions, or inferential formulas so that the final predictor or estimator is less sensitive to the bias carried by the first stage.

The recurring distinction is between predictive performance and downstream validity. In classifier debiasing, the objective is robustness to dataset bias without explicit bias labels (Nam et al., 2020). In semiparametric and post-prediction inference, the objective is valid estimation of a target parameter such as $\beta$, a regression slope, or a general functional $\theta=\phi(\mathbb P_X)$, even when the available predictions are noisy or differentially biased (Xu et al., 2020, Sanford et al., 17 Feb 2025, Kluger et al., 30 Jan 2025).

Framework	Predict stage	Debias stage
Learning from Failure	Train biased network $f_b(x;\theta_b)$ with GCE	Train debiased network $f_d(x;\theta_d)$ with weighted CE
DebiNet	Fit wide ReLU network to $(\mathbb E[Y\mid Z],\mathbb E[D\mid Z])$	Residualize and run OLS or cross-fitting
Adversarial debiasing	Fit $\hat y=f(x;\theta)$	Penalize information that residuals $\nu$ carry about $z$
Moment-based post-prediction inference	Fit black-box $\hat Y=f(\boldsymbol Z)$ and calibration model	Use moment-corrected estimator with $c=N/n$
PTD with imputed covariates	Use proxy covariates $\tilde X$ on a large incomplete sample	Correct with the labeled-subsample difference between true-covariate and proxy estimates, then bootstrap
DEAL	Use an external predictor or an external coefficient estimate	Bias-aware shrinkage, stacked Lasso refit, final debiasing

A central implication of these formulations is that the first-stage model is often allowed to be imperfect. Some methods exploit its imperfection directly: Learning from Failure amplifies the prejudice of a biased network so that its failures reveal bias-conflicting samples (Nam et al., 2020), while adversarial post-prediction regression explicitly models the bias induced by residuals correlated with regressors (Sanford et al., 17 Feb 2025).

2. Failure-based classifier debiasing

In "Learning from Failure," Predict-Then-Debias is implemented as a two-network architecture with identical backbones, such as an MLP for Colored MNIST or ResNet-20/18 for CIFAR-10, CelebA, and BAR (Nam et al., 2020). The biased network $f_b(x;\theta_b)$ is encouraged to learn spurious correlations, while the debiased network $f_d(x;\theta_d)$ is trained to focus on samples that the biased network finds difficult.

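The biased-network loss is the Generalized Cross-Entropy (GCE) loss

$$\mathcal L_{\mathrm{GCE}}\big(p(x;\theta_b),y\big)=\frac{1-p_y(x;\theta_b)^{q}}{q},\qquad q\in(0,1],$$

with gradient

$$\frac{\partial\mathcal L_{\mathrm{GCE}}\big(p(x;\theta_b),y\big)}{\partial\theta_b}=p_y(x;\theta_b)^{q}\,\frac{\partial\mathcal L_{\mathrm{CE}}\big(p(x;\theta_b),y\big)}{\partial\theta_b}.$$

Because this gradient up-weights easy examples with high $p_y(x;\theta_b)$ relative to standard cross-entropy, the biased network quickly memorizes bias-aligned samples. The debiased network uses weighted standard cross-entropy with per-example weight

$$\mathcal W(x)=\frac{\mathcal L_{\mathrm{CE}}\big(f_b(x),y\big)}{\mathcal L_{\mathrm{CE}}\big(f_b(x),y\big)+\mathcal L_{\mathrm{CE}}\big(f_d(x),y\big)},$$

so that examples easy for $f_b$ receive low weight, and examples that are difficult for $f_b$ but comparatively easy for $f_d$ receive weight near $1$.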
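The GCE loss, its gradient identity, and the relative-difficulty weight can be checked numerically. The following is a minimal NumPy sketch, assuming softmax probabilities are already computed; the helper names (`ce_loss`, `gce_loss`, `lff_weight`) and the default `q=0.7` are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def ce_loss(probs, labels):
    """Per-example cross-entropy -log p_y from softmax probabilities of shape (n, k)."""
    p_y = probs[np.arange(len(labels)), labels]
    return -np.log(p_y)

def gce_loss(probs, labels, q=0.7):
    """Per-example Generalized Cross-Entropy (1 - p_y**q) / q, for q in (0, 1]."""
    p_y = probs[np.arange(len(labels)), labels]
    return (1.0 - p_y ** q) / q

def lff_weight(ce_biased, ce_debiased, eps=1e-12):
    """Relative-difficulty weight W(x) = CE_b / (CE_b + CE_d), which lies in [0, 1]."""
    return ce_biased / (ce_biased + ce_debiased + eps)

# Example 0 is easy for the biased net (bias-aligned); example 1 is hard for it.
probs_b = np.array([[0.95, 0.05], [0.30, 0.70]])
probs_d = np.array([[0.80, 0.20], [0.40, 0.60]])
labels = np.array([0, 0])
weights = lff_weight(ce_loss(probs_b, labels), ce_loss(probs_d, labels))

# Finite-difference check of dGCE/dp_y = p_y**q * dCE/dp_y at one probability value.
q, p, h = 0.7, 0.6, 1e-6
d_gce = ((1 - (p + h) ** q) - (1 - (p - h) ** q)) / q / (2 * h)
d_ce = (-np.log(p + h) + np.log(p - h)) / (2 * h)
```

The hard-for-$f_b$ example receives the larger weight, which is the mechanism the debiased network relies on; the chain rule extends the probability-level identity to the network parameters.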

The motivation is an empirical observation about malignant bias: when the bias attribute is easier to learn than the target attribute, a network trained with standard cross-entropy learns bias-aligned samples early and fits bias-conflicting samples only later. By amplifying that easy bias pattern in $f_b$ via GCE, the method exaggerates failures on bias-conflicting examples; $f_d$ then focuses on those failures. The paper explicitly connects this behavior to the "small-loss first" learning dynamics of deep networks (Nam et al., 2020).

The joint training algorithm samples a minibatch, updates the biased network with GCE, optionally maintains an exponential moving average (EMA) of each example's cross-entropy for stable estimates of the per-example weight, and updates the debiased network with the weighted cross-entropy. The reported hyperparameters cover the GCE exponent $q$, separate learning rates for small and large datasets, the batch size, the Adam optimizer, and separate ranges of training epochs for synthetic and real-world benchmarks (Nam et al., 2020).
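The optional EMA bookkeeping can be sketched as follows, assuming each training example has a stable integer index so that its smoothed loss can be stored between minibatches. The class `EMALoss`, the function `debiased_step_weights`, and the smoothing factor `alpha=0.7` are hypothetical names and values for illustration; the network updates themselves are omitted.

```python
import numpy as np

class EMALoss:
    """Per-example exponential moving average of a loss, indexed by dataset position."""

    def __init__(self, n_examples, alpha=0.7):
        self.alpha = alpha                              # smoothing factor (illustrative)
        self.values = np.zeros(n_examples)
        self.seen = np.zeros(n_examples, dtype=bool)

    def update(self, idx, losses):
        idx = np.asarray(idx)
        losses = np.asarray(losses, dtype=float)
        old = self.values[idx]
        # The first observation of an example initialises its average directly.
        new = np.where(self.seen[idx], self.alpha * old + (1 - self.alpha) * losses, losses)
        self.values[idx] = new
        self.seen[idx] = True
        return new

def debiased_step_weights(ema_b, ema_d, idx, ce_b, ce_d, eps=1e-12):
    """Update both EMAs for a minibatch and return W(x) from the smoothed losses."""
    lb = ema_b.update(idx, ce_b)
    ld = ema_d.update(idx, ce_d)
    return lb / (lb + ld + eps)

ema_b, ema_d = EMALoss(4), EMALoss(4)
w1 = debiased_step_weights(ema_b, ema_d, [0, 1], [0.1, 2.0], [0.5, 0.5])
w2 = debiased_step_weights(ema_b, ema_d, [0, 1], [0.3, 1.0], [0.5, 0.5])
```

Smoothing keeps a single noisy minibatch loss from abruptly moving an example between low and high weight.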

The empirical protocol uses both controlled and real-world datasets. On Colored MNIST at a fixed bias ratio and on Corrupted CIFAR-10, LfF improves both unbiased accuracy and bias-conflicting accuracy over the vanilla model. On CelebA HairColor, it likewise raises accuracy and bias-conflicting accuracy over the vanilla baseline. On BAR, it raises average evaluation accuracy over the vanilla model, with ReBias reported for comparison. The reported summary is that LfF significantly improves robustness to dataset bias without explicit bias labels or architecture changes, and in some cases outperforms debiasing methods that require explicit supervision of spuriously correlated attributes (Nam et al., 2020).

3. Semiparametric residualization and DebiNet

DebiNet uses Predict-Then-Debias in a semiparametric partially linear model (PLM),

$$Y=D\beta+g(Z)+\varepsilon,\qquad \mathbb E[\varepsilon\mid D,Z]=0,$$

where $\beta$ is the low-dimensional target parameter and $g$ is a nuisance function (Xu et al., 2020). The prediction stage is the joint approximation of $\ell(Z)=\mathbb E[Y\mid Z]$ and $m(Z)=\mathbb E[D\mid Z]$, written as

$$h(Z)=\big(\ell(Z),\,m(Z)\big)=\big(\mathbb E[Y\mid Z],\,\mathbb E[D\mid Z]\big).$$

To estimate $h$, DebiNet fits a wide two-layer ReLU network

$$f(z;W,a)=\frac{1}{\sqrt M}\sum_{r=1}^{M}a_r\,\sigma\big(w_r^\top z\big),\qquad \sigma(u)=\max(u,0),\quad a_r\in\mathbb R^{2},$$

with training objective

$$L(W)=\frac12\sum_{i=1}^{n}\big\|f(Z_i;W,a)-(Y_i,D_i)\big\|_2^2.$$

Under NTK-type over-parameterization conditions, with sufficiently large width $M$, gradient descent on $W$ with the output weights $a$ held at their random initialization converges exponentially fast to zero training loss. The paper notes that in practice one may train both layers jointly and use Adam or SGD, but the theory is easiest with lazy first-layer training (Xu et al., 2020).

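The debiasing stage is residualization followed by orthogonal-moment estimation. With fitted nuisance estimates

$$\hat h(Z)=\big(\hat\ell(Z),\,\hat m(Z)\big)\approx\big(\mathbb E[Y\mid Z],\,\mathbb E[D\mid Z]\big),$$

the residuals are

$$\tilde Y_i=Y_i-\hat\ell(Z_i),\qquad \tilde D_i=D_i-\hat m(Z_i).$$

The orthogonal moment leads to

$$\frac1n\sum_{i=1}^{n}\big(\tilde Y_i-\tilde D_i\beta\big)\tilde D_i=0,$$

which is equivalent to OLS of $\tilde Y$ on $\tilde D$, giving, for a scalar treatment $D$,

$$\hat\beta=\frac{\sum_{i=1}^{n}\tilde D_i\tilde Y_i}{\sum_{i=1}^{n}\tilde D_i^{2}}.$$

Cross-fitting is optional: partition the data into $K$ folds; for each fold $k$, train on all data except fold $k$, predict on fold $k$, form residuals there, compute $\hat\beta_k$, and average the $K$ estimates (Xu et al., 2020).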
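The residualize-then-regress step with cross-fitting can be sketched in NumPy. To keep the example self-contained, the nuisance learner is a stand-in and not DebiNet's training scheme: a wide ReLU hidden layer with fixed random weights and a ridge-fitted linear output layer that predicts $(Y, D)$ jointly. The names `fit_nuisance` and `crossfit_plm`, the width, the ridge penalty, and the simulated data are all assumptions.

```python
import numpy as np

def relu_features(Z, W, b):
    """Hidden layer of a two-layer ReLU network with fixed random weights."""
    return np.maximum(Z @ W + b, 0.0)

def fit_nuisance(Z_train, T_train, Z_eval, width=256, ridge=1.0, seed=0):
    """Stand-in nuisance learner: fixed random ReLU layer plus a ridge-fitted
    linear output layer, predicting every column of T_train (here (Y, D)) at Z_eval."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(Z_train.shape[1], width))
    b = rng.normal(size=width)

    def design(Z):
        H = relu_features(Z, W, b)
        return np.hstack([H, np.ones((len(Z), 1))])   # add an intercept column

    Phi = design(Z_train)
    coef = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ T_train)
    return design(Z_eval) @ coef

def crossfit_plm(Y, D, Z, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of beta in Y = D*beta + g(Z) + eps."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(Y)) % n_folds
    T = np.column_stack([Y, D])
    estimates = []
    for k in range(n_folds):
        train, test = folds != k, folds == k
        pred = fit_nuisance(Z[train], T[train], Z[test])   # (l_hat, m_hat) on fold k
        y_res = Y[test] - pred[:, 0]
        d_res = D[test] - pred[:, 1]
        estimates.append(np.sum(d_res * y_res) / np.sum(d_res ** 2))
    return float(np.mean(estimates))

# Simulated PLM with beta = 1.5 (illustrative data, not from the paper).
rng = np.random.default_rng(1)
n = 2000
Z = rng.normal(size=(n, 3))
D = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2 + rng.normal(size=n)
Y = 1.5 * D + np.cos(Z[:, 2]) + Z[:, 0] * Z[:, 1] + rng.normal(size=n)
beta_hat = crossfit_plm(Y, D, Z)
```

With the simulated $\beta=1.5$, the cross-fitted estimate should land close to 1.5 even though the nuisance fits are only approximate.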

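The theoretical guarantees state that if the nuisance estimates converge sufficiently fast, then the residual-on-residual estimator $\hat\beta$ is $\sqrt n$-consistent,

$$\sqrt n\,\big(\hat\beta-\beta\big)\xrightarrow{d}\mathcal N\big(0,\sigma^2\big),$$

with asymptotic variance

$$\sigma^2=\frac{\mathbb E\big[V^2\varepsilon^2\big]}{\big(\mathbb E[V^2]\big)^2},\qquad V=D-\mathbb E[D\mid Z],$$

where $\varepsilon$ is the error of the partially linear model, and a Wald-type $(1-\alpha)$ confidence interval based on

$$\hat\beta\pm z_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt n}.$$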
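A residual-based plug-in for the variance and the Wald interval can be sketched as below. It replaces $V$ by the residualized treatment and $\varepsilon$ by the fitted structural residual in $\sigma^2=\mathbb E[V^2\varepsilon^2]/(\mathbb E[V^2])^2$; the function name `plm_wald_ci` is hypothetical, and this heteroskedasticity-robust form is one standard choice, not necessarily the paper's exact variance estimator.

```python
import numpy as np
from statistics import NormalDist

def plm_wald_ci(y_res, d_res, alpha=0.05):
    """Residual-on-residual estimate with a heteroskedasticity-robust Wald interval."""
    y_res = np.asarray(y_res, dtype=float)
    d_res = np.asarray(d_res, dtype=float)
    n = len(y_res)
    beta_hat = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    eps_hat = y_res - d_res * beta_hat                       # fitted structural residual
    sigma2 = np.mean(d_res ** 2 * eps_hat ** 2) / np.mean(d_res ** 2) ** 2
    se = np.sqrt(sigma2 / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)                  # two-sided normal quantile
    return beta_hat, se, (beta_hat - z * se, beta_hat + z * se)

# Tiny hand-checkable example: noise is orthogonal to d_res, so beta_hat is exactly 2.
d_res = np.array([1.0, -1.0, 1.0, -1.0])
noise = np.array([1.0, 1.0, -1.0, -1.0])
y_res = 2.0 * d_res + noise
beta_hat, se, (lo, hi) = plm_wald_ci(y_res, d_res)
```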

Empirically, synthetic PLM experiments compare PLM-NN with NW-kernel PLM, DML-Lasso, DML-RF, and related alternatives using the estimation MSE of $\hat\beta$ and train/test MSE; the summary is that PLM-NN matches or beats the alternatives while requiring far fewer nuisance fits. In high-dimensional linear data, DebiNet is reported to attain the lowest MSE, valid nominal coverage even when the debiased Lasso fails, and fast runtime. On the 401(k) treatment-effect example, the paper reports the point estimate and standard error of PLM-NN alongside those of the kernel PLM and DML-Lasso (Xu et al., 2020).

4. Post-prediction inference with predicted outcomes

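A different Predict-Then-Debias line studies downstream regression when the dependent variable is a machine-learned prediction. The basic workflow is explicit: first fit an ML predictor $\hat y=f(x;\theta)$, then plug $\hat y$ into an OLS regression on covariates $z$ (Sanford et al., 17 Feb 2025). Let $Z$ be the design matrix whose rows are the covariates $z_i^\top$ (including an intercept). If the true model is

$$y=Z\beta+\varepsilon,\qquad \mathbb E[\varepsilon\mid Z]=0,$$

and the prediction error is $\nu=\hat y-y$, then OLS on $\hat y=y+\nu$ yields

$$\hat\beta=(Z^\top Z)^{-1}Z^\top\hat y=\beta+(Z^\top Z)^{-1}Z^\top\varepsilon+(Z^\top Z)^{-1}Z^\top\nu.$$

The middle term has conditional mean zero, so the only source of bias is

$$(Z^\top Z)^{-1}Z^\top\nu,$$

or, in the scalar case with an intercept,

$$\frac{\widehat{\operatorname{Cov}}(z,\nu)}{\widehat{\operatorname{Var}}(z)}.$$

The operative issue is differential error, $\operatorname{Cov}(z,\nu)\neq0$, rather than merely low predictive accuracy (Sanford et al., 17 Feb 2025).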
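With $\nu=\hat y-y$, the decomposition is an exact algebraic identity in any sample: the naive slope on $\hat y$ equals the slope on $y$ plus the slope of $\nu$ on $z$. A small NumPy simulation makes this concrete; the data-generating values (true slope 2, error slope $-0.5$) are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
y = 1.0 + 2.0 * z + rng.normal(size=n)          # true slope beta = 2
nu = -0.5 * z + rng.normal(scale=0.3, size=n)   # differential prediction error
y_hat = y + nu                                  # ML prediction used downstream

def ols_slope(x, t):
    """Slope from regressing t on x with an intercept."""
    xc = x - x.mean()
    return np.sum(xc * (t - t.mean())) / np.sum(xc ** 2)

slope_true = ols_slope(z, y)
slope_naive = ols_slope(z, y_hat)
bias_term = np.cov(z, nu)[0, 1] / np.var(z, ddof=1)   # Cov(z, nu) / Var(z)
```

Here the error is negatively correlated with $z$, so the naive slope is pulled below the truth by roughly 0.5 even though $\hat y$ tracks $y$ closely.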

The adversarial debiasing response is to learn $\theta$ so that the residuals $\nu$ carry no linear information about $z$. The min-max objective is

$$\min_{\theta}\max_{\phi}\;\Big[\mathcal L_y(\theta)-\lambda\,\mathcal L_z(\theta,\phi)\Big],$$

where $\mathcal L_y$ is the prediction loss of $f(x;\theta)$, $\mathcal L_z$ is the loss of an adversary that tries to recover $z$ from $\nu$, and $\lambda\ge0$ sets the strength of the penalty. For a linear adversary and a scalar covariate, $\hat z=\phi_0+\phi_1\nu$, with

$$\mathcal L_z(\theta,\phi)=\frac1n\sum_{i=1}^{n}\big(z_i-\phi_0-\phi_1\nu_i\big)^2.$$

Training alternates between updating $\phi$ to minimize $\mathcal L_z$ and updating $\theta$ using $\nabla_\theta\big[\mathcal L_y-\lambda\,\mathcal L_z\big]$. The paper states that as $\lambda$ grows, if prediction accuracy is held constant, $\operatorname{Cov}(z,\nu)\to0$, so the bias term in the downstream OLS coefficient $\hat\beta$ shrinks. A diagnostic measurement-error test regresses $\nu$ on $z$ in a small labeled sample and tests whether the slope is zero, using standard or bootstrapped standard errors (Sanford et al., 17 Feb 2025).
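The diagnostic test can be sketched as a simple regression of $\nu=\hat y-y$ on $z$ in the labeled sample, with either a classical or a bootstrap standard error. The function `error_slope_test`, the normal approximation for the p-value, and the simulated labeled sample are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

def _slope(z, v):
    """OLS slope of v on z with an intercept."""
    zc = z - z.mean()
    return np.sum(zc * (v - v.mean())) / np.sum(zc ** 2)

def error_slope_test(z, nu, n_boot=0, seed=0):
    """Regress nu = y_hat - y on z in a labeled sample and test H0: slope = 0.

    Returns (slope, standard error, two-sided normal-approximation p-value).
    With n_boot > 0 the standard error is the bootstrap SD of the slope.
    """
    z = np.asarray(z, dtype=float)
    nu = np.asarray(nu, dtype=float)
    n = len(z)
    b = _slope(z, nu)
    if n_boot > 0:
        rng = np.random.default_rng(seed)
        draws = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)          # resample labeled pairs
            draws.append(_slope(z[idx], nu[idx]))
        se = float(np.std(draws, ddof=1))
    else:
        zc = z - z.mean()
        resid = nu - nu.mean() - b * zc
        se = float(np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(zc ** 2)))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(b) / se))
    return b, se, p_value

# Illustrative labeled sample with differential error (slope -0.5).
rng = np.random.default_rng(0)
z_lab = rng.normal(size=200)
nu_lab = -0.5 * z_lab + rng.normal(scale=0.3, size=200)
slope, se, p = error_slope_test(z_lab, nu_lab)
slope_b, se_b, p_b = error_slope_test(z_lab, nu_lab, n_boot=500)
```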

The reported simulations use a fixed number of points and vary the labeled sample size $n$ over a range of values. Baseline ML predictions yield a large negative bias in $\hat\beta$, while both the bias-correction and adversarial methods recover $\beta$ on average. The paper also reports how many labels the diagnostic test needs to detect a given true bias with a target power. In the West Africa road-and-forest-cover case study, the baseline overestimates the road effect relative to the true slope, and the adversarial and bias-correction methods nearly recover the true slope with valid standard errors. Hyperparameter tuning identifies a value of $\lambda$ that suffices, and the DNN primary model can gain a slight accuracy improvement because the adversary acts as a regularizer (Sanford et al., 17 Feb 2025).

A related contribution generalizes post-prediction inference with a moment correction. The setup uses a labeled sample

$$\mathcal L=\{(Y_i,\hat Y_i,X_i)\}_{i=1}^{n}$$

and an unlabeled sample

$$\mathcal U=\{(\hat Y_j,X_j)\}_{j=n+1}^{n+N},$$

with predictions $\hat Y=f(\boldsymbol Z)$ from a black-box model and a downstream target $\beta$, the coefficient vector of the regression of $Y$ on $X$. Naive regression of $\hat Y$ on $X$ is biased whenever the prediction error

$$e=Y-\hat Y$$

satisfies $\mathbb E[Xe]\neq0$ (Salerno et al., 12 Jul 2025).

Wang et al. (2020) are reviewed through the calibration (relationship) model

$$Y=\gamma_0+\gamma_1\hat Y+\epsilon,$$

with the key assumption that this relationship between $Y$ and $\hat Y$ does not depend on $X$. The moment-based extension relaxes that assumption and builds the estimator from moments: the calibration parameters are estimated in the labeled sample, while the moments of the covariates and predictions that the estimator requires are estimated in the unlabeled sample. To preserve calibration variability when the unlabeled sample is much larger than the labeled one, the method introduces the scaling factor

$$c=\frac{N}{n},$$

so that the variance of the final estimator still reflects the uncertainty of calibration parameters estimated from only $n$ labeled observations.

Under i.i.d. sampling, correct model specification, and moment consistency, the estimator is unbiased; by a multivariate CLT and the delta method, it is asymptotically normal and yields asymptotically correct Type I error and coverage at the nominal level. The simulation summary is that the estimator is unbiased, controls Type I error at the nominal level, and achieves near-nominal coverage in all settings, while naive regression and the original PostPI fail in settings that violate their respective assumptions (Salerno et al., 12 Jul 2025).

5. Imputed covariates, complex sampling, and external-model-assisted high-dimensional regression

Another Predict-Then-Debias formulation treats machine learning as an imputation device for missing covariates rather than for outcomes. In the two-phase sampling setup, the true covariates are $X$, the proxy (imputed) covariates are $\tilde X$, and the target is a $d$-dimensional functional

$$\theta=\phi(\mathbb P_X)\in\mathbb R^{d}.$$

A label indicator $I_i\in\{0,1\}$, equal to $1$ when $X_i$ is observed, induces weights

$$W_i=\frac{I_i}{\pi_i},\qquad \pi_i=\Pr(I_i=1),$$

where $\pi_i$ is known and bounded away from $0$ and $1$ (Kluger et al., 30 Jan 2025).

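Given a black-box routine $\mathcal A$ that maps a (weighted) sample to an estimate of $\theta$, the paper defines, with $N$ the full-sample size,

$$\hat\theta=\mathcal A\big(\{X_i,W_i\}_{I_i=1}\big),\qquad \hat\gamma=\mathcal A\big(\{\tilde X_i,W_i\}_{I_i=1}\big),\qquad \hat\gamma_{\mathrm{all}}=\mathcal A\big(\{\tilde X_i\}_{i=1}^{N}\big),$$

and the PTD estimator

$$\hat\theta_{\mathrm{PTD}}=\hat\gamma_{\mathrm{all}}+\big(\hat\theta-\hat\gamma\big).$$

A more general version uses a tuning matrix $\Omega\in\mathbb R^{d\times d}$,

$$\hat\theta_{\mathrm{PTD}}(\Omega)=\Omega\,\hat\gamma_{\mathrm{all}}+\big(\hat\theta-\Omega\,\hat\gamma\big),$$

with asymptotic normality

$$\sqrt N\,\big(\hat\theta_{\mathrm{PTD}}(\Omega)-\theta\big)\xrightarrow{d}\mathcal N\big(0,\Sigma_\Omega\big).$$

The optimal tuning matrix is

$$\Omega^\star=C\,V^{-1},\qquad C=\operatorname{ACov}\big(\hat\theta,\;\hat\gamma-\hat\gamma_{\mathrm{all}}\big),\qquad V=\operatorname{AVar}\big(\hat\gamma-\hat\gamma_{\mathrm{all}}\big),$$

where ACov and AVar denote asymptotic (co)variances of the $\sqrt N$-scaled estimators. In the uniform subsampling case, the variance identity

$$\Sigma_{\Omega^\star}=\Sigma_{\hat\theta}-C\,V^{-1}C^{\top}\preceq\Sigma_{\hat\theta}$$

shows that PTD is always at least as efficient as the classical estimator $\hat\theta$, whose asymptotic covariance is $\Sigma_{\hat\theta}$ (Kluger et al., 30 Jan 2025).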
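The optimal tuning matrix is ordinary control-variate algebra. Writing the estimator as $\hat\theta-\Omega(\hat\gamma-\hat\gamma_{\mathrm{all}})$, where $\hat\theta$ is the classical estimate from true covariates, $\hat\gamma$ the same routine applied to proxies on the labeled subsample, and $\hat\gamma_{\mathrm{all}}$ the proxy estimate on the full sample, the variance-minimizing choice is $\operatorname{Cov}(\hat\theta,\hat\gamma-\hat\gamma_{\mathrm{all}})\operatorname{Var}(\hat\gamma-\hat\gamma_{\mathrm{all}})^{-1}$. The sketch below estimates it from paired replicates such as bootstrap draws; the function names and the use of replicates are assumptions, not the paper's procedure.

```python
import numpy as np

def optimal_omega(theta_draws, gamma_draws, gamma_all_draws):
    """Plug-in Omega* = Cov(theta, delta) Var(delta)^{-1}, with delta = gamma - gamma_all.

    Inputs are arrays of shape (B, d) holding B paired replicates of the
    three component estimates.
    """
    t = np.asarray(theta_draws, dtype=float)
    delta = np.asarray(gamma_draws, dtype=float) - np.asarray(gamma_all_draws, dtype=float)
    t = t - t.mean(axis=0)
    delta = delta - delta.mean(axis=0)
    B = t.shape[0]
    cross = t.T @ delta / (B - 1)           # Cov(theta, delta), shape (d, d)
    var_delta = delta.T @ delta / (B - 1)   # Var(delta), shape (d, d)
    # Solve Omega @ var_delta = cross without forming an explicit inverse.
    return np.linalg.solve(var_delta.T, cross.T).T

def ptd_tuned(theta_hat, gamma_hat, gamma_all, omega):
    """PTD estimator with tuning matrix: Omega gamma_all + (theta_hat - Omega gamma_hat)."""
    return omega @ gamma_all + (theta_hat - omega @ gamma_hat)

# Synthetic replicates in which theta moves linearly with delta through A.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.8]])
gamma_all_draws = rng.normal(size=(200, 2))
delta_draws = rng.normal(size=(200, 2))
gamma_draws = gamma_all_draws + delta_draws
theta_draws = delta_draws @ A.T + 3.0
omega_hat = optimal_omega(theta_draws, gamma_draws, gamma_all_draws)
```

Setting $\Omega=I$ recovers the untuned estimator $\hat\gamma_{\mathrm{all}}+(\hat\theta-\hat\gamma)$.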

Confidence intervals are produced by a percentile bootstrap that resamples weighted two-phase data and recomputes the three component estimates (the labeled-sample estimates with true and with proxy covariates, and the full-sample proxy estimate), or by a faster convolution bootstrap. Stratified and cluster-bootstrap variants are also given. The key theoretical statements are asymptotic normality, that PTD is no less efficient than the classical estimator, bootstrap consistency, validity regardless of ML quality, and confidence intervals no wider than those that ignore the proxy. Real-data examples include remote sensing of housing prices, tree-cover regression with clustered sampling by spatial grid cell, and census disability estimation with stratified sampling by age group (Kluger et al., 30 Jan 2025).
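For a scalar mean under uniform subsampling, the PTD estimate with $\Omega=I$ and its percentile bootstrap fit in a few lines. This is a minimal sketch, assuming the black-box routine is a weighted mean and that resampling all $N$ units is appropriate (no strata or clusters); the helper names and the simulated, deliberately biased proxy are hypothetical.

```python
import numpy as np

def weighted_mean(values, weights):
    """Black-box routine A (stand-in): a weighted mean."""
    return np.sum(weights * values) / np.sum(weights)

def ptd_components(x, x_proxy, labeled, pi):
    """Return (theta_hat, gamma_hat, gamma_all) for the weighted-mean routine."""
    w = labeled / pi                                          # W_i = I_i / pi_i
    theta_hat = weighted_mean(x[labeled], w[labeled])         # true covariates, labeled units
    gamma_hat = weighted_mean(x_proxy[labeled], w[labeled])   # proxies, labeled units
    gamma_all = np.mean(x_proxy)                              # proxies, all N units
    return theta_hat, gamma_hat, gamma_all

def ptd_mean_with_ci(x, x_proxy, labeled, pi, n_boot=1000, alpha=0.1, seed=0):
    """PTD estimate (Omega = 1) with a percentile-bootstrap interval."""
    theta_hat, gamma_hat, gamma_all = ptd_components(x, x_proxy, labeled, pi)
    estimate = gamma_all + (theta_hat - gamma_hat)
    rng = np.random.default_rng(seed)
    n_units = len(x_proxy)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_units, size=n_units)          # resample all units
        t, g, g_all = ptd_components(x[idx], x_proxy[idx], labeled[idx], pi[idx])
        draws.append(g_all + (t - g))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return estimate, (lo, hi)

rng = np.random.default_rng(0)
N = 5000
x_true = rng.normal(1.0, 1.0, size=N)                         # theta = E[X] = 1
x_proxy = 0.8 * x_true + 0.3 + rng.normal(0.0, 0.3, size=N)   # biased ML proxy
pi = np.full(N, 0.1)                                          # known labeling probability
labeled = rng.random(N) < pi
x_obs = np.where(labeled, x_true, np.nan)                     # X missing off the labeled set
estimate, (lo, hi) = ptd_mean_with_ci(x_obs, x_proxy, labeled, pi)
```

The proxy mean alone is centered near 1.1 in this simulation, while the debiased estimate targets 1.0; stratified or clustered designs would need the corresponding bootstrap variants.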

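In high-dimensional semi-supervised linear regression, DEAL extends the same logic to an external predictor $\hat f_{\mathrm{ext}}$ or an external coefficient estimate $\tilde\beta$. The target model is

$$Y=X^\top\beta+\varepsilon,\qquad \mathbb E[\varepsilon\mid X]=0,$$

with abundant unlabeled covariates and sparse $\beta\in\mathbb R^{p}$ (Zhang et al., 14 Jun 2026). A central observation is the rectifier-cancellation proposition: if the labeler is linear, $\hat f_{\mathrm{ext}}(x)=x^\top\tilde\beta$, then regressing its pseudo-labels on $X$ returns $\tilde\beta$ exactly on both the unlabeled and the labeled sample (for full-rank designs), so the rectified estimator collapses to labeled-only OLS:

$$\hat\beta^{\mathrm{PPI}}=\underbrace{\tilde\beta}_{\text{unlabeled fit}}+\Big(\hat\beta^{\mathrm{OLS}}_{\mathcal L}-\underbrace{\tilde\beta}_{\text{labeled fit}}\Big)=\hat\beta^{\mathrm{OLS}}_{\mathcal L}.$$

The paper concludes that PPI and PPI++ cannot improve on labeled-only OLS, or its high-dimensional analog, when the labeler is linear or nearly oracle.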
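The cancellation is easy to verify numerically for an estimator of the rectifier form $\lambda\,\mathrm{OLS}_{\mathcal U}(f)+\big(\mathrm{OLS}_{\mathcal L}(Y)-\lambda\,\mathrm{OLS}_{\mathcal L}(f)\big)$, where $\lambda=1$ gives the rectified estimator and other values mimic a power-tuning weight. This sketch assumes full-rank designs; `ppi_ols` and the simulated data are hypothetical, and loss-based PPI++ variants are not covered.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients; X is assumed to have full column rank."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ppi_ols(X_lab, y_lab, X_unlab, labeler, lam=1.0):
    """Rectifier-form estimate: lam*OLS_U(f) + (OLS_L(y) - lam*OLS_L(f))."""
    fit_unlab = ols(X_unlab, labeler(X_unlab))   # pseudo-label regression, unlabeled sample
    fit_lab = ols(X_lab, labeler(X_lab))         # same regression, labeled sample
    return lam * fit_unlab + (ols(X_lab, y_lab) - lam * fit_lab)

rng = np.random.default_rng(0)
n, N, p = 50, 1000, 3
X_lab = rng.normal(size=(n, p))
X_unlab = rng.normal(size=(N, p))
beta = np.array([1.0, -2.0, 0.5])
y_lab = X_lab @ beta + rng.normal(size=n)
beta_ext = np.array([0.8, -1.5, 0.0])            # hypothetical external coefficient estimate

def linear_labeler(X):
    return X @ beta_ext

def nonlinear_labeler(X):
    return np.tanh(X @ beta_ext)

beta_ols = ols(X_lab, y_lab)
est_linear = ppi_ols(X_lab, y_lab, X_unlab, linear_labeler, lam=0.7)
est_nonlinear = ppi_ols(X_lab, y_lab, X_unlab, nonlinear_labeler, lam=0.7)
```

With the linear labeler the two pseudo-label fits both equal `beta_ext`, so they cancel for any `lam`; the nonlinear labeler leaves a nonzero correction.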

DEAL avoids mean-based rectification and instead routes the external estimator and the unlabeled covariates into the variance of a debiased estimator. Its four stages are a bias-aware initializer, pseudo-label imputation, a stacked Lasso refit, and final debiasing. The initializer is a bias-aware shrinkage estimator whose amount of shrinkage is set by a cross-fitted estimate. The final debiased estimator $\hat\beta^{\mathrm d}$ comes with coordinatewise confidence intervals

$$\hat\beta^{\mathrm d}_j\pm z_{1-\alpha/2}\,\frac{\hat\sigma_j}{\sqrt n},$$

where $\hat\sigma_j$ is the estimated standard-error scale of coordinate $j$. Under the paper's assumptions, $\sqrt n\,(\hat\beta^{\mathrm d}_j-\beta_j)/\hat\sigma_j\xrightarrow{d}\mathcal N(0,1)$. Under misspecification, validity extends to the best-linear projection $\beta^\ast=\mathbb E[XX^\top]^{-1}\mathbb E[XY]$. The interval-length comparison states that, at the same nominal level, DEAL intervals are shorter than those of the debiased Lasso, PPI, and PPI++; in simulations, the median DEAL confidence-interval length is a fraction of the debiased-Lasso length, and the paper also reports this median ratio across six real-data applications. A shift-aware variant restores unbiasedness and the CLT under covariate shift (Zhang et al., 14 Jun 2026).

6. Common structure, limitations, and adjacent post-processing methods

Several misconceptions are corrected by this literature. One is that a highly accurate upstream predictor can be treated as ground truth in downstream analysis. Papers on adversarial debiasing, moment-based post-prediction inference, and PTD with imputed covariates all show that naive use of predicted data can produce biased coefficients, invalid standard errors, or incorrect coverage when residual prediction error is correlated with downstream covariates or when calibration uncertainty is ignored (Sanford et al., 17 Feb 2025, Salerno et al., 12 Jul 2025, Kluger et al., 30 Jan 2025). Another is that debiasing necessarily requires explicit labels for the bias attribute: Learning from Failure improves robustness without explicit bias labels by exploiting training dynamics instead (Nam et al., 2020).

The methods also expose distinct bias-variance trade-offs. In Learning from Failure, the debiasing mechanism depends on the early-phase preference for easy, bias-aligned samples and on amplifying that preference with GCE (Nam et al., 2020). In adversarial post-prediction regression, if the adversarial weight $\lambda$ is too small there is no debiasing, while if $\lambda$ is too large the model may randomize $\hat y$ to thwart the adversary at the expense of accuracy; the paper attributes the resulting algorithmic instability to the non-convexity of the adversarial loss (Sanford et al., 17 Feb 2025). In moment-based post-prediction inference, a larger labeled-sample size $n$ reduces the variance contributed by the calibration step, and the recommendation is that $n$ be large enough for the relationship model and moment estimates to be stable (Salerno et al., 12 Jul 2025). In PTD with imputed covariates, efficiency gains depend on how closely the proxies track the true covariates, but validity does not require accurate proxies (Kluger et al., 30 Jan 2025).

A related but distinct post-processing line is TowerDebias, which addresses unfairness in black-box predictions rather than regression bias from predicted data. Writing $S$ for the sensitive attribute and $X$ for the remaining features, it uses the Tower Property

$$\mathbb E[Y\mid X]=\mathbb E\big[\mathbb E[Y\mid X,S]\,\big|\,X\big]$$

and replaces the black-box prediction $f(X,S)$ by

$$\mathbb E\big[f(X,S)\mid X\big].$$

The paper states an informal theorem characterizing this replacement. Empirically, tDB reduces the reported unfairness measures at modest settings of its tuning parameter, with utility loss typically small, measured as a rise in MAPE for regression or a relative increase in misclassification rate for classification (Matloff et al., 2024). This is not framed as Predict-Then-Debias in the same inferential sense, but it belongs to the same post-hoc debiasing family in which a black-box model is left intact and the correction is applied to its outputs.
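The conditional expectation $\mathbb E[f(X,S)\mid X]$ has to be estimated in practice. A minimal sketch, assuming a k-nearest-neighbour average in the non-sensitive feature space; the helper `tower_debias`, the choice `k=10`, and the toy models are illustrative, and the paper's estimator may differ.

```python
import numpy as np

def tower_debias(X_query, X_ref, S_ref, f, k=10):
    """Estimate E[f(X, S) | X = x] for each query row by a k-NN average.

    Neighbours are found using the non-sensitive features X only; the sensitive
    attribute S enters solely through the black-box predictions f(X_ref, S_ref).
    """
    preds = f(X_ref, S_ref)
    out = np.empty(len(X_query))
    for i, x in enumerate(X_query):
        dist = np.sum((X_ref - x) ** 2, axis=1)   # squared Euclidean distance in X-space
        nearest = np.argsort(dist)[:k]
        out[i] = preds[nearest].mean()
    return out

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(300, 2))
S_ref = rng.integers(0, 2, size=300)

def black_box(X, S):
    return X[:, 0] + 2.0 * S          # hypothetical model that uses S directly

def ignores_s(X, S):
    return X[:, 0] - X[:, 1]          # hypothetical model that ignores S

debiased = tower_debias(X_ref, X_ref, S_ref, black_box, k=10)
same_x = tower_debias(np.vstack([X_ref[0], X_ref[0]]), X_ref, S_ref, black_box, k=10)
exact = tower_debias(X_ref[:5], X_ref, S_ref, ignores_s, k=1)
```

Because the query carries no sensitive attribute, two individuals with identical $X$ always receive the same debiased prediction; larger `k` smooths more aggressively at some cost in utility.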

Taken together, the literature uses Predict-Then-Debias to denote a class of procedures in which debiasing is downstream of prediction rather than upstream of model design. The technical realization varies—failure-based weighting, orthogonal moments, adversarial residual decorrelation, bootstrap correction, or bias-aware debiasing of regularized estimators—but the common principle is stable: prediction is used as a first-stage surrogate, and the inferential or fairness target is recovered only after an explicit debiasing step (Nam et al., 2020, Xu et al., 2020, Sanford et al., 17 Feb 2025, Salerno et al., 12 Jul 2025, Kluger et al., 30 Jan 2025, Zhang et al., 14 Jun 2026).