
Recalibrated Prediction Powered Inference (RePPI)

Updated 10 February 2026
  • RePPI is a family of statistical inference methods that integrates limited gold-standard labels with abundant ML predictions to correct bias and reduce variance.
  • It employs a learned recalibration mapping to adjust surrogate predictions, achieving both unbiased estimation and efficiency across frequentist and Bayesian approaches.
  • RePPI extends to M-estimation, risk-controlled prediction sets, and performative prediction, offering robust performance across diverse applications.

Recalibrated Prediction Powered Inference (RePPI) is a family of statistical inference methodologies that optimally combines a small labeled (gold-standard) dataset with a large bank of ML predictions to produce estimators and confidence intervals with guaranteed validity and minimal variance. RePPI generalizes earlier prediction-powered inference (PPI) frameworks by introducing a learned recalibration component—typically, a mapping or model that projects ML predictions onto the true outcome space—thereby controlling both bias and variance, even when the surrogate predictions are systematically imperfect. This framework extends from population means to M-estimation, risk-controlled prediction sets, sub-instance evaluation metrics, and performative (feedback-loop) settings, and can be instantiated with either frequentist or fully Bayesian recalibration procedures.

1. Conceptual Foundations

RePPI is grounded in the prediction-powered inference paradigm, where predictions from a (potentially biased) automatic system $f$ are supplemented with a small set of labeled or gold-standard responses to debias statistical estimates and achieve greater efficiency.

  • Standard PPI: Constructs unbiased estimators and valid confidence sets by rectifying the bias induced by replacing $Y$ with $\hat Y = f(X)$, using the average residual computed from labeled data (Angelopoulos et al., 2023).
  • Limitation: If $f$ is miscalibrated or exhibits systematic bias, naive plug-in PPI may fail to reduce variance over classical estimators and may even perform worse.
  • Recalibration principle: RePPI seeks a mapping or adjustment $g^*(\hat Y, X)$, learned from the labeled data, that minimizes the mean squared error between predictions and observed outcomes. Plugging this recalibrated surrogate into estimation guarantees both unbiasedness and minimal variance (Ji et al., 16 Jan 2025, Chen et al., 8 Jan 2026, Hofer et al., 2024).
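As a minimal sketch of the recalibration principle for a population mean (all data synthetic; the linear form of $g$ is an illustrative stand-in for any regression), one can fit $g$ on part of the labeled set and apply the residual rectifier on the rest:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: predictions hatY are a systematically
# biased, miscalibrated version of the true outcome Y (true mean = 1).
N, n = 10_000, 400
Y = rng.normal(1.0, 1.0, n)                        # gold-standard labels
hatY = 0.5 * Y + 0.7 + rng.normal(0, 0.3, n)       # biased predictions
hatY_unlab = 0.5 * rng.normal(1.0, 1.0, N) + 0.7 + rng.normal(0, 0.3, N)

# Learn the recalibration map g on one half of the labeled set
# (linear least squares standing in for any regression)...
fit, hold = np.arange(n) < n // 2, np.arange(n) >= n // 2
a, b = np.polyfit(hatY[fit], Y[fit], 1)
g = lambda s: a * s + b

# ...and form the recalibrated estimate: surrogate mean plus the
# residual correction ("rectifier") computed on the held-out half.
theta_hat = g(hatY_unlab).mean() + (Y[hold] - g(hatY[hold])).mean()
print(theta_hat)
```

The split between fitting and correction folds previews the cross-fitting discussed under algorithmic implementations: reusing the same labels for both steps can bias the rectifier.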

2. Methodological Frameworks

2.1 M-Estimation and Imputed Loss

Given labeled data $(X_i, Y_i)$ and unlabeled data $\{(X_j, \hat Y_j)\}$ with ML predictions, the estimation target is typically expressed as

$$\theta^* = \arg\min_{\theta} \, E\left[L(Y, \psi(X; \theta))\right]$$

RePPI proceeds via the following steps (Ji et al., 16 Jan 2025, Song et al., 28 Jan 2026):

  1. Imputation Learning: Fit a regression on the labeled set to approximate $g^* \approx E[L(Y, \psi(X; \theta)) \mid X, \hat Y]$.
  2. Estimator Construction: Use the recalibrated imputed loss in place of the naive surrogate to define

$$\hat\theta_{\mathrm{RePPI}} = \arg\min_{\theta}\left\{ \frac{1}{n} \sum_{i=1}^n L(Y_i, \psi(X_i; \theta)) + \frac{1}{N} \sum_{j=n+1}^{n+N} \hat\ell_{r}(\hat Y_j, X_j; \theta) \right\}$$

  3. Bias Correction: Optionally, augment with bias corrections on the labeled set (see influence-function approaches and efficient augmentation (Song et al., 28 Jan 2026, Zhang et al., 3 Feb 2026)).
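The steps above can be sketched for the simplest case of squared loss, where the imputed loss reduces to $\hat\ell_r(\hat Y; \theta) = (g(\hat Y) - \theta)^2$ (synthetic data; the linear recalibration map and the grid-search minimizer are illustrative stand-ins, and the optional bias-correction step is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: n labeled pairs and N unlabeled predictions;
# the true outcome mean is 2.0, and hatY is systematically biased.
n, N = 300, 2_000
Y_lab = rng.normal(2.0, 1.0, n)
hatY_lab = 0.6 * Y_lab + 0.5 + rng.normal(0, 0.3, n)
hatY_unlab = 0.6 * rng.normal(2.0, 1.0, N) + 0.5 + rng.normal(0, 0.3, N)

# Step 1 -- imputation learning: recalibrate hatY toward Y on labeled data.
a, b = np.polyfit(hatY_lab, Y_lab, 1)
g = lambda s: a * s + b

# Step 2 -- estimator construction: minimize the combined objective
#   (1/n) sum_i L(Y_i, theta) + (1/N) sum_j l_r(hatY_j; theta)
# with L(y, theta) = (y - theta)^2 and l_r(s; theta) = (g(s) - theta)^2,
# here by brute-force grid search over theta.
thetas = np.linspace(0.0, 4.0, 2001)
obj = ((Y_lab[:, None] - thetas) ** 2).mean(axis=0) \
    + ((g(hatY_unlab)[:, None] - thetas) ** 2).mean(axis=0)
theta_hat = thetas[np.argmin(obj)]
print(theta_hat)
```

For squared loss this objective is convex and has a closed-form minimizer; the grid search is used only to mirror the generic M-estimation formulation.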

2.2 Bayesian Recalibration

A fully Bayesian RePPI formalism posits a latent calibration parameter $\theta$ in a generative model relating ML scores $s_i$ and human labels $y_i$, e.g., with $f(s; \theta)$ a logistic calibration. Posterior inference yields a distribution over the proxy population mean (Hofer et al., 2024):

$$g^{(t)} = \frac{1}{N} \sum_{j=1}^N f(s_j; \theta^{(t)}) + \frac{1}{n} \sum_{i=1}^n \left(y_i - f(s_i; \theta^{(t)})\right)$$

Monte Carlo samples over posterior draws $\{\theta^{(t)}\}$ produce credible intervals.
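A minimal illustration of this construction (synthetic scores and binary labels, a two-parameter logistic calibration $f(s;\theta) = \sigma(\theta_0 + \theta_1 s)$, and a random-walk Metropolis sampler standing in for any posterior sampler):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Synthetic binary-outcome setup: true label probability sigmoid(2s - 1),
# so the population proxy mean is 0.5 by symmetry of s ~ N(0.5, 1).
n, N = 400, 5_000
s_lab = rng.normal(0.5, 1.0, n)
y_lab = rng.random(n) < sigmoid(2 * s_lab - 1)
s_unlab = rng.normal(0.5, 1.0, N)

def log_post(theta):
    """Log posterior of (intercept, slope): flat prior + Bernoulli likelihood."""
    p = sigmoid(theta[0] + theta[1] * s_lab)
    return np.sum(y_lab * np.log(p) + (1 - y_lab) * np.log1p(-p))

# Random-walk Metropolis over theta = (theta_0, theta_1).
theta, lp, draws = np.array([0.0, 1.0]), None, []
lp = log_post(theta)
for t in range(4_000):
    prop = theta + rng.normal(0, 0.15, 2)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if t >= 1_000:                       # discard burn-in
        draws.append(theta)

# Rectified proxy mean g^(t) for each posterior draw, then a 95% interval.
g_draws = np.array([
    sigmoid(th[0] + th[1] * s_unlab).mean()
    + (y_lab - sigmoid(th[0] + th[1] * s_lab)).mean()
    for th in draws
])
lo, hi = np.quantile(g_draws, [0.025, 0.975])
print(lo, hi)
```

Each draw evaluates exactly the rectified-mean formula above, so the credible interval reflects both calibration-parameter uncertainty and the labeled-set residual correction.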

2.3 Informative Labeling and Inverse Probability Weighting

RePPI admits valid inference under informative (non-MCAR) labeling by replacing the standard residual correction with a Horvitz–Thompson (HT) or Hájek adjustment using estimated propensities $\pi_i = P(R_i = 1 \mid X_i)$ (Datta et al., 13 Aug 2025):

$$\hat\theta_{\mathrm{RePPI,HT}} = \frac{1}{N}\sum_{i=1}^N \hat Y_i - \frac{1}{N}\sum_{i=1}^N \frac{R_i}{\hat\pi_i}\left(\hat Y_i - Y_i\right)$$

Unbiasedness and $\sqrt{n}$-consistency are retained under standard regularity conditions (a correctly specified propensity model and overlap).
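A sketch of the HT-adjusted estimator under informative labeling (all data synthetic; the hand-rolled Newton logistic fit stands in for any propensity model):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic informative-labeling setup: items with covariate X are labeled
# with probability pi(X) that increases in X (non-MCAR), and the
# predictions hatY are systematically biased. True mean of Y is 0.
N = 20_000
X = rng.normal(0, 1, N)
Y = X + rng.normal(0, 1, N)
hatY = 0.8 * Y + 0.3                          # biased predictor
pi_true = 1 / (1 + np.exp(-(X - 1)))          # labeling favors large X
R = rng.random(N) < pi_true                   # labeling indicators R_i

# Estimate propensities pi_i = P(R_i = 1 | X_i) by logistic regression,
# fit with a few Newton iterations.
Z = np.column_stack([np.ones(N), X])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (R - p))
pi_hat = 1 / (1 + np.exp(-Z @ beta))

# HT-adjusted estimate: prediction mean minus the propensity-weighted
# residual correction computed on the labeled items.
theta_ht = hatY.mean() - np.mean(R / pi_hat * (hatY - Y))
print(theta_ht)
```

Note that the weights $R_i/\hat\pi_i$ blow up when propensities approach zero, which is why the overlap condition below is essential.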

3. Theoretical Guarantees

Under MCAR labeling (or MAR labeling with a correctly specified propensity model) and standard regularity conditions, RePPI estimators are unbiased and $\sqrt{n}$-consistent, with asymptotic variance no larger than that of classical labeled-only estimators; the resulting confidence intervals attain nominal coverage (Ji et al., 16 Jan 2025, Datta et al., 13 Aug 2025).

4. Applications and Empirical Results

RePPI has been applied in diverse domains:

  • Biomedical and Social Science Prediction: Estimating regression coefficients with large-scale ML surrogates for costly or missing outcomes (Ji et al., 16 Jan 2025, Song et al., 28 Jan 2026).
  • LLM-as-a-Judge and Ranking Metrics: Estimation of Precision@K and other sub-instance metrics in retrieval and RAG systems with LLM-annotated relevance, incorporating isotonic regression recalibration of LLM probabilities (Divekar et al., 26 Jan 2026).
  • Risk-controlling Prediction Sets: Semi-supervised calibration of risk-controlling set size or coverage parameters, dramatically shrinking prediction sets while preserving formal error guarantees (Einbinder et al., 2024).
  • Performative Prediction: Estimation of optimal parameters in feedback-loop systems with unknown but recalibrated outcome distributions (Zhang et al., 3 Feb 2026).

Empirical benchmarks consistently show that RePPI-based estimators retain nominal coverage while significantly shrinking confidence set widths and reducing the labeling burden—for instance, 24%–36% reduction in labeled data requirement for equivalent precision in several real-world studies (Ji et al., 16 Jan 2025, Hofer et al., 2024).

5. Algorithmic Implementations

RePPI implementations commonly employ sample-splitting or cross-fitting to avoid bias from overfitting the recalibration model. Three-fold splits or K-fold cross-fitting are standard:

  1. Initial fit: Estimate $\theta$ or calibration parameters on part of the data.
  2. Recalibration: Fit the imputation function $g$ (which could be nonparametric, e.g., random forest, splines, isotonic regression, or quantile mapping).
  3. Aggregation: Pool predictions and corrections across folds.
  4. Estimation and Inference: Optimize the recalibrated objective (convexity typically preserved), estimate variance or derive credible intervals.
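The four steps above can be sketched with a simple K-fold split for the population-mean case (synthetic data; the linear recalibration map is an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic labeled set (true mean 1.0) plus unlabeled predictions.
# Cross-fitting avoids using the same labels to both fit the
# recalibration map g and compute its residual correction.
n, N, K = 600, 8_000, 3
Y = rng.normal(1.0, 1.0, n)
hatY = 0.5 * Y + 0.7 + rng.normal(0, 0.3, n)
hatY_unlab = 0.5 * rng.normal(1.0, 1.0, N) + 0.7 + rng.normal(0, 0.3, N)

folds = np.array_split(rng.permutation(n), K)
surrogate_means, corrections = [], []
for k in range(K):
    held = folds[k]
    train = np.concatenate([folds[j] for j in range(K) if j != k])
    # Steps 1-2: fit the recalibration map g on the training folds only.
    a, b = np.polyfit(hatY[train], Y[train], 1)
    # Step 3: collect out-of-fold corrections and surrogate means.
    corrections.append(np.mean(Y[held] - (a * hatY[held] + b)))
    surrogate_means.append(np.mean(a * hatY_unlab + b))

# Step 4: pool across folds into the final recalibrated estimate.
theta_hat = np.mean(surrogate_means) + np.mean(corrections)
print(theta_hat)
```

Because each correction is evaluated on labels the map never saw, overfitting in $g$ does not leak into the rectifier, which is the point of the cross-fitting step.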

Select practical workflows are summarized below.

| Domain | Recalibration Step | Correction Mechanism |
| --- | --- | --- |
| Regression/Mean | Nonparametric $g$ | Plug-in, bias-correct |
| Binary Metrics | Isotonic/Platt scaling | Residual/EIF |
| Ranking/LLM-Judge | Isotonic regression on LLM probabilities | PPI++ with calibrated LLM |
| Informative Labeling | Propensity-weighted residual | HT/Hájek estimator |
| Risk Control | Calibrated predictive loss | Finite-sample UCB |
| Performative Optimization | Cross-fit EIF for $\beta$ | Plug-in/IS |

6. Diagnostic Tools and Assumptions

Key requirements and diagnostics:

  • Labeling Mechanism: MCAR for standard RePPI, MAR with correct IPW for informative labeling (Datta et al., 13 Aug 2025).
  • Prediction Independence: ML predictor ff must be trained on disjoint data; double-dipping causes anti-conservative inference (Song et al., 28 Jan 2026).
  • Overlap/Positivity: Propensity scores must be bounded away from zero in IPW-based RePPI (Datta et al., 13 Aug 2025).
  • Calibration Model Fit and Diagnostics: Residual and coverage diagnostics, sensitivity analyses for recalibration function misspecification.
  • Sample Size: Adequate support in the calibration set for nonparametric estimation; regularization as needed (Chen et al., 8 Jan 2026).

RePPI operates in close relation to classical surrogate-outcome, double-sampling, and survey-sampling strategies, and admits generalizations to M-estimation, sub-instance evaluation metrics, risk-controlled prediction sets, informative labeling, and performative settings.

Across settings, the recalibration-driven efficiency gains and unbiasedness hold under model-robust conditions and careful algorithmic design. As the paradigm evolves, open questions include optimizing calibration for multi-dimensional or instance-varying surrogates, robustness to distribution shift, and scalable cross-fitting implementations.


References

  • Angelopoulos et al., 2023
  • Hofer et al., 2024
  • Einbinder et al., 2024
  • Ji et al., 16 Jan 2025
  • Datta et al., 13 Aug 2025
  • Chen et al., 8 Jan 2026
  • Divekar et al., 26 Jan 2026
  • Song et al., 28 Jan 2026
  • Zhang et al., 3 Feb 2026
