Statistical Optimality of Prediction-Powered Inference

Published 7 Jun 2026 in math.ST | (2606.08730v1)

Abstract: The prediction-powered inference (PPI) proposed by Angelopoulos et al. (2023) is a popular method that leverages a small number of labeled samples and machine learning predictions for semi-supervised inference. While several variants of PPI have appeared in the literature, its rigorous statistical theory has not been fully developed. In this paper, we study the statistical optimality of PPI. Our contributions span both foundational theory and new methodology. First, we frame PPI as an M-estimation problem, revealing a link between the bias-corrected PPI estimating equation and the ideal full-data estimating equation. This connection leads to the consistency and asymptotic normality of the PPI estimator under simple random sampling without replacement. Next, we identify the efficient influence function and prove that PPI can attain the semiparametric efficiency lower bound when the predictor is score-calibrated, that is, when the predictor's output aligns with the true conditional expectation of the estimating function. Finally, for learned prediction rules, we develop asymptotic theory for cross-fitting and for a single-fit variant with variance correction in the special case of semiparametric mean estimation. Simulation experiments and a real-data application support these findings.


Authors (2)

Summary

The paper establishes that Prediction-Powered Inference attains semiparametric efficiency when the predictive model is score-calibrated.
It develops cross-fitting and variance correction methods to mitigate bias from overfitting in learned prediction rules.
Extensive simulations confirm that the debiased PPI methods improve accuracy over labeled-only estimators while maintaining proper confidence-interval coverage.

Statistical Optimality and Semiparametric Efficiency in Prediction-Powered Inference

Introduction

Prediction-Powered Inference (PPI) has become a foundational paradigm for semi-supervised inference, exploiting the joint structure of unlabeled covariates and a limited set of labeled responses. The method augments classical M-estimation with machine-learning predictions that impute the missing outcomes, together with a bias correction (the "rectifier") estimated on the labeled sample. Despite the proliferation of PPI variants and empirical evidence for their efficacy, gaps have remained in the formal statistical theory of PPI, in particular regarding its optimality and the conditions under which it is semiparametrically efficient. This paper addresses these gaps by providing a comprehensive asymptotic theory for PPI, establishing conditions for statistical optimality, and introducing principled methodological corrections for learned prediction rules.

Moment Equation Formalism and M-Estimation Perspective

The authors establish a unifying theoretical framework by casting PPI as an M-estimation problem. The target $\theta_0$ solves the population moment condition $E[U(\theta; X, Y)] = 0$. Because the infeasible oracle estimating equation would require responses for every unit, it is replaced by a computable bias-corrected score that combines model-based imputations with a residual-based correction computed on the labeled units. Under simple random sampling without replacement (SRSWOR) of the labeled units, the PPI score is design-unbiased for the finite-population (full-data) estimating equation regardless of the choice of predictor; prediction error affects only the asymptotic variance. Consistency and asymptotic normality follow under standard regularity conditions, and the estimator admits an asymptotically linear expansion whose variance decomposes into the oracle (full-data) variance plus a prediction-inefficiency term.
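The design-unbiasedness claim can be checked exactly on a toy population. The sketch below assumes mean estimation, $U(\theta; x, y) = y - \theta$, and a PPI score of the form $\frac{1}{N}\sum_{i=1}^{N} U(\theta; X_i, m(X_i)) + \frac{1}{n}\sum_{i \in \mathcal{L}} [U(\theta; X_i, Y_i) - U(\theta; X_i, m(X_i))]$, which is our reading of the description rather than the paper's exact notation. Solving it gives the imputed mean plus the labeled residual mean. The helper `ppi_mean` and the data are hypothetical; the check averages the estimate over every possible SRSWOR sample and compares it with the full-population mean, using exact fractions to avoid rounding noise.

```python
from fractions import Fraction
from itertools import combinations


def ppi_mean(preds_all, y, labeled):
    """Hypothetical PPI mean estimate: imputed mean plus labeled residual mean.

    preds_all -- predictions m(X_i) for all N units
    y         -- responses; only entries at indices in `labeled` are read
    labeled   -- indices of the labeled subsample L
    """
    imputed = sum(preds_all) / len(preds_all)
    rectifier = sum(y[i] - preds_all[i] for i in labeled) / len(labeled)
    return imputed + rectifier


# Tiny finite population with a deliberately poor (but fixed) predictor.
y = [Fraction(v) for v in (3, 7, 1, 4, 9, 6)]
preds = [Fraction(v) for v in (0, 10, 2, 2, 5, 1)]

# Design expectation: average the estimate over every SRSWOR sample of size n = 2.
samples = list(combinations(range(len(y)), 2))
design_mean = sum(ppi_mean(preds, y, s) for s in samples) / len(samples)
print(design_mean, sum(y) / len(y))  # 5 5
```

Individual samples can land far from 5 because the predictor is poor, but the average over the sampling design matches the population mean exactly; a bad predictor costs variance, not bias.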

Semiparametric Efficiency Theory

The paper provides a rigorous semiparametric analysis of PPI as a missing-response problem, characterizing the efficient influence function for regular estimators under a superpopulation framework. The key result is that the PPI estimator attains the semiparametric efficiency bound when the predictor is "score-calibrated", that is, when $U(\theta_0; X, m(X)) = E[U(\theta_0; X, Y) \mid X]$ almost surely at the true parameter $\theta_0$. This condition is weaker than perfect prediction of $Y$: the predictor need only reproduce the conditional mean of the estimating function, not the response itself. In mean estimation, where $U(\theta; X, Y) = Y - \theta$, it reduces to $m(X) = E[Y \mid X]$ (for a learned predictor, consistency of $\hat m$ for $E[Y \mid X]$). Under this condition the variance of the PPI estimator matches the semiparametric lower bound, and when the predictor is misspecified the estimator remains unbiased: misspecification costs variance, not bias.
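For mean estimation with a fixed predictor $m$, a short calculation shows where score calibration enters. This is our own sketch, not reproduced from the paper, assuming $N$ units, a labeled SRSWOR subsample $\mathcal{L}$ of size $n$ with labeled fraction $f = n/N$, and the superpopulation target $\theta_0 = E[Y]$. The full-data (oracle) error and the rectifier error are uncorrelated because the rectifier has design mean zero given the population, so their variances add:

```latex
\begin{aligned}
\hat\theta &= \frac{1}{N}\sum_{i=1}^{N} m(X_i) + \frac{1}{n}\sum_{i \in \mathcal{L}} \bigl(Y_i - m(X_i)\bigr), \\
n\,\operatorname{Var}(\hat\theta) &\approx \underbrace{f\,\operatorname{Var}(Y)}_{\text{oracle}} + \underbrace{(1-f)\,\operatorname{Var}\bigl(Y - m(X)\bigr)}_{\text{prediction inefficiency}}, \\
\operatorname{Var}\bigl(Y - m(X)\bigr) &= E\bigl[\operatorname{Var}(Y \mid X)\bigr] + \operatorname{Var}\bigl(E[Y \mid X] - m(X)\bigr) \;\ge\; E\bigl[\operatorname{Var}(Y \mid X)\bigr].
\end{aligned}
```

Equality holds exactly when $m(X)$ differs from $E[Y \mid X]$ by at most a constant (the rectifier absorbs constant offsets). The variance then becomes $n\,\operatorname{Var}(\hat\theta) \approx E[\operatorname{Var}(Y \mid X)] + f\,\operatorname{Var}(E[Y \mid X])$, the familiar efficiency bound for a mean when responses are missing completely at random with observation probability $f$.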

PPI with Learned Prediction Rules: Cross-Fitting and Single-Fit Variance Correction

In practice, PPI requires learning the prediction rule $m(\cdot)$ from the labeled sample. This introduces overfitting and dependence-induced bias when the same observations are used both to train the predictor and to evaluate the rectifier. The authors delineate the limitations of "vanilla" PPI in this learned setting, showing, both theoretically and empirically, that naive reuse of in-sample residuals yields below-nominal confidence-interval coverage, especially when labels are scarce.
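An extreme case makes the failure of naive residual reuse concrete. The sketch below uses a hypothetical 1-nearest-neighbour learner (not a method from the paper), which interpolates its training data: evaluated on the points it was fit on, every residual is exactly zero, so the vanilla rectifier vanishes and its estimated variance contribution collapses to zero. Leave-one-out residuals (cross-fitting with $K = n$) do not have this problem. The function name `one_nn_predict` and the data are illustrative assumptions.

```python
import statistics


def one_nn_predict(x_train, y_train, x, exclude=None):
    """Hypothetical 1-nearest-neighbour prediction at x.

    `exclude` drops one training index, giving a leave-one-out fit.
    """
    candidates = [i for i in range(len(x_train)) if i != exclude]
    nearest = min(candidates, key=lambda i: abs(x_train[i] - x))
    return y_train[nearest]


x_lab = [0.0, 1.0, 3.0, 6.0]
y_lab = [2.0, 1.0, 4.0, 3.0]

# Vanilla reuse: the learner is evaluated on the same points it was trained on.
in_sample = [y - one_nn_predict(x_lab, y_lab, x) for x, y in zip(x_lab, y_lab)]

# Leave-one-out: each residual comes from a fit that never saw that label.
out_of_fold = [
    y_lab[i] - one_nn_predict(x_lab, y_lab, x_lab[i], exclude=i)
    for i in range(len(x_lab))
]

print(in_sample, statistics.variance(in_sample))  # [0.0, 0.0, 0.0, 0.0] 0.0
print(out_of_fold, statistics.variance(out_of_fold))
```

Real learners rarely interpolate this perfectly, but any in-sample fit shrinks residuals toward zero in the same direction, which is why naive intervals come out too narrow.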

To address this, two solutions are developed and rigorously analyzed:

Cross-Fitted PPI (CF-PPI): The labeled sample is partitioned into $K$ folds; for each fold, the prediction rule is trained on the other $K-1$ folds, and the resulting out-of-fold residuals are used for the bias correction. This sample splitting makes each fitted predictor independent of the held-out observations on which its residuals are computed, so the remainder in the asymptotic linear expansion is $o_p(N^{-1/2})$. The analysis avoids restrictive complexity conditions on the learner: $L_2$-consistency of the cross-fitted predictor suffices for asymptotic normality and valid Wald-type inference.
Single-Fit PPI with Variance Correction (SF-PPI-VC): In the special case of mean estimation and for certain regressors (notably kernel ridge regression with mass preservation), the paper derives an explicit degrees-of-freedom correction that accounts for in-sample noise leakage when estimating uncertainty. With this correction, the single-fit bias-corrected PPI estimator is provably consistent and asymptotically normal, and its variance estimator is first-order equivalent to the semiparametric bound under regularity conditions.
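The cross-fitting recipe for mean estimation can be sketched as follows. This is one common arrangement, assumed here rather than taken from the paper: the imputation term averages the $K$ fold models over all $N$ units, and the rectifier averages the out-of-fold residuals. The least-squares line is a stand-in learner, and the standard error plugs sample variances into the fixed-predictor approximation $\operatorname{Var}(\hat\theta) \approx \operatorname{Var}(Y)/N + (1-f)\operatorname{Var}(Y - m(X))/n$ for the superpopulation mean $E[Y]$, treating the cross-fitted predictor as if it were fixed. The names `fit_line` and `cf_ppi_mean` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (hypothetical stand-in learner)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def cf_ppi_mean(x_all, y_lab, lab_idx, k=5, seed=0):
    """Cross-fitted PPI estimate of E[Y] with a plug-in standard error.

    x_all   -- covariates for all N units
    y_lab   -- responses for the labeled units, in the order of lab_idx
    lab_idx -- indices (into x_all) of the labeled units
    """
    n, N = len(lab_idx), len(x_all)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[j::k] for j in range(k)]

    models, residuals = [], []
    for fold in folds:
        held_out = set(fold)
        train = [j for j in range(n) if j not in held_out]
        m_k = fit_line([x_all[lab_idx[j]] for j in train], [y_lab[j] for j in train])
        models.append(m_k)
        # Out-of-fold residuals: m_k never saw these labels.
        residuals += [y_lab[j] - m_k(x_all[lab_idx[j]]) for j in fold]

    # Imputation term: average the K fold models over all N units.
    imputed = statistics.fmean(statistics.fmean(m(x) for m in models) for x in x_all)
    theta = imputed + statistics.fmean(residuals)

    f = n / N
    var = statistics.variance(y_lab) / N + (1 - f) * statistics.variance(residuals) / n
    return theta, var ** 0.5


# Illustrative usage with simulated data (all settings are assumptions).
rng = random.Random(42)
x_demo = [rng.uniform(0, 5) for _ in range(500)]
idx_demo = rng.sample(range(500), 50)
y_demo = [1 + 2 * x_demo[i] + rng.gauss(0, 1) for i in idx_demo]
theta_hat, se_hat = cf_ppi_mean(x_demo, y_demo, idx_demo)
print(f"{theta_hat:.3f} +/- {1.96 * se_hat:.3f}")
```

Averaging the fold models for the imputation term keeps every unit's prediction on an equal footing; an alternative is to refit once on all labels for the imputation term, and the paper's exact construction may differ from both.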

Simulation and Empirical Results

Extensive simulation studies are conducted under diverse response-surface scenarios and labeled fractions, systematically comparing classical labeled-only estimators, PPI with oracle and misspecified fixed predictors, and the full suite of learned-PPI procedures. Several findings emerge:

Statistical Efficiency: When the predictor is the oracle or highly accurate, both CF-PPI and SF-PPI-VC substantially reduce MAE and RMSE relative to labeled-only approaches while keeping coverage at or near the nominal 95% level. Vanilla PPI without correction yields overly narrow intervals and undercoverage, especially when the labeled fraction $f$ is small ($f \ll 1$).
Misspecification Robustness: PPI remains unbiased under arbitrary fixed predictors; a poor or deliberately misspecified model shows up only as a larger variance, as the simulations confirm.
Empirical Calibration: CF-PPI rescues nominal coverage as $f$ increases, with negligible computational overhead compared to the naive approach. SF-PPI-VC's explicit variance adjustment similarly stabilizes inference without repeated refitting.
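The coverage and width comparisons above can be reproduced in miniature with a Monte Carlo harness. The sketch below is a toy design of our own, not the paper's simulation settings: a linear Gaussian model with $E[Y] = 1$, the oracle predictor $E[Y \mid X]$ held fixed, $N = 500$ units and $n = 100$ labels drawn by SRSWOR. It tallies empirical coverage and mean width of 95% Wald intervals for the labeled-only mean and for PPI, using the plug-in variance $\operatorname{Var}(Y)/N + (1-f)\operatorname{Var}(Y - m(X))/n$ (an assumption for the superpopulation mean). The functions `one_replicate` and `simulate` are hypothetical.

```python
import math
import random
import statistics


def one_replicate(rng, N=500, n=100):
    """Draw one population, label an SRSWOR subsample, return two 95% CIs."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # so E[Y] = 1
    m = [1.0 + 2.0 * xi for xi in x]                         # oracle predictor E[Y|X]
    lab = rng.sample(range(N), n)
    y_lab = [y[i] for i in lab]
    r_lab = [y[i] - m[i] for i in lab]
    f, z = n / N, 1.959964

    # Labeled-only estimator and its standard error.
    classical = (statistics.fmean(y_lab), math.sqrt(statistics.variance(y_lab) / n))
    # PPI estimator with the fixed predictor m.
    ppi_est = statistics.fmean(m) + statistics.fmean(r_lab)
    ppi_se = math.sqrt(
        statistics.variance(y_lab) / N + (1 - f) * statistics.variance(r_lab) / n
    )
    return [(est - z * se, est + z * se) for est, se in (classical, (ppi_est, ppi_se))]


def simulate(reps=1000, seed=1, theta0=1.0):
    """Return {method: (empirical coverage, mean interval width)}."""
    rng = random.Random(seed)
    hits, widths = [0, 0], [0.0, 0.0]
    for _ in range(reps):
        for k, (lo, hi) in enumerate(one_replicate(rng)):
            hits[k] += lo <= theta0 <= hi
            widths[k] += hi - lo
    return {
        name: (hits[k] / reps, widths[k] / reps)
        for k, name in enumerate(("labeled-only", "PPI"))
    }


print(simulate())
```

Swapping the oracle `m` for an in-sample fitted model is the natural next step for reproducing the vanilla-PPI undercoverage discussed above.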

The application to the UCI Energy Efficiency dataset further shows that the debiased PPI estimators outperform the classical labeled-only mean in point accuracy and interval width, with PPI confidence intervals that are consistently narrow and well centered.

Theoretical Implications and Future Directions

This work establishes that PPI admits a rigorous statistical foundation, with clear efficiency-theoretic characterizations and concrete remedies for practical implementation. The analysis extends the survey sampling literature (e.g., difference estimators and Horvitz-Thompson correction) by incorporating modern ML predictors, and it connects to the double/debiased machine learning and targeted learning frameworks. Methodologically, the results inform the construction of confidence intervals and variance estimators and indicate when sample splitting or variance correction is warranted.

An important implication is that, in high-dimensional or modern ML contexts, variance correction (either via cross-fitting or a degrees-of-freedom adjustment) should be regarded as essential for valid inference, especially when labeled data are scarce. The explicit recognition of, and adjustment for, overfitting-induced bias provides a blueprint for extending these principles to other complex moment restrictions, structured prediction tasks, and federated or privacy-preserving data settings.

Conclusion

This paper closes a significant gap in the statistical theory of semi-supervised inference, analytically resolving when and how PPI achieves optimal (efficient) inference and providing robust methodologies for uncertainty quantification under learned prediction rules. The theoretical guarantees lead directly to practical algorithms for constructing valid confidence intervals and yield empirically verified improvements in both accuracy and reliability. These results position PPI as not only a computationally effective but also an inferentially rigorous approach to leveraging large-scale unlabeled data for statistical estimation.
