Externally Valid Policy Evaluation Combining Trial and Observational Data (2310.14763v3)
Abstract: Randomized trials are widely considered the gold standard for evaluating the effects of decision policies. Trial data is, however, drawn from a population that may differ from the intended target population, which raises a problem of external validity (a.k.a. generalizability). In this paper we seek to use trial data to draw valid inferences about the outcome of a policy on the target population. Additional covariate data from the target population is used to model the sampling of individuals in the trial study. We develop a method that yields certifiably valid trial-based policy evaluations under any specified range of model miscalibrations. The method is nonparametric, and its validity is assured even with finite samples. The certified policy evaluations are illustrated using both simulated and real data.
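To make the idea of evaluating a policy "under a specified range of model miscalibrations" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each trial individual has an observed loss under the policy and a nominal weight from some sampling model (how much that individual should count in the target population), and that the true weight lies within a factor `gamma` of the nominal one. The function name `worst_case_loss_quantile` and the parameters `gamma` and `alpha` are hypothetical, introduced only for illustration. The sketch returns a conservative upper limit on the loss that holds with weighted frequency at least `1 - alpha` over all weights in that range; it carries none of the finite-sample guarantees claimed in the abstract.

```python
import numpy as np


def worst_case_loss_quantile(losses, weights, gamma, alpha):
    """Conservative (1 - alpha) loss quantile under weight miscalibration.

    Assumption (illustrative): each true weight lies in
    [weights / gamma, weights * gamma], with gamma >= 1.
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")

    # Sort individuals by loss so cumulative sums follow the loss threshold.
    order = np.argsort(losses)
    losses, weights = losses[order], weights[order]

    # Worst case for the CDF at each threshold: shrink the weights of
    # individuals at or below it, inflate the weights of those above it.
    mass_below = np.cumsum(weights / gamma)
    inflated = weights * gamma
    mass_above = inflated.sum() - np.cumsum(inflated)
    cdf_lower = mass_below / (mass_below + mass_above)

    # Smallest observed loss whose worst-case CDF reaches 1 - alpha.
    idx = np.searchsorted(cdf_lower, 1 - alpha, side="left")
    return losses[min(idx, len(losses) - 1)]


# Usage: a wider miscalibration range gives a more cautious limit.
nominal = worst_case_loss_quantile([1, 2, 3, 4], [1, 1, 1, 1], gamma=1.0, alpha=0.5)
robust = worst_case_loss_quantile([1, 2, 3, 4], [1, 1, 1, 1], gamma=2.0, alpha=0.5)
```

With `gamma = 1` the sketch reduces to an ordinary weighted quantile; increasing `gamma` can only raise the returned limit, which mirrors the trade-off between how much miscalibration is allowed and how informative the evaluation remains.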