Automated, efficient and model-free inference for randomized clinical trials via data-driven covariate adjustment (2404.11150v1)
Abstract: In May 2023, the U.S. Food and Drug Administration (FDA) released guidance for industry on "Adjustment for Covariates in Randomized Clinical Trials for Drugs and Biological Products". Covariate adjustment is a statistical analysis method for improving precision and power in clinical trials by adjusting for pre-specified, prognostic baseline variables. Though it is recommended by the FDA and the European Medicines Agency (EMA), many trials do not exploit the available information in baseline variables, or make use only of the baseline measurement of the outcome. This is likely due, at least in part, to the regulatory mandate to pre-specify baseline covariates for adjustment, which makes it challenging to determine appropriate covariates and their functional forms. We will explore the potential of automated data-adaptive methods, such as machine learning algorithms, for covariate adjustment, addressing the challenge of pre-specification. Specifically, our approach allows the use of complex models or machine learning algorithms without compromising the interpretation or validity of the treatment effect estimate and its corresponding standard error, even in the presence of misspecified outcome working models. This contrasts with the majority of competing works, which assume correct model specification for the validity of standard errors. Our proposed estimators either require ultra-sparsity in the outcome model (which can be relaxed by limiting the number of predictors in the model) or must be combined with sample splitting to enhance their performance. As such, we will arrive at simple estimators and standard errors for the marginal treatment effect in randomized clinical trials, which exploit data-adaptive outcome predictions based on prognostic baseline covariates, and have low (or no) bias in finite samples even when those predictions are themselves biased.
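The abstract's idea of combining data-adaptive outcome predictions with sample splitting can be illustrated with a minimal sketch. This is not the paper's exact estimator: it is a generic cross-fitted augmented inverse-probability-weighted (AIPW) estimator of the marginal mean difference, assuming a two-arm trial with a known randomization probability `pi`. The function names (`fit_ols`, `predict_ols`, `crossfit_aipw`), the simulated data, and the use of ordinary least squares as a stand-in for a machine learning learner are all assumptions made for illustration. The working model is deliberately misspecified (it omits a sine term) to show that the estimate and its standard error can remain usable in that case.

```python
import numpy as np

def fit_ols(X, y):
    """Fit a linear working model with intercept (stand-in for any learner)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def crossfit_aipw(y, a, X, pi=0.5, n_folds=5, seed=0):
    """Cross-fitted AIPW estimate of E[Y(1)] - E[Y(0)] with known pi.

    Outcome predictions for each fold come from models fitted on the
    other folds only (sample splitting).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    m1 = np.empty(n)
    m0 = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        b1 = fit_ols(X[train & (a == 1)], y[train & (a == 1)])
        b0 = fit_ols(X[train & (a == 0)], y[train & (a == 0)])
        m1[test] = predict_ols(b1, X[test])
        m0[test] = predict_ols(b0, X[test])
    # Per-subject contribution: prediction contrast plus weighted residuals.
    psi = m1 - m0 + a * (y - m1) / pi - (1 - a) * (y - m0) / (1 - pi)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return est, se

# Simulated trial (hypothetical data): true marginal effect is 1.0.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
a = rng.binomial(1, 0.5, size=n)
y = (1.0 * a + X @ np.array([1.0, 0.5, -0.5])
     + np.sin(X[:, 0]) + rng.normal(size=n))

est, se = crossfit_aipw(y, a, X, pi=0.5)

# Unadjusted difference in means, for comparison.
y1, y0 = y[a == 1], y[a == 0]
est_unadj = y1.mean() - y0.mean()
se_unadj = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))

print(f"adjusted:   {est:.3f} (SE {se:.3f})")
print(f"unadjusted: {est_unadj:.3f} (SE {se_unadj:.3f})")
```

In this sketch the known randomization probability, rather than the outcome model, is what protects the estimate against misspecification; the outcome predictions only affect precision. Any learner could replace `fit_ols` and `predict_ols` as long as it is fitted on the training folds only.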