Generalizable estimation of conditional average treatment effects using Causal Forest in randomized (2506.12296v1)
Abstract: Generalizing conditional average treatment effect (CATE) estimates from a randomized controlled trial (RCT) to a broader source population can be challenging because of selection bias and high-dimensional covariates. We aim to evaluate CATE estimation approaches using Causal Forest that address selection bias due to trial participation. We propose and compare four CATE estimation approaches using Causal Forest: (1) ignoring selection variables, (2) including selection variables, and (3, 4) using inverse probability weighting (IPW) combined with (1) or (2), respectively. The identifiability conditions suggest that including the covariates that determine trial selection in CATE-estimating models can yield an unbiased CATE estimate in the source population. However, simulations showed that, at sample sizes realistic for a medical trial, this approach substantially increased variance while yielding little bias reduction. IPW-based approaches performed better in most settings by addressing selection bias. Including more covariates that determine trial participation in Causal Forest estimation can substantially inflate the variance, diminishing the benefit of bias reduction. IPW offers a more robust method to adjust for selection bias due to trial participation.
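To make the weighting idea concrete, the following is a minimal sketch of inverse probability weighting for trial participation, not the authors' implementation. It assumes a hypothetical one-covariate population, a participation probability that is known rather than estimated, and a plain weighted difference in means in place of Causal Forest. All variable names and the data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source population: a single covariate x drives both
# trial participation and the size of the treatment effect.
n = 200_000
x = rng.normal(size=n)
p_select = 1.0 / (1.0 + np.exp(-(-2.0 + x)))  # assumed known P(S=1 | x)
s = rng.random(n) < p_select                  # who enters the trial

# Inside the trial, treatment is randomized and the effect is 1 + x.
x_t = x[s]
a = rng.random(x_t.size) < 0.5
y = x_t + a * (1.0 + x_t) + rng.normal(size=x_t.size)


def weighted_ate(y, a, w):
    """Difference of weighted outcome means between the two arms."""
    treated = np.average(y[a], weights=w[a])
    control = np.average(y[~a], weights=w[~a])
    return treated - control


# Inverse probability of participation weights for trial members.
w_ipw = 1.0 / p_select[s]

ate_naive = weighted_ate(y, a, np.ones_like(y))  # ignores selection
ate_ipw = weighted_ate(y, a, w_ipw)              # reweights to the population
ate_true = np.mean(1.0 + x)                      # population-average effect

print(f"naive={ate_naive:.3f}  ipw={ate_ipw:.3f}  true={ate_true:.3f}")
```

Because participants with larger x are over-represented in this simulated trial, the unweighted estimate overstates the population-average effect, while the weighted estimate moves back toward it. In practice the participation probability would have to be estimated, and extreme weights can themselves inflate variance.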