Statistical Inference for Heterogeneous Treatment Effects Discovered by Generic Machine Learning in Randomized Experiments (2203.14511v3)
Abstract: Researchers are increasingly turning to machine learning (ML) algorithms to investigate causal heterogeneity in randomized experiments. Despite their promise, ML algorithms may fail to accurately ascertain heterogeneous treatment effects in practical settings with many covariates and small sample sizes. In addition, quantifying estimation uncertainty remains a challenge. We develop a general approach to statistical inference for heterogeneous treatment effects discovered by a generic ML algorithm. We apply Neyman's repeated sampling framework to a common setting in which researchers use an ML algorithm to estimate the conditional average treatment effect and then divide the sample into several groups based on the magnitude of the estimated effects. We show how to estimate the average treatment effect within each of these groups and how to construct a valid confidence interval for it. In addition, we develop nonparametric tests of treatment effect homogeneity across groups and of rank-consistency of the within-group average treatment effects. The validity of our methodology does not rely on the properties of ML algorithms, because it is based solely on the randomization of treatment assignment and the random sampling of units. Finally, we generalize our methodology to the cross-fitting procedure by accounting for the additional uncertainty induced by the random splitting of data.
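The grouping-and-estimation step described above can be illustrated with a minimal sketch. It is a simplified stand-in, not the authors' estimator: it assumes the estimated conditional average treatment effects (the "scores") come from a model fit on a separate training sample, forms groups from empirical quantiles of the scores, and within each group computes a difference in means with Neyman's conservative variance. Unlike the paper's method, it ignores the additional uncertainty from estimating the group cutoffs. The function names `sorted_group_ates` and `homogeneity_wald_statistic` are hypothetical, and the Wald statistic treats the group estimates as independent, which is only an approximation and not the paper's nonparametric homogeneity test.

```python
import numpy as np


def sorted_group_ates(y, t, score, n_groups=5):
    """Difference-in-means ATE within each quantile group of a CATE score.

    y: outcomes; t: 0/1 treatment indicators; score: ML-estimated CATE per
    unit (assumed to come from a model fit on separate training data).
    Returns (estimates, standard_errors), ordered from the group with the
    lowest predicted effect to the group with the highest.
    """
    y, t, score = (np.asarray(a, dtype=float) for a in (y, t, score))
    # Interior cutoffs at the empirical quantiles of the score.
    cutoffs = np.quantile(score, np.linspace(0, 1, n_groups + 1)[1:-1])
    # Group labels 0..n_groups-1; ties at a cutoff go to the higher group.
    group = np.searchsorted(cutoffs, score, side="right")
    est, se = [], []
    for k in range(n_groups):
        in_k = group == k
        y1 = y[in_k & (t == 1)]
        y0 = y[in_k & (t == 0)]
        est.append(y1.mean() - y0.mean())
        # Neyman's conservative variance for a difference in means.
        se.append(np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size))
    return np.array(est), np.array(se)


def homogeneity_wald_statistic(est, se):
    """Simplified Wald statistic for equal group ATEs.

    Treats group estimates as independent (an approximation); under
    homogeneity it is roughly chi-squared with len(est) - 1 degrees of freedom.
    """
    w = 1.0 / np.asarray(se) ** 2
    pooled = np.sum(w * est) / np.sum(w)
    return float(np.sum(w * (est - pooled) ** 2)), len(est) - 1


# Usage on simulated data where the true effect is 2 * x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)          # completely randomized treatment
y = x + 2 * x * t + rng.normal(0, 1, n)
score = x + rng.normal(0, 0.05, n)  # stand-in for an ML CATE estimate
est, se = sorted_group_ates(y, t, score, n_groups=5)
stat, df = homogeneity_wald_statistic(est, se)
```

Sorting by the score before estimating keeps the ML model's role limited to defining the groups; the within-group estimates and their variances rely only on the randomization of treatment, which mirrors the design-based logic the abstract describes.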