Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation (2402.14264v2)
Abstract: Average treatment effect estimation is the most central problem in causal inference, with applications in numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods has remained an open area of investigation, especially in regimes where they do not achieve parametric rates. In this paper, we adopt the recently introduced structure-agnostic framework of statistical lower bounds, which imposes no structural assumptions on the nuisance functions beyond access to black-box estimators that achieve a given statistical estimation rate. This framework is particularly appealing when one is only willing to consider estimation strategies that use non-parametric regression and classification oracles as black-box subroutines. Within this framework, we prove the statistical optimality of the celebrated and widely used doubly robust estimators for both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT), as well as weighted variants of the former, which arise in policy evaluation.
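To make the estimators named in the abstract concrete, here is a minimal Python sketch of the standard cross-fitted doubly robust (augmented inverse-propensity weighted) forms. It assumes a binary treatment T, outcome regressions g0(X) = E[Y | T=0, X] and g1(X) = E[Y | T=1, X], and a propensity p(X) = P(T=1 | X), each produced by a black-box oracle on held-out folds. The ATE (or weighted ATE, for a known weight w(X)) averages w(X) * [g1 - g0 + T(Y - g1)/p - (1 - T)(Y - g0)/(1 - p)]; the ATT averages T(Y - g0) - (1 - T) p/(1 - p) (Y - g0) and divides by the treated fraction. The oracles (OLS and Newton-method logistic regression), all helper names, the two-fold split, the clipping level, and the simulated data are illustrative assumptions, not the paper's code.

```python
import numpy as np


def _design(Z):
    # Prepend an intercept column to the covariate matrix.
    return np.column_stack([np.ones(len(Z)), Z])


def fit_regression(X, y):
    """Hypothetical black-box regression oracle: OLS with an intercept."""
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return lambda Z: _design(Z) @ coef


def fit_classifier(X, t, iters=25):
    """Hypothetical black-box classification oracle: logistic regression via Newton's method."""
    A = _design(X)
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        hess = A.T @ (A * (prob * (1.0 - prob))[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, A.T @ (t - prob))
    return lambda Z: 1.0 / (1.0 + np.exp(-_design(Z) @ beta))


def cross_fit_nuisances(X, t, y, n_folds=2, seed=0, clip=1e-3):
    """Out-of-fold predictions of g0(X), g1(X) and p(X)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    g0, g1, p = np.empty(n), np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        treated, control = train & (t == 1), train & (t == 0)
        g1[test] = fit_regression(X[treated], y[treated])(X[test])
        g0[test] = fit_regression(X[control], y[control])(X[test])
        p[test] = fit_classifier(X[train], t[train])(X[test])
    # Clipping keeps inverse-propensity weights finite (an assumed safeguard).
    return g0, g1, np.clip(p, clip, 1.0 - clip)


def dr_ate(t, y, g0, g1, p, w=None):
    """DR estimate of E[w(X) * (Y(1) - Y(0))]; w = None (all ones) gives the ATE."""
    psi = g1 - g0 + t * (y - g1) / p - (1 - t) * (y - g0) / (1 - p)
    w = np.ones_like(psi) if w is None else w
    return float(np.mean(w * psi))


def dr_att(t, y, g0, p):
    """DR estimate of E[Y(1) - Y(0) | T = 1]; only g0 and p are needed."""
    resid = y - g0
    return float(np.mean(t * resid - (1 - t) * p / (1 - p) * resid) / np.mean(t))


def simulate(n=20000, seed=1):
    """Synthetic data with a constant treatment effect of 2 (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    t = rng.binomial(1, p_true)
    y = X[:, 0] + X[:, 1] + 2.0 * t + rng.normal(size=n)
    return X, t, y


X, t, y = simulate()
g0, g1, p = cross_fit_nuisances(X, t, y)
print("ATE:", dr_ate(t, y, g0, g1, p))  # should land near the true effect 2.0
print("ATT:", dr_att(t, y, g0, p))      # also near 2.0, since the effect is constant
```

Cross-fitting keeps each nuisance prediction out-of-fold, which is what lets the oracles be treated as black boxes. Any regressor or classifier with the same fit/predict shape could be swapped in. A weighted variant is obtained by passing an array of known weights, e.g. `dr_ate(t, y, g0, g1, p, w=(X[:, 0] > 0).astype(float))`.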