Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation (2402.14264v2)

Published 22 Feb 2024 in stat.ML, cs.LG, econ.EM, math.ST, stat.ME, and stat.TH

Abstract: Average treatment effect estimation is the most central problem in causal inference, with applications in numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods has remained an open question, especially in regimes where they do not achieve parametric rates. In this paper, we adopt the recently introduced structure-agnostic framework of statistical lower bounds, which imposes no structural assumptions on the nuisance functions beyond access to black-box estimators that achieve some statistical estimation rate. This framework is particularly appealing when one is only willing to consider estimation strategies that use non-parametric regression and classification oracles as black-box subroutines. Within this framework, we prove the statistical optimality of the celebrated and widely used doubly robust estimators for both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT), as well as weighted variants of the former, which arise in policy evaluation.
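The abstract names doubly robust estimators without stating them. As an illustration only (not the authors' code), the Python sketch below implements the textbook augmented inverse-propensity-weighted (AIPW) form for the ATE, the standard doubly robust ATT estimator, and a weighted ATE. The nuisance estimates (outcome regressions `mu0`, `mu1` and propensity scores `e`) are passed in as plain lists, standing in for the black-box regression and classification oracles of the structure-agnostic framework, and are assumed to be fitted on data independent of the evaluation sample (for example via sample splitting). The weighted target E[w(X)(mu1(X) - mu0(X))] with a known weight w, the function names, and the simulated design are hypothetical choices made for this example.

```python
"""Doubly robust estimators for the ATE, the ATT, and a weighted ATE.

Illustrative sketch; assumptions not taken from the paper:
  mu0[i], mu1[i] : black-box estimates of E[Y | T=0, X_i] and E[Y | T=1, X_i]
  e[i]           : black-box estimate of the propensity P(T=1 | X_i), in (0, 1)
  w[i]           : a known weight w(X_i) (hypothetical policy weights)
All nuisance estimates are assumed to be fitted on data independent of (y, t).
"""
import random
from typing import List, Sequence, Tuple


def dr_weighted_ate(y: Sequence[float], t: Sequence[int], mu0: Sequence[float],
                    mu1: Sequence[float], e: Sequence[float],
                    w: Sequence[float]) -> float:
    """Estimate E[w(X)(mu1(X) - mu0(X))] by averaging w times the AIPW pseudo-outcome."""
    total = 0.0
    for yi, ti, m0, m1, ei, wi in zip(y, t, mu0, mu1, e, w):
        # Outcome-model contrast, corrected by inverse-propensity-weighted residuals.
        pseudo = (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1 - ei))
        total += wi * pseudo
    return total / len(y)


def dr_ate(y: Sequence[float], t: Sequence[int], mu0: Sequence[float],
           mu1: Sequence[float], e: Sequence[float]) -> float:
    """Plain ATE: the weighted estimator with w(X) = 1."""
    return dr_weighted_ate(y, t, mu0, mu1, e, [1.0] * len(y))


def dr_att(y: Sequence[float], t: Sequence[int], mu0: Sequence[float],
           e: Sequence[float]) -> float:
    """Estimate E[Y(1) - Y(0) | T=1]; only mu0 and e are needed."""
    num = 0.0
    for yi, ti, m0, ei in zip(y, t, mu0, e):
        resid = yi - m0
        # Treated units contribute Y - mu0(X); controls are reweighted by e / (1 - e).
        num += ti * resid - (1 - ti) * ei * resid / (1 - ei)
    return num / sum(t)


def simulate(n: int, seed: int = 0) -> Tuple[List[float], List[int], List[float]]:
    """Hypothetical toy design with a constant effect of 2, so ATE = ATT = 2."""
    rng = random.Random(seed)
    xs: List[float] = []
    ts: List[int] = []
    ys: List[float] = []
    for _ in range(n):
        x = rng.random()
        t = 1 if rng.random() < 0.3 + 0.4 * x else 0  # true propensity 0.3 + 0.4x
        xs.append(x)
        ts.append(t)
        ys.append(x + 2.0 * t + rng.gauss(0.0, 1.0))  # true mu_t(x) = x + 2t
    return xs, ts, ys


if __name__ == "__main__":
    xs, ts, ys = simulate(100_000)
    n = len(xs)
    true_e = [0.3 + 0.4 * x for x in xs]
    true_mu0 = list(xs)
    true_mu1 = [x + 2.0 for x in xs]
    # Wrong outcome models (identically 0) but correct propensities:
    print(dr_ate(ys, ts, [0.0] * n, [0.0] * n, true_e))
    # Correct outcome models but a wrong, constant propensity of 0.5:
    print(dr_ate(ys, ts, true_mu0, true_mu1, [0.5] * n))
    print(dr_att(ys, ts, true_mu0, [0.5] * n))
```

Run as a script, the simulation exercises the property behind the name: each printed estimate should land close to the true effect of 2, even though each call pairs one deliberately wrong nuisance model with a correct one. More generally, when the propensity estimates are bounded away from 0 and 1, the bias of the AIPW estimator is bounded, via Cauchy-Schwarz, by a constant times the product of the L2 errors of the two nuisance estimates, which is why the rates achieved by the black-box estimators enter the analysis.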
