C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics (2405.09493v3)

Published 15 May 2024 in stat.ML and cs.LG

Abstract: Popular debiased causal estimation methods, e.g. for the average treatment effect -- such as one-step estimation (e.g., augmented inverse propensity weighting) and targeted maximum likelihood estimation -- enjoy desirable asymptotic properties such as statistical efficiency and double robustness. However, they often produce unstable estimates when there is limited overlap between treatment and control, and require ad hoc adjustments in practice (e.g., truncating propensity scores). In contrast, simple plug-in estimators are stable but lack good asymptotic properties. We propose a novel debiased estimator that achieves the best of both worlds, producing stable plug-in estimates with desirable asymptotic properties. Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero, and can leverage flexible model classes including neural networks and tree ensembles. In several experimental settings, including ones in which we handle text-based covariates by fine-tuning LLMs, our constrained learning-based estimator outperforms one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs comparably otherwise.
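
The abstract's key construction can be made concrete. For the average treatment effect, the plug-in estimate is the mean of mu_hat(1, X_i) - mu_hat(0, X_i), and the "first-order error with respect to the plugged-in quantity" is the familiar AIPW correction term (1/n) * sum_i w_i * (Y_i - mu_hat(A_i, X_i)), with weights w_i = A_i / e_hat(X_i) - (1 - A_i) / (1 - e_hat(X_i)). Below is a minimal sketch of the constrained-learning idea under several simplifying assumptions: a linear outcome model, squared loss, a pre-fit propensity estimate e_hat, and a soft penalty standing in for the paper's hard constraint. The function name c_learner_ate and the penalty-method shortcut are illustrative, not the authors' implementation.

```python
# Illustrative sketch of the C-Learner idea for the ATE (not the authors'
# reference implementation). Assumptions: linear outcome model, squared loss,
# a fixed pre-fit propensity estimate e_hat, and a soft penalty approximating
# the paper's hard zero-first-order-error constraint.
import numpy as np
from scipy.optimize import minimize

def c_learner_ate(X, A, Y, e_hat, penalty=1e3):
    """Fit mu(a, x) = [1, a, x] @ beta while (approximately) zeroing the
    plug-in estimator's first-order error, i.e. the AIPW correction term
    evaluated at the fitted outcome model."""
    n, d = X.shape
    Phi = np.hstack([np.ones((n, 1)), A[:, None], X])         # features for mu(A_i, X_i)
    Phi1 = np.hstack([np.ones((n, 1)), np.ones((n, 1)), X])   # features for mu(1, X_i)
    Phi0 = np.hstack([np.ones((n, 1)), np.zeros((n, 1)), X])  # features for mu(0, X_i)
    w = A / e_hat - (1 - A) / (1 - e_hat)                     # IPW residual weights

    def objective(beta):
        resid = Y - Phi @ beta
        loss = np.mean(resid ** 2)            # outcome-model fit
        first_order = np.mean(w * resid)      # AIPW correction term
        return loss + penalty * first_order ** 2  # soft version of the constraint

    beta = minimize(objective, np.zeros(d + 2), method="BFGS").x
    # Plug-in ATE from the constrained fit; by construction its first-order
    # correction is (approximately) zero, so no post-hoc debiasing is added.
    return float(np.mean(Phi1 @ beta - Phi0 @ beta))
```

A Lagrangian or projection step would enforce the constraint exactly rather than through a penalty, and in the paper the toy linear model is replaced by flexible classes such as neural networks, tree ensembles, or fine-tuned LLMs for text-based covariates.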
