Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects (2410.21343v1)
Abstract: Data from observational studies (OSs) is widely available and easy to obtain, yet it frequently contains confounding bias. Data from randomized controlled trials (RCTs) helps reduce this bias, but it is expensive to gather, so randomized datasets tend to be small. For this reason, effectively fusing observational and randomized data to better estimate heterogeneous treatment effects (HTEs) has gained increasing attention. However, existing methods for integrating observational data with randomized data require complete observational data, meaning that both treated and untreated subjects must be included in the OS. This prerequisite confines such methods to very specific situations, because including both treated and untreated subjects in an observational study is not always achievable. In this paper, we propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation, which we abbreviate as CIO. CIO can estimate HTEs efficiently regardless of whether the observational data is complete or partial. Concretely, a confounding bias function is first derived via an effect estimation procedure, using the pseudo-experimental group from the OS together with the pseudo-control group from the RCT. This function is then used as a corrective residual to rectify the observed outcomes of the observational data during HTE estimation, which combines the available observational data with all of the randomized data. To validate our approach, we conducted experiments on one synthetic dataset and two semi-synthetic datasets.
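The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is one possible reading, under stated assumptions, in which the OS contains only treated subjects, the OS treated units act as the pseudo-experimental group, the RCT treated units act as the pseudo-control group, and the "effect" of the data source on the outcome is taken as the confounding bias. The data-generating functions (`mu0`, `tau`, `bias`), the sample sizes, and the polynomial least-squares models are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth, used only to simulate data.
def mu0(x):   # baseline (untreated) outcome
    return x + x**2

def tau(x):   # true heterogeneous treatment effect
    return 1.0 + x

def bias(x):  # confounding bias present only in the OS
    return 0.5 + 0.3 * x

n_rct, n_os, sigma = 400, 2000, 0.5

# Small RCT with randomized treatment.
x_rct = rng.uniform(-1, 1, n_rct)
t_rct = rng.integers(0, 2, n_rct)
y_rct = mu0(x_rct) + t_rct * tau(x_rct) + rng.normal(0, sigma, n_rct)

# Large, incomplete OS: treated subjects only, outcomes shifted by the bias.
x_os = rng.uniform(-1, 1, n_os)
y_os = mu0(x_os) + tau(x_os) + bias(x_os) + rng.normal(0, sigma, n_os)

def design(x, flag):
    """Shared quadratic part plus a linear effect of a 0/1 flag."""
    return np.column_stack([np.ones_like(x), x, x**2, flag, flag * x])

def fit_flag_effect(x, flag, y):
    """Least-squares fit; returns (intercept, slope) of the flag's effect."""
    coef, *_ = np.linalg.lstsq(design(x, flag), y, rcond=None)
    return coef[3], coef[4]

# Stage 1: estimate the bias as the "effect" of the data source.
# Pseudo-experimental group: OS treated (s=1); pseudo-control: RCT treated (s=0).
treated = t_rct == 1
x_s = np.concatenate([x_os, x_rct[treated]])
s = np.concatenate([np.ones(n_os), np.zeros(treated.sum())])
y_s = np.concatenate([y_os, y_rct[treated]])
b0, b1 = fit_flag_effect(x_s, s, y_s)

# Stage 2: subtract the estimated bias from the OS outcomes, then
# estimate the HTE on the corrected OS data pooled with all RCT data.
y_os_corrected = y_os - (b0 + b1 * x_os)
x_all = np.concatenate([x_rct, x_os])
t_all = np.concatenate([t_rct, np.ones(n_os)])
tau0, tau1 = fit_flag_effect(x_all, t_all, np.concatenate([y_rct, y_os_corrected]))

# Naive baseline for comparison: pool without correcting the OS outcomes.
naive0, naive1 = fit_flag_effect(x_all, t_all, np.concatenate([y_rct, y_os]))

def tau_hat(x):
    """Estimated treatment effect at covariate value x."""
    return tau0 + tau1 * x

print("estimated bias coefficients:", b0, b1)
print("corrected tau_hat(0):", tau_hat(0.0), "naive:", naive0)
```

In this toy setup the naive pooled estimate absorbs most of the OS bias into the treatment effect, whereas the corrected estimate should land much closer to the simulated `tau`. Real applications would likely replace the least-squares fits with more flexible learners.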