Collaborative causal inference on distributed data (2208.07898v5)
Abstract: Technologies for privacy-preserving causal inference on distributed data have gained considerable attention in recent years. Many existing methods for distributed data focus on resolving the lack of subjects (samples) and can only reduce random errors in estimating treatment effects. In this study, we propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing both random errors and biases in the estimation. Our method constructs dimensionality-reduced intermediate representations from the private data of local parties, shares these intermediate representations instead of the private data to preserve privacy, estimates propensity scores from the shared intermediate representations, and finally estimates the treatment effects from the propensity scores. Through numerical experiments on both artificial and real-world data, we confirm that our method yields better estimation results than individual analyses. Dimensionality reduction loses some of the information in the private data and degrades performance, but we observe that sharing intermediate representations among many parties, which resolves the lack of subjects and covariates, improves performance enough to outweigh that degradation. Although external validity is not necessarily guaranteed, our results suggest that DC-QE is a promising method. If our method comes into widespread use, intermediate representations could be published as open data to help researchers identify causal relationships and accumulate a knowledge base.
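The four steps named in the abstract (reduce, share, estimate propensity scores, estimate the treatment effect) can be illustrated with a minimal sketch. It is not the DC-QE algorithm itself, and everything specific in it is an assumption made for illustration: the synthetic data with a true effect of 2.0, the fixed linear map `F` that every party is assumed to know, the gradient-descent logistic regression, the normalized inverse probability weighting estimator, and the sharing of treatments and outcomes alongside the reduced rows. In particular, using one common map sidesteps the question of how representations built independently by different parties are made comparable, which the abstract does not describe.

```python
import math
import random

# Hypothetical fixed 4 -> 3 linear map, assumed known to every party (a simplification).
F = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.5]]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def make_party(n, seed):
    """Synthetic private data: 4 covariates, confounded treatment, true effect 2.0."""
    rng = random.Random(seed)
    X, Z, Y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(4)]
        z = 1 if rng.random() < sigmoid(0.8 * x[0] - 0.5 * x[1]) else 0
        y = 2.0 * z + 1.5 * x[0] - 1.0 * x[1] + rng.gauss(0, 1)
        X.append(x)
        Z.append(z)
        Y.append(y)
    return X, Z, Y

def reduce_rows(X):
    """Intermediate representation: each row x (length 4) becomes x @ F (length 3)."""
    return [[sum(x[i] * F[i][j] for i in range(4)) for j in range(3)] for x in X]

def linear_score(w, r):
    return w[0] + sum(wj * rj for wj, rj in zip(w[1:], r))

def fit_logistic(R, Z, lr=1.0, steps=200):
    """Plain gradient descent on the mean logistic loss; w[0] is the intercept."""
    w = [0.0] * (len(R[0]) + 1)
    n = len(R)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, z in zip(R, Z):
            err = sigmoid(linear_score(w, r)) - z
            grad[0] += err
            for j, rj in enumerate(r):
                grad[j + 1] += err * rj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def ipw_ate(Z, Y, E):
    """Normalized inverse-probability-weighted difference in mean outcomes."""
    t_num = sum(z * y / e for z, y, e in zip(Z, Y, E))
    t_den = sum(z / e for z, e in zip(Z, E))
    c_num = sum((1 - z) * y / (1 - e) for z, y, e in zip(Z, Y, E))
    c_den = sum((1 - z) / (1 - e) for z, e in zip(Z, E))
    return t_num / t_den - c_num / c_den

# Two parties reduce their covariates locally; only the reduced rows are pooled.
R, Z, Y = [], [], []
for seed in (1, 2):
    X_k, Z_k, Y_k = make_party(1000, seed)
    R += reduce_rows(X_k)
    Z += Z_k
    Y += Y_k

# Propensity scores are fitted on the pooled intermediate representations.
w = fit_logistic(R, Z)
scores = [sigmoid(linear_score(w, r)) for r in R]
ate = ipw_ate(Z, Y, scores)

# Unadjusted comparison, which ignores confounding.
n_treated = sum(Z)
naive = (sum(y for z, y in zip(Z, Y) if z) / n_treated
         - sum(y for z, y in zip(Z, Y) if not z) / (len(Z) - n_treated))

print(f"naive difference in means: {naive:.2f}")
print(f"IPW estimate on intermediate representations: {ate:.2f}")
```

In this synthetic setup the treatment depends on two covariates that also drive the outcome, so the unadjusted difference in means is biased away from 2.0, while weighting by propensity scores fitted on the reduced rows should land much closer. The map `F` was chosen so that little confounder information is lost; a reduction that discards more of it would leave more residual bias, which is the degradation the abstract refers to.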