A General Identification Algorithm For Data Fusion Problems Under Systematic Selection (2404.06602v2)
Abstract: Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a different set of variables, or be obtained under different experimental conditions than the primary dataset. Analysis based on multiple datasets must carefully account for similarities between datasets, while appropriately accounting for differences. In addition, selection of experimental units into different datasets may be systematic; similar difficulties are encountered in missing data problems. Existing methods for combining datasets either do not consider this issue, or assume simple selection mechanisms. In this paper, we provide a general approach, based on graphical causal models, for causal inference from data on the same superpopulation that is obtained under different experimental conditions. Our framework allows both arbitrary unobserved confounding, and arbitrary selection processes into different experimental regimes in our data. We describe how systematic selection processes may be organized into a hierarchy similar to censoring processes in missing data: selected completely at random (SCAR), selected at random (SAR), and selected not at random (SNAR). In addition, we provide a general identification algorithm for interventional distributions in this setting.
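The SCAR / SAR / SNAR hierarchy can be illustrated with a small exact calculation. The sketch below is not the paper's identification algorithm, and the abstract does not give formal definitions. It assumes, by analogy with the missing-data hierarchy the abstract refers to, that SCAR selection ignores all variables, SAR selection depends only on an observed covariate X, and SNAR selection depends on the outcome Y itself. The population, the selection probabilities, and the function names are all hypothetical.

```python
from fractions import Fraction as F

# Toy population over a binary covariate X and a binary outcome Y.
# All probabilities are illustrative assumptions, not taken from the paper.
P_X = {0: F(1, 2), 1: F(1, 2)}
P_Y1_GIVEN_X = {0: F(1, 4), 1: F(3, 4)}


def joint(x, y):
    """p(X = x, Y = y) in the full (unselected) population."""
    p1 = P_Y1_GIVEN_X[x]
    return P_X[x] * (p1 if y == 1 else 1 - p1)


# Assumed selection mechanisms p(S = 1 | x, y), mirroring MCAR / MAR / MNAR.
SELECTION = {
    "SCAR": lambda x, y: F(1, 2),                         # ignores X and Y
    "SAR": lambda x, y: F(1, 4) if x == 0 else F(3, 4),   # depends on X only
    "SNAR": lambda x, y: F(1, 4) if y == 0 else F(3, 4),  # depends on Y
}


def naive_mean(select):
    """E[Y | S = 1]: the mean of Y among selected units, with no correction."""
    num = sum(joint(x, y) * select(x, y) * y for x in (0, 1) for y in (0, 1))
    den = sum(joint(x, y) * select(x, y) for x in (0, 1) for y in (0, 1))
    return num / den


def adjusted_mean(select):
    """sum_x p(x) E[Y | x, S = 1]: stratify on X, reweight by the population p(x).

    Assumes p(x) is known, e.g. from an auxiliary dataset.
    """
    total = F(0)
    for x in (0, 1):
        num = sum(joint(x, y) * select(x, y) * y for y in (0, 1))
        den = sum(joint(x, y) * select(x, y) for y in (0, 1))
        total += P_X[x] * num / den
    return total


true_mean = sum(joint(x, 1) for x in (0, 1))  # E[Y] in the full population

if __name__ == "__main__":
    print("true E[Y] =", true_mean)
    for name, select in SELECTION.items():
        print(name, "naive:", naive_mean(select), "adjusted:", adjusted_mean(select))
```

In this toy setup the naive selected-sample mean equals the population mean only under SCAR. Stratifying on X repairs the SAR case but not the SNAR case, which is the kind of situation where a more general identification argument is needed.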