Learning Mixtures of Unknown Causal Interventions (2411.00213v1)
Abstract: The ability to conduct interventions plays a pivotal role in learning causal relationships among variables, facilitating applications across scientific disciplines such as genomics, economics, and machine learning. In many of these applications, however, the process of generating interventional data is noisy: rather than being sampled directly from the intended interventional distribution, the data are often sampled from a mixture of intended and unintended interventional distributions. We consider the fundamental challenge of disentangling mixed interventional and observational data within linear Structural Equation Models (SEMs) with additive Gaussian noise, without knowledge of the true causal graph. We demonstrate that conducting interventions, whether do or soft, yields distributions with sufficient diversity and properties conducive to efficiently recovering each component of the mixture. Furthermore, we establish that the sample complexity required to disentangle mixed data is inversely related to the extent of the change an intervention induces in the equation governing the affected variable. As a result, the causal graph can be identified up to its interventional Markov Equivalence Class, just as in settings where the interventional data are generated without noise. We further support our theoretical findings with simulations in which we perform causal discovery from such mixed data.
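To make the setting concrete, the following is a minimal sketch of how mixed observational and interventional data can arise from a linear SEM with additive Gaussian noise. It is not the paper's recovery algorithm. The three-node chain, the edge weights, the mixing proportions, and the helper names (`sample_sem`, `sample_mixture`, `covariance`) are all hypothetical choices made for illustration. A do-intervention is modeled here, as an assumption, by cutting the target's incoming edges and keeping independent unit-variance noise. The hidden component labels are used only to show that the components have different covariances, which is the kind of diversity a mixture-learning method could exploit.

```python
import random

# Hypothetical chain X0 -> X1 -> X2 with illustrative edge weights (assumption).
WEIGHTS = {(0, 1): 1.5, (1, 2): -2.0}
N_NODES = 3


def sample_sem(rng, do_target=None):
    """Draw one sample from the linear Gaussian SEM, optionally under do(X_target)."""
    x = [0.0] * N_NODES
    for j in range(N_NODES):  # node indices are already in topological order
        noise = rng.gauss(0.0, 1.0)
        if j == do_target:
            # Assumed do-intervention: drop all parent terms, keep fresh noise.
            x[j] = noise
        else:
            parents = sum(w * x[i] for (i, k), w in WEIGHTS.items() if k == j)
            x[j] = parents + noise
    return x


def sample_mixture(rng, n, components, probs):
    """Draw n samples; each sample comes from a randomly chosen hidden component."""
    data, labels = [], []
    for _ in range(n):
        c = rng.choices(range(len(components)), weights=probs)[0]
        data.append(sample_sem(rng, components[c]))
        labels.append(c)
    return data, labels


def covariance(rows, a, b):
    """Unbiased sample covariance between coordinates a and b."""
    n = len(rows)
    mean_a = sum(r[a] for r in rows) / n
    mean_b = sum(r[b] for r in rows) / n
    return sum((r[a] - mean_a) * (r[b] - mean_b) for r in rows) / (n - 1)


rng = random.Random(0)
components = [None, 1]  # None = observational, 1 = do-intervention on X1
data, labels = sample_mixture(rng, 20000, components, [0.6, 0.4])

# Split by the (normally unobserved) labels, for illustration only.
obs = [x for x, c in zip(data, labels) if c == 0]
intv = [x for x, c in zip(data, labels) if c == 1]

cov_obs = covariance(obs, 0, 1)   # population value: the edge weight 1.5
cov_int = covariance(intv, 0, 1)  # population value: 0, since the edge is cut
print(f"cov(X0, X1) observational: {cov_obs:.3f}")
print(f"cov(X0, X1) under do(X1):  {cov_int:.3f}")
```

In the problem the abstract describes, only `data` would be available, not `labels`; the task is to recover each Gaussian component, and from it the intervention targets, from the unlabeled mixture.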