Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments (2407.06015v1)

Published 8 Jul 2024 in stat.ML, cs.LG, and stat.AP

Abstract: Causal structure learning (CSL) refers to the task of learning causal relationships from data. Advances in CSL now allow learning of causal graphs in diverse application domains, which has the potential to facilitate data-driven causal decision-making. Real-world CSL performance depends on a number of $\textit{context-specific}$ factors, including context-specific data distributions and non-linear dependencies, that are important in practical use-cases. However, our understanding of how to assess and select CSL methods in specific contexts remains limited. To address this gap, we present $\textit{CausalRegNet}$, a multiplicative effect structural causal model that allows for generating observational and interventional data incorporating context-specific properties, with a focus on the setting of gene perturbation experiments. Using real-world gene perturbation data, we show that CausalRegNet generates accurate distributions and scales far better than current simulation frameworks. We illustrate the use of CausalRegNet in assessing CSL methods in the context of interventional experiments in biology.
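
To make the idea of a multiplicative effect structural causal model more concrete, the following is a minimal sketch of how observational and interventional count data could be simulated from a small gene network. It is not the CausalRegNet implementation: the function name `simulate`, the saturating parent-level transform, the Poisson noise model, the knockdown scaling factor, and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate(adj, effects, baseline, n_cells, rng, knockdown=None, knockdown_scale=0.1):
    """Toy multiplicative-effect SCM over genes (illustrative assumptions only).

    adj[i, j] is True when gene i is a parent of gene j; genes are assumed
    to be listed in topological order. effects[i, j] > 1 acts as activation,
    effects[i, j] < 1 as repression. baseline[j] > 0 is the mean count of
    gene j when all of its parents are silent.
    """
    n_genes = adj.shape[0]
    counts = np.zeros((n_cells, n_genes), dtype=np.int64)
    for j in range(n_genes):
        mean = np.full(n_cells, baseline[j], dtype=float)
        for i in np.flatnonzero(adj[:, j]):
            # Saturating parent level in [0, 1): 0 when the parent is silent.
            level = counts[:, i] / (counts[:, i] + baseline[i])
            # Each parent rescales the child's mean multiplicatively.
            mean *= effects[i, j] ** level
        if knockdown == j:
            # Soft intervention: scale the target gene's mean expression down.
            mean *= knockdown_scale
        counts[:, j] = rng.poisson(mean)
    return counts


# Chain 0 -> 1 -> 2: gene 0 activates gene 1, gene 1 represses gene 2.
adj = np.zeros((3, 3), dtype=bool)
adj[0, 1] = adj[1, 2] = True
effects = np.ones((3, 3))
effects[0, 1] = 3.0
effects[1, 2] = 0.3
baseline = np.array([20.0, 20.0, 20.0])

rng = np.random.default_rng(0)
obs = simulate(adj, effects, baseline, n_cells=5000, rng=rng)
kd = simulate(adj, effects, baseline, n_cells=5000, rng=rng, knockdown=0)

print("observational means:", obs.mean(axis=0))
print("gene 0 knockdown means:", kd.mean(axis=0))
```

In this toy setting, knocking down gene 0 lowers its own expression and, through the activating edge, the expression of its child gene 1. A CSL method could then be scored on how well it recovers `adj` from `obs` and `kd`.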

Authors (4)
  1. Luka Kovačević
  2. Izzy Newsham
  3. Sach Mukherjee
  4. John Whittaker
