dotears: Scalable, consistent DAG estimation using observational and interventional data (2305.19215v2)
Abstract: New biological assays such as Perturb-seq link highly parallel CRISPR interventions to a high-dimensional transcriptomic readout, providing insight into gene regulatory networks. Causal gene regulatory networks can be represented by directed acyclic graphs (DAGs), but learning DAGs from observational data is complicated by a lack of identifiability and a combinatorial solution space. Score-based structure learning improves the practical scalability of DAG inference. However, existing score-based methods are sensitive to the error variance structure, and estimating error variances is difficult without prior knowledge of the graph structure. Accordingly, we present $\texttt{dotears}$ [doo-tairs], a continuous optimization framework that leverages observational and interventional data to infer a single causal structure, assuming a linear Structural Equation Model (SEM). $\texttt{dotears}$ exploits structural consequences of hard interventions to give a marginal estimate of the exogenous error structure, bypassing the circular estimation problem. We show that $\texttt{dotears}$ is a provably consistent estimator of the true DAG under mild assumptions. $\texttt{dotears}$ outperforms other methods in varied simulations, and in real data it infers edges that are validated, via differential expression tests and high-confidence protein-protein interactions, with higher precision and recall than those of state-of-the-art methods.
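The key idea, that a hard intervention exposes a node's exogenous noise variance and thereby breaks the circular estimation problem, can be illustrated with a minimal Python sketch using NumPy and SciPy. It assumes a linear Gaussian SEM in which a hard intervention on node $j$ deletes $j$'s incoming edges but keeps its noise $\varepsilon_j$, so the sample variance of $X_j$ in the data intervened on $j$ estimates $\sigma_j^2$. The function names, the inverse-variance weighted least-squares score, and the NOTEARS-style acyclicity function $h(W) = \mathrm{tr}(e^{W \circ W}) - d$ are illustrative choices for this sketch, not the $\texttt{dotears}$ implementation; for simplicity the score is evaluated on observational data only.

```python
import numpy as np
from scipy.linalg import expm


def simulate_linear_sem(W, noise_sd, n, rng, intervene_on=None):
    """Sample n rows from the linear SEM X_j = sum_i W[i, j] X_i + eps_j.

    If intervene_on is a node index j, apply a hard intervention that zeroes
    column j of W (deleting j's incoming edges) while keeping eps_j; this
    noise-preserving intervention model is an assumption of the sketch.
    """
    d = W.shape[0]
    W_int = W.copy()
    if intervene_on is not None:
        W_int[:, intervene_on] = 0.0
    E = rng.normal(size=(n, d)) * noise_sd  # column j has sd noise_sd[j]
    # Row form x = x W + e, so X = E (I - W)^{-1}.
    return E @ np.linalg.inv(np.eye(d) - W_int)


def estimate_exogenous_variances(interventional):
    """interventional[j] holds samples under a hard intervention on node j.

    With j's parents cut, X_j equals eps_j, so its sample variance estimates
    sigma_j^2 without knowing the rest of the graph.
    """
    return np.array([X[:, j].var(ddof=1) for j, X in enumerate(interventional)])


def weighted_least_squares(W, X, sigma2):
    """Half the sum over nodes of mean squared residual divided by sigma_j^2."""
    R = X - X @ W
    return 0.5 * np.sum((R ** 2).mean(axis=0) / sigma2)


def acyclicity(W):
    """NOTEARS-style h(W) = tr(exp(W * W)) - d, which is zero iff W is acyclic."""
    return np.trace(expm(W * W)) - W.shape[0]


rng = np.random.default_rng(0)
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -0.8],
                   [0.0, 0.0, 0.0]])  # chain 0 -> 1 -> 2
noise_sd = np.array([1.0, 2.0, 0.5])  # true sigma^2 = [1.0, 4.0, 0.25]
n = 20_000

X_obs = simulate_linear_sem(W_true, noise_sd, n, rng)
X_int = [simulate_linear_sem(W_true, noise_sd, n, rng, intervene_on=j) for j in range(3)]
sigma2_hat = estimate_exogenous_variances(X_int)
print("estimated sigma^2:", np.round(sigma2_hat, 2))

# Observationally, Var(X_1) = 1.5^2 * 1.0 + 4.0 = 6.25, not sigma_1^2 = 4.0:
# the marginal variance mixes in parental signal, hence the circularity.
print("observational Var(X_1):", round(X_obs[:, 1].var(), 2))

W_cyclic = W_true.copy()
W_cyclic[2, 0] = 0.5  # adds edge 2 -> 0, closing a cycle
print("h(W_true):", acyclicity(W_true), " h(W_cyclic):", acyclicity(W_cyclic))
print("score at W_true:", weighted_least_squares(W_true, X_obs, sigma2_hat))
print("score at W = 0:", weighted_least_squares(np.zeros((3, 3)), X_obs, sigma2_hat))
```

In this toy setting the interventional estimates land near the true noise variances, the weighted score is lower at `W_true` than at the empty graph, and $h(W)$ separates the DAG from its cyclic variant. A continuous optimizer in this spirit would minimize such a score subject to $h(W) = 0$.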