A Fixed-Point Approach for Causal Generative Modeling (2404.06969v3)

Published 10 Apr 2024 in cs.LG and stat.ML

Abstract: We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables, eliminating the need for Directed Acyclic Graphs (DAGs), and establish the weakest known conditions for their unique recovery given the topological ordering (TO). Based on this, we design a two-stage causal generative model that first infers in a zero-shot manner a valid TO from observations, and then learns the generative SCM on the ordered variables. To infer TOs, we propose to amortize the learning of TOs on synthetically generated datasets by sequentially predicting the leaves of graphs seen during training. To learn SCMs, we design a transformer-based architecture that exploits a new attention mechanism enabling the modeling of causal structures, and show that this parameterization is consistent with our formalism. Finally, we conduct an extensive evaluation of each method individually, and show that when combined, our model outperforms various baselines on generated out-of-distribution problems. The code is available at https://github.com/microsoft/causica/tree/main/research_experiments/fip.
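To make the fixed-point view concrete, here is a minimal NumPy sketch (not the authors' implementation; the strictly lower-triangular weights and the tanh structural map are illustrative assumptions) of sampling from an SCM written as x = f(x, u) over causally ordered variables. Because each variable depends only on its predecessors in the ordering, iterating the map d times reaches the unique fixed point.

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

# Hypothetical structural map: strictly lower-triangular weights w.r.t. the
# topological order, so variable i depends only on variables ordered before it.
W = np.tril(rng.normal(size=(d, d)), k=-1)

def f(x, u):
    # One application of the structural map on all variables at once.
    return np.tanh(W @ x) + u

def sample_scm(u):
    # Fixed-point iteration: with a causal ordering, d passes suffice to
    # converge to the unique solution of x = f(x, u).
    x = np.zeros_like(u)
    for _ in range(d):
        x = f(x, u)
    return x

u = rng.normal(size=d)  # exogenous noise
print(sample_scm(u))
```

In this sketch the triangular structure plays the role of the topological ordering: no DAG is stored explicitly, yet the iteration is guaranteed to terminate at the SCM's observational sample.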

Authors (5)
  1. Meyer Scetbon (22 papers)
  2. Joel Jennings (15 papers)
  3. Agrin Hilmkil (12 papers)
  4. Cheng Zhang (388 papers)
  5. Chao Ma (187 papers)
Citations (2)
