
Causal Modeling with Stationary Diffusions (2310.17405v2)

Published 26 Oct 2023 in cs.LG

Abstract: We develop a novel approach to causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they generalize to unseen interventions on their variables, often better than classical approaches. Our inference method is based on a new theoretical result that expresses a stationarity condition on the diffusion's generator in a reproducing kernel Hilbert space. The resulting kernel deviation from stationarity (KDS) is an objective function of independent interest.
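The stationarity condition behind the KDS can be illustrated concretely: a density p is stationary for a diffusion with generator A exactly when E_p[A g(X)] = 0 for all test functions g, and restricting g to the unit ball of an RKHS yields a closed-form, Stein-discrepancy-style double sum over samples. Below is a minimal one-dimensional numpy sketch of such an estimator, assuming an RBF kernel, constant diffusion coefficient, and a V-statistic; the function names and details are illustrative and not the authors' implementation.

```python
import numpy as np

def kds_vstat(x, drift, sigma, ell=1.0):
    """V-statistic estimate of a KDS-style objective for the 1-D diffusion
    dX = drift(X) dt + sigma dW, whose generator is
    (A g)(x) = drift(x) g'(x) + (sigma^2 / 2) g''(x),
    using the RBF kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)).
    The estimate averages (A_x A_y k)(x_i, x_j) over all sample pairs;
    it is ~0 when the samples are stationary for the given drift."""
    a = 0.5 * sigma**2
    u = (x[:, None] - x[None, :]) / ell          # scaled pairwise differences
    k = np.exp(-0.5 * u**2)                      # RBF Gram matrix
    fx = drift(x)[:, None]                       # drift at first argument
    fy = drift(x)[None, :]                       # drift at second argument
    # Analytic RBF derivatives needed to apply the generator to both arguments:
    k_xy = (1 - u**2) * k / ell**2               # d^2 k / dx dy
    k_xxy = (u**3 - 3 * u) * k / ell**3          # d^3 k / dx^2 dy
    k_xyy = -(u**3 - 3 * u) * k / ell**3         # d^3 k / dx dy^2
    k_xxyy = (u**4 - 6 * u**2 + 3) * k / ell**4  # d^4 k / dx^2 dy^2
    h = fx * fy * k_xy + a * fx * k_xyy + a * fy * k_xxy + a * a * k_xxyy
    return h.mean()

# Samples from N(0, 1), the stationary density of dX = -X dt + sqrt(2) dW.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
kds_true = kds_vstat(x, lambda z: -z, np.sqrt(2.0))       # matching drift: near 0
kds_wrong = kds_vstat(x, lambda z: -3.0 * z, np.sqrt(2.0))  # mismatched drift: larger
```

Because the generator is linear, (A_x A_y k) is itself a positive semi-definite kernel, so the V-statistic is nonnegative and vanishes in population only at stationarity; in the example above the mismatched drift yields a clearly larger value than the true one.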
