Membership Testing in Markov Equivalence Classes via Independence Query Oracles (2403.05759v1)

Published 9 Mar 2024 in cs.LG, cs.AI, stat.ME, and stat.ML

Abstract: Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, the complementary problem of testing causal relationships has remained largely unexplored. Learning involves recovering the Markov equivalence class (MEC) of the underlying causal graph from observational data; the testing counterpart addresses the following critical question: given a specific MEC and observational data from some causal graph, can we determine if the data-generating causal graph belongs to the given MEC? We explore constraint-based testing methods by establishing bounds on the required number of conditional independence tests. Our bounds are in terms of the size $s$ of the maximum undirected clique of the given MEC. In the worst case, we show a lower bound of $\exp(\Omega(s))$ independence tests. We then give an algorithm that resolves the task with $\exp(O(s))$ tests, matching our lower bound. Compared to the learning problem, where algorithms often use a number of independence tests that is exponential in the maximum in-degree, this shows that testing is comparatively easier: it requires exponentially fewer independence tests on graphs featuring high in-degrees and small clique sizes. Additionally, using the DAG associahedron, we provide a geometric interpretation of testing versus learning and discuss how our testing result can aid learning.
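
To make the query-complexity claim concrete, below is a minimal, hypothetical Python sketch of a constraint-based membership test. This is not the paper's algorithm: the names (`test_mec_membership`, `ci_oracle`, `skeleton`, `sep_sets`) and the exact checks are illustrative assumptions, and a complete test would also need to verify the edge orientations (v-structures) implied by the MEC. The sketch only illustrates where an $\exp(O(s))$ query count can come from: for each edge of the MEC, it enumerates subsets of the endpoints' common neighbourhood, whose size is bounded by the maximum clique size $s$.

```python
from itertools import combinations

def test_mec_membership(skeleton, sep_sets, ci_oracle, s):
    """Return True iff the observational data is consistent with the MEC.

    skeleton : dict mapping each node to the set of its neighbours in the
               MEC's skeleton (an undirected graph).
    sep_sets : dict mapping frozenset({x, y}) for each non-adjacent pair to
               a separating set implied by the MEC.
    ci_oracle: ci_oracle(x, y, z) -> True iff X is conditionally independent
               of Y given the set Z in the data.
    s        : size of the largest undirected clique of the MEC.
    """
    nodes = sorted(skeleton)
    for x, y in combinations(nodes, 2):
        if y in skeleton[x]:
            # Adjacent in the MEC: no subset of the common neighbourhood may
            # separate x and y. Enumerating these subsets costs up to 2^s
            # oracle queries per edge -- the exp(O(s)) term in the abstract.
            common = skeleton[x] & skeleton[y]
            for r in range(min(len(common), s) + 1):
                for z in combinations(sorted(common), r):
                    if ci_oracle(x, y, set(z)):
                        return False  # an edge required by the MEC is refuted
        else:
            # Non-adjacent in the MEC: the separating set implied by the MEC
            # must actually render x and y conditionally independent.
            if not ci_oracle(x, y, sep_sets[frozenset((x, y))]):
                return False
    return True

# Toy usage on the collider X -> Z <- Y (a MEC with no undirected edges):
skeleton = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}
sep_sets = {frozenset(("X", "Y")): set()}
facts = {("X", "Y", frozenset()): True}  # X independent of Y; all else dependent
oracle = lambda a, b, z: facts.get((a, b, frozenset(z)), False)
print(test_mec_membership(skeleton, sep_sets, oracle, s=2))  # True
```

Note the contrast with constraint-based learning (e.g., the PC algorithm), which must search over conditioning sets drawn from full neighbourhoods and so pays a cost exponential in the maximum in-degree; the sketch above only ever conditions within a common neighbourhood of an edge, which is what ties the query count to the clique size $s$ instead.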

