A Simple Unified Approach to Testing High-Dimensional Conditional Independences for Categorical and Ordinal Data (2206.04356v3)

Published 9 Jun 2022 in stat.ML and cs.LG

Abstract: Conditional independence (CI) tests underlie many approaches to model testing and structure learning in causal inference. Most existing CI tests for categorical and ordinal data stratify the sample by the conditioning variables, perform simple independence tests in each stratum, and combine the results. Unfortunately, the statistical power of this approach degrades rapidly as the number of conditioning variables increases. Here we propose a simple unified CI test for ordinal and categorical data that maintains reasonable calibration and power in high dimensions. We show that our test outperforms existing baselines in model testing and structure learning for dense directed graphical models while being comparable for sparse models. Our approach could be attractive for causal model testing because it is easy to implement, can be used with non-parametric or parametric probability models, has the symmetry property, and has reasonable computational requirements.
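To make the baseline concrete: the stratify-test-combine approach the abstract describes can be sketched in a few lines. The following is a minimal illustrative Python sketch, not the paper's proposed test; the function name `stratified_ci_test`, the per-stratum chi-square statistic, and the sum-of-statistics combination rule are assumptions chosen for illustration. It also shows where power degrades: as the conditioning set grows, strata shrink and many are skipped as too sparse to test.

```python
# Illustrative sketch (not the paper's method) of the classical stratification
# approach to testing X independent of Y given Z for categorical data:
# split the sample on each configuration of Z, run a chi-square independence
# test per stratum, and pool statistics and degrees of freedom.
import numpy as np
import pandas as pd
from scipy import stats

def stratified_ci_test(df, x, y, z):
    """Return a p-value for X indep. of Y given Z (x, y: column names; z: list of column names)."""
    total_stat, total_dof = 0.0, 0
    for _, stratum in df.groupby(list(z)):
        table = pd.crosstab(stratum[x], stratum[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # stratum too sparse to contribute a test
        stat, _, dof, _ = stats.chi2_contingency(table, correction=False)
        total_stat += stat
        total_dof += dof
    if total_dof == 0:
        return 1.0  # no informative strata; cannot reject independence
    return stats.chi2.sf(total_stat, total_dof)

# Toy example: X and Y both depend on Z but are conditionally independent
# given Z, so the test should typically not reject (large p-value).
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500)
x = (rng.random(500) < 0.3 + 0.4 * z).astype(int)
y = (rng.random(500) < 0.3 + 0.4 * z).astype(int)
data = pd.DataFrame({"X": x, "Y": y, "Z": z})
print(stratified_ci_test(data, "X", "Y", ["Z"]))
```

With several conditioning variables, most strata in a sketch like this contain too few observations to form a valid contingency table, which is the degradation in power that motivates the paper's unified test.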
