Practical Kernel Tests of Conditional Independence (2402.13196v1)

Published 20 Feb 2024 in cs.LG

Abstract: We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing, absent in tests of unconditional independence, is to obtain the correct test level (the specified upper bound on the rate of false positives), while still attaining competitive test power. Excess false positives arise due to bias in the test statistic, which is obtained using nonparametric kernel ridge regression. We propose three methods for bias control to correct the test level, based on data splitting, auxiliary data, and (where possible) simpler function classes. We show these combined strategies are effective both for synthetic and real-world data.

Summary

  • The paper presents SplitKCI, a novel bias reduction method that mitigates inflated false positive rates in kernel-based conditional independence tests.
  • It employs data splitting, auxiliary data, and simpler regression functions to reduce bias, leading to superior Type I error control and enhanced test power.
  • Empirical results demonstrate SplitKCI's robustness in high-dimensional and unbalanced scenarios, making it a promising tool for causal discovery and complex data analysis.

Strategies for Enhancing Kernel-Based Conditional Independence Tests

Introduction

Conditional independence (CI) testing is pivotal for assessing dependence between random variables in the presence of potential confounders. This work examines the kernel-based approach to CI testing, emphasizing its data efficiency and its utility in applications ranging from basic scientific inquiry to the evaluation of machine learning methods and causal discovery.

Kernel-Based Conditional Independence Tests

Kernel-based CI tests, particularly the Kernel-based Conditional Independence (KCI) test and the Conditional Independence Regression CovariancE (CIRCE), replace the residuals and covariance of linear regression with kernel analogues. In principle, these methods can detect any form of conditional dependence. In practice, however, their validity suffers in low-data regimes: bias in the nonparametric regression used to estimate conditional feature means inflates the false positive rate.
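
As a rough illustration of the form such a statistic can take, here is a minimal sketch (not the authors' implementation): it "regresses out" Z from Gaussian-kernel features of X and Y via kernel ridge regression and measures the remaining cross-dependence with an HSIC-like trace. The kernel bandwidths, the ridge parameter `lam`, and the omission of the centring and null-distribution calibration used by the actual KCI/CIRCE tests are all simplifying assumptions.

```python
import numpy as np

def gaussian_kernel(A, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    sq = np.sum(A**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * A @ A.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def kci_statistic(X, Y, Z, lam=1e-3):
    # KCI-style statistic: remove the part of the kernel features of X and Y
    # that is predictable from Z (kernel ridge regression), then take an
    # HSIC-like trace of the residual kernel matrices.
    n = X.shape[0]
    Kx, Ky, Kz = (gaussian_kernel(V) for V in (X, Y, Z))
    # Residual-making operator of kernel ridge regression on Z.
    Rz = np.eye(n) - np.linalg.solve(Kz + lam * n * np.eye(n), Kz)
    Kx_res = Rz @ Kx @ Rz.T
    Ky_res = Rz @ Ky @ Rz.T
    return np.trace(Kx_res @ Ky_res) / n
```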

Bias Reduction Strategies

This paper introduces three bias-correction techniques: data splitting, the use of auxiliary data, and simpler function classes for the regression. These culminate in SplitKCI, a modified KCI/CIRCE test. By splitting the data used to compute conditional mean embeddings (CMEs), and by allowing different feature space dimensions for the two regressions involved, SplitKCI markedly reduces the bias of the test statistic without compromising test consistency. Improvements in Type I error control and test power across several synthetic settings illustrate the effectiveness of these bias mitigation strategies.
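
To make the data-splitting strategy concrete, here is a schematic sketch of the idea (illustrative only, assuming Gaussian kernels and a single ridge parameter `lam`; the names `residual_kernel` and `split_kci_statistic` are hypothetical, and the actual SplitKCI estimator differs, for instance in how the two regressions and their feature spaces are configured). The conditional-mean regression is fitted on one half of the data and the statistic is evaluated on the held-out half, so that regression error does not bias the statistic at the points where it is computed.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def residual_kernel(V, Z, train, test, lam):
    # Kernel matrix of the residuals phi(v) - mu_hat(z) on the held-out half,
    # where the conditional-mean regression Z -> features of V is fitted on
    # the training half via kernel ridge regression.
    n_tr = len(train)
    B = np.linalg.solve(
        rbf_kernel(Z[train], Z[train]) + lam * n_tr * np.eye(n_tr),
        rbf_kernel(Z[train], Z[test]),
    )  # (n_tr, n_te) regression weights for each held-out point
    K_te = rbf_kernel(V[test], V[test])
    K_tr_te = rbf_kernel(V[train], V[test])
    K_tr = rbf_kernel(V[train], V[train])
    return K_te - B.T @ K_tr_te - K_tr_te.T @ B + B.T @ K_tr @ B

def split_kci_statistic(X, Y, Z, lam=1e-3, seed=0):
    # Split the sample, fit both conditional-mean regressions on one half,
    # and evaluate the HSIC-like statistic on the other half only.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    Kx_res = residual_kernel(X, Z, train, test, lam)
    Ky_res = residual_kernel(Y, Z, train, test, lam)
    return np.trace(Kx_res @ Ky_res) / len(test)
```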

Empirical Evaluation

Empirically, SplitKCI achieves better Type I error control than conventional KCI in both balanced and unbalanced data scenarios. Its robustness holds across tasks with varying degrees of dependence complexity and confounding, including the high-dimensional settings common in causal discovery. Moreover, SplitKCI's flexibility in kernel choice, which allows non-universal kernels without sacrificing asymptotic guarantees, lets the test be adapted to structured data, enhancing the practical applicability of kernel-based CI tests.
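
As a toy sanity check of the sketch above (not a reproduction of the paper's experiments), one might compare the statistic on synthetic data with and without conditional dependence, reusing `split_kci_statistic` from the previous snippet:

```python
import numpy as np

# Toy synthetic check (illustrative only). Under H0, X and Y are both driven by
# the confounder Z but are conditionally independent given Z; under H1, Y also
# depends on X directly, so dependence remains after accounting for Z.
rng = np.random.default_rng(1)
n = 400
Z = rng.normal(size=(n, 1))
X = Z + 0.3 * rng.normal(size=(n, 1))
Y_h0 = Z + 0.3 * rng.normal(size=(n, 1))            # conditionally independent of X
Y_h1 = Z + 0.5 * X + 0.3 * rng.normal(size=(n, 1))  # conditionally dependent on X

print(split_kci_statistic(X, Y_h0, Z))  # typically much smaller ...
print(split_kci_statistic(X, Y_h1, Z))  # ... than this value
```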

Theoretical Implications and Applications

The research positions SplitKCI as a promising solution to the bias problem inherent in kernel-based CI testing, particularly when auxiliary data are available or when prior knowledge calls for specific kernel functions. These methodological advances point toward more reliable CI tests for complex datasets, a cornerstone of elucidating causal relationships and of ensuring fairness in algorithmic predictions.

Future Directions in Kernel-Based CI Testing

Promising future directions include combining kernel-based methods with strategies that address data sparsity and high dimensionality, and broadening the range of admissible kernels to accommodate diverse data types. In addition, improving the interpretability of conditional (in)dependence conclusions, which is especially critical in domains with significant societal impact, remains essential for the wider adoption of kernel-based statistical tests.

Conclusion

This work strengthens both the theoretical underpinnings and the practical viability of kernel-based CI testing through its bias-correction techniques, exemplified by SplitKCI. By offering a more accurate and flexible testing framework, it paves the way for reliable inference of conditional independence across scientific and technological fields.
