- The paper proves that conditional independence is untestable without constraints on the null hypothesis, establishing a fundamental no-free-lunch theorem.
- It introduces a generalized covariance measure that leverages regression techniques to provide asymptotically valid inference in high-dimensional settings.
- Simulation studies confirm the method’s robustness by controlling type I error rates and achieving competitive power compared to traditional tests.
An Essay on "The Hardness of Conditional Independence Testing and the Generalised Covariance Measure"
The statistical community has long appreciated the intrinsic challenges associated with testing conditional independence. The paper "The Hardness of Conditional Independence Testing and the Generalised Covariance Measure," by Rajen D. Shah and Jonas Peters, delves into these complexities, presenting crucial insights and novel methodologies for addressing them.
The paper first addresses conditional independence testing, which is notoriously difficult when the conditioning variable is continuous. The authors give this perceived difficulty a formal theoretical basis: assuming only that the joint distribution of the observed variables is absolutely continuous with respect to Lebesgue measure, they prove that any test whose size is controlled uniformly over the null hypothesis has power no greater than its size against every alternative. This no-free-lunch theorem implies that test designers must rely on domain knowledge or specific model assumptions to restrict the null and obtain a workable testing method, in contrast to unconditional independence testing, where permutation tests can adequately control the type I error across diverse scenarios.
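Stated informally (notation mine, paraphrasing the result rather than quoting the paper's theorem): if $\mathcal{P}_0$ denotes the class of absolutely continuous distributions under which $X \perp\!\!\!\perp Y \mid Z$, then for any sequence of tests $\phi_n$,

```latex
\sup_{P \in \mathcal{P}_0} \mathbb{E}_P[\phi_n] \le \alpha
\quad \Longrightarrow \quad
\mathbb{E}_Q[\phi_n] \le \alpha
\quad \text{for every absolutely continuous alternative } Q .
```

That is, a test that is uniformly valid over this unrestricted null can never have power exceeding its size.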
In light of these challenges, the authors propose the generalized covariance measure (GCM) as a practical solution that capitalizes on regression techniques. The approach estimates the conditional expectations of the two variables given the conditioning set via regression and builds a test statistic from the normalized covariance of the resulting residuals. The paper develops the asymptotic properties of the test in substantial detail, particularly for kernel ridge regression under a Gaussian assumption, showing that under appropriate conditions this setup attains its nominal level asymptotically and has good power. A key strength of the GCM is its flexibility: any regression method that estimates the conditional expectations sufficiently well can be plugged in, yielding a versatile framework applicable to a wide range of data scenarios.
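As a concrete sketch, the univariate GCM statistic can be computed as follows. Ordinary least squares stands in for the flexible regression purely for brevity; `gcm_test` is an illustrative assumption of mine, not the authors' implementation.

```python
import math
import numpy as np

def gcm_test(x, y, z):
    """Illustrative GCM for univariate x and y given z.

    Regresses x on z and y on z (OLS here for simplicity; the GCM
    framework allows any regression that estimates the conditional
    means well enough), then tests whether the product of the
    residuals has mean zero. Returns the normalised statistic and
    a two-sided p-value from its standard normal limit.
    """
    n = len(x)
    Z = np.column_stack([np.ones(n), z])                # design with intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # residuals of y given z
    r = rx * ry                                         # residual products
    t = math.sqrt(n) * r.mean() / math.sqrt(r.var())    # normalised covariance
    p = math.erfc(abs(t) / math.sqrt(2))                # two-sided N(0,1) p-value
    return t, p
```

When the regressions are accurate and the null holds, the statistic is asymptotically standard normal, which is what licenses the normal p-value above.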
The paper extends the GCM to multivariate settings, making it usable in practical high-dimensional applications. Extensive simulation studies demonstrate the GCM's performance, showing that it maintains correct type I error rates while achieving power competitive with existing methods.
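A hedged sketch of the multivariate extension: the paper aggregates the pairwise statistics via a max-type statistic calibrated with a Gaussian multiplier bootstrap; the simplification below instead applies a conservative Bonferroni correction to the pairwise normal p-values, and again uses OLS in place of a flexible regression. The function name and these substitutions are my own, not the paper's.

```python
import math
import numpy as np

def gcm_test_multivariate(X, Y, Z):
    """Simplified multivariate GCM-style test.

    For each coordinate pair (j, k), residualise X[:, j] and Y[:, k]
    on Z (OLS via a hat matrix; fine for low-dimensional Z) and form
    the normalised covariance of the residual products. Pairwise
    p-values are combined by Bonferroni rather than the paper's
    multiplier-bootstrap max statistic. Returns an adjusted p-value.
    """
    n = X.shape[0]
    D = np.column_stack([np.ones(n), Z])     # design with intercept
    H = D @ np.linalg.pinv(D)                # hat matrix of the OLS fit
    RX = X - H @ X                           # residual matrices
    RY = Y - H @ Y
    pvals = []
    for j in range(X.shape[1]):
        for k in range(Y.shape[1]):
            r = RX[:, j] * RY[:, k]
            t = math.sqrt(n) * r.mean() / math.sqrt(r.var())
            pvals.append(math.erfc(abs(t) / math.sqrt(2)))
    return min(1.0, min(pvals) * len(pvals))  # Bonferroni adjustment
```

Bonferroni is valid but conservative when the pairwise statistics are correlated, which is exactly the situation the paper's multiplier bootstrap is designed to exploit for extra power.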
Notably, the research implications extend beyond theoretical development and methodological innovations. The insights derived from this paper prompt reconsideration of how practitioners specify null models in conditional independence testing, urging the deliberate incorporation of assumptions or constraints that reflect realistic knowledge of the system in question.
Meanwhile, the research also raises questions and possibilities for future exploration: how can conditional independence tests be adapted dynamically as data evolve? Could machine learning models help identify appropriate model constraints or assumptions for more effective testing? These questions suggest exciting avenues for the continued development of statistical independence testing.
In conclusion, the contributions of Shah and Peters push the boundary of statistical methodological research, combining theoretical rigor with practical utility. As conditional independence testing is foundational to fields including causal inference and graphical models, their work is a significant step toward bridging theoretical limits and practical methodology in high-dimensional statistical analysis.