Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery (1702.03877v2)

Published 13 Feb 2017 in stat.ME and stat.ML

Abstract: Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubically with sample size. We therefore devise two relaxations called the Randomized Conditional Independence Test (RCIT) and the Randomized Conditional Correlation Test (RCoT) which both approximate KCIT by utilizing random Fourier features. In practice, both of the proposed tests scale linearly with sample size and return accurate p-values much faster than KCIT in the large sample size context. CCD algorithms run with RCIT or RCoT also return graphs at least as accurate as the same algorithms run with KCIT but with large reductions in run time.

Citations (162)

Summary

The paper introduces RCIT and RCoT, approximations of KCIT whose cost scales linearly with sample size, so that causal discovery can run on large datasets without sacrificing accuracy.
Both tests use random Fourier features to avoid KCIT's cubic computational cost while maintaining type I error control and test power.
Empirical evaluations show that CCD algorithms run with these tests return causal graphs at least as accurate as those run with KCIT, in substantially less time.

The paper by Strobl, Visweswaran, and Zhang addresses the scalability of conditional independence (CI) testing in non-parametric causal discovery, focusing on constraint-based causal discovery (CCD) algorithms such as PC and FCI. These algorithms run many CI tests, so each test must be fast, especially on large datasets. The Kernel Conditional Independence Test (KCIT) is a cornerstone of non-parametric CI testing, but its cost scales cubically with sample size, which makes it impractical for large datasets. The paper introduces two alternatives, the Randomized Conditional Independence Test (RCIT) and the Randomized Conditional Correlation Test (RCoT), which approximate KCIT with random Fourier features and scale linearly with sample size.
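The random Fourier feature construction (Rahimi and Recht) that these tests build on replaces a shift-invariant kernel with an inner product of explicit m-dimensional features, z(x) = sqrt(2/m) cos(Wx + b). The minimal pure-Python sketch below assumes a Gaussian kernel with a fixed bandwidth sigma; the function names are hypothetical and this is not the authors' code.

```python
import math
import random


def sample_rff(dim, m, sigma, rng):
    """Draw frequencies w ~ N(0, I / sigma^2) and phases b ~ Uniform[0, 2*pi)."""
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(m)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    return weights, phases


def rff_map(x, weights, phases):
    """Map x to m features sqrt(2/m) * cos(w . x + b)."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, phases)]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


rng = random.Random(0)
sigma = 1.0  # illustrative bandwidth (assumption)
x, y = [0.3, -1.2], [0.8, -0.5]
weights, phases = sample_rff(dim=2, m=2000, sigma=sigma, rng=rng)
# The inner product of the two feature vectors approximates the kernel value.
approx = sum(a * b for a, b in zip(rff_map(x, weights, phases), rff_map(y, weights, phases)))
exact = gaussian_kernel(x, y, sigma)
print(f"exact={exact:.4f}  approx={approx:.4f}")
```

Because each kernel evaluation becomes an inner product of m-dimensional vectors, n × m feature matrices can stand in for n × n kernel matrices; the approximation error shrinks roughly like 1/sqrt(m).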

The proposed methods substantially reduce the run time of CCD algorithms without sacrificing the accuracy of the returned causal graphs. RCIT and RCoT achieve CI-testing accuracy comparable to KCIT while reducing computation time by several orders of magnitude at large sample sizes. By replacing the n × n kernel matrices with random Fourier feature approximations, they avoid the eigendecomposition and inversion of kernel matrices that dominate KCIT's cost, while remaining statistically sound.
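A rough cost comparison makes the scaling concrete. The sketch below assumes m random features per variable, ignores the input dimension and constant factors, and uses n = 10^4 and m = 100 purely as illustrative values, not the paper's settings:

```latex
\begin{aligned}
\text{KCIT:}\quad & O(n^3) && \text{(eigendecomposition and inversion of } n \times n \text{ kernel matrices)}\\
\text{RCIT/RCoT:}\quad & O(n m^2 + m^3) && \text{(feature covariances plus an } m \times m \text{ inversion)}\\
n = 10^4,\ m = 10^2:\quad & n^3 = 10^{12} \quad \text{vs.} \quad n m^2 + m^3 = 10^{8} + 10^{6}
\end{aligned}
```

For a fixed number of features the second cost grows linearly in n, which is the sense in which the approximate tests scale linearly with sample size.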

Technical Contributions

The paper grounds RCIT and RCoT in the characterization of conditional independence within reproducing kernel Hilbert spaces (RKHSs).

Characterization of CI: The authors use RKHS properties to characterize CI as the vanishing of the Hilbert-Schmidt norm of a partial cross-covariance operator. RCIT and RCoT approximate this norm with random Fourier features, which keeps a kernel-based CI criterion while making the computation scalable.
Null Distribution: Under the null hypothesis, the test statistic is asymptotically distributed as a sum of positively weighted chi-squared random variables. The authors approximate this distribution with the Lindsay-Pilla-Basak method, which yields accurate p-values.
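Both ingredients can be sketched together in plain Python. The statistic follows an RCoT-style construction: n times the squared Frobenius norm of the regularized partial cross-covariance C_xy − C_xz (C_zz + λI)^{-1} C_zy between random Fourier features. For the null distribution, the sketch swaps Lindsay-Pilla-Basak for a plain Monte Carlo estimate of the weighted chi-squared tail. This is a simplified illustration, not the authors' implementation: it assumes one-dimensional variables, a fixed bandwidth sigma = 1, m = 5 features per variable, a ridge of 1e-3, and hand-picked null weights, and every function name is hypothetical.

```python
import math
import random

# Simplified sketch; all names, parameters and weights are illustrative assumptions.


def rff_features(values, m, sigma, rng):
    """Random Fourier features of a 1-D sample for a Gaussian kernel with bandwidth sigma."""
    w = [rng.gauss(0.0, 1.0 / sigma) for _ in range(m)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    return [[math.sqrt(2.0) * math.cos(wj * v + bj) for wj, bj in zip(w, b)] for v in values]


def cov(A, B):
    """Sample cross-covariance between the columns of A and B (rows are samples)."""
    n = len(A)
    mean_a = [sum(col) / n for col in zip(*A)]
    mean_b = [sum(col) / n for col in zip(*B)]
    return [[sum((A[k][i] - mean_a[i]) * (B[k][j] - mean_b[j]) for k in range(n)) / n
             for j in range(len(mean_b))] for i in range(len(mean_a))]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(M, R):
    """Solve M X = R by Gauss-Jordan elimination with partial pivoting."""
    size = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(size)]
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(size):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[size:] for row in aug]


def partial_cov_statistic(x, y, z, m=5, sigma=1.0, ridge=1e-3, seed=0):
    """n * ||C_xy - C_xz (C_zz + ridge*I)^{-1} C_zy||_F^2 on random Fourier features."""
    rng = random.Random(seed)
    fx = rff_features(x, m, sigma, rng)
    fy = rff_features(y, m, sigma, rng)
    fz = rff_features(z, m, sigma, rng)
    c_xy, c_xz, c_zy, c_zz = cov(fx, fy), cov(fx, fz), cov(fz, fy), cov(fz, fz)
    # Remove the part of the X-Y feature covariance explained by the Z features.
    reg = [[c_zz[i][j] + (ridge if i == j else 0.0) for j in range(m)] for i in range(m)]
    correction = matmul(c_xz, solve(reg, c_zy))
    partial = [[c_xy[i][j] - correction[i][j] for j in range(m)] for i in range(m)]
    return len(x) * sum(v * v for row in partial for v in row)


def weighted_chisq_pvalue_mc(stat, weights, draws=20000, seed=0):
    """Monte Carlo estimate of P(sum_i w_i * chi2_1 >= stat); a stand-in for LPB."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        # Each chi2_1 variable is the square of a standard normal draw.
        if sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights) >= stat:
            exceed += 1
    return (exceed + 1) / (draws + 1)  # add-one keeps the estimate above zero


data = random.Random(1)
n = 200
z = [data.gauss(0.0, 1.0) for _ in range(n)]
x = [data.gauss(0.0, 1.0) for _ in range(n)]
y_indep = [data.gauss(0.0, 1.0) for _ in range(n)]     # independent of x given z
y_dep = [xi + 0.1 * data.gauss(0.0, 1.0) for xi in x]  # noisy copy of x
print(partial_cov_statistic(x, y_indep, z), partial_cov_statistic(x, y_dep, z))
print(weighted_chisq_pvalue_mc(3.0, [0.8, 0.3, 0.1]))  # hypothetical null weights
```

The statistic for the noisy copy of x should come out far larger than for the independent y. A real test would estimate the chi-squared weights from the residualized features rather than fixing them by hand, and a Monte Carlo p-value cannot resolve values below 1/(draws + 1).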

Empirical Evaluation

The authors ran extensive empirical evaluations across varying sample sizes and conditioning set sizes, comparing RCIT and RCoT with KCIT in terms of type I error rate control, test power (1 - type II error rate), and causal structure discovery accuracy.

Control of Type I Error: RCIT and RCoT were better calibrated than KCIT across sample sizes and conditioning set sizes, particularly as the dimensionality of the conditioning set increases.
Test Power: All three tests had comparable power at sufficiently large sample sizes, and RCIT and RCoT remained computationally feasible at sample sizes beyond KCIT's practical limits.
Causal Discovery Accuracy: On simulated and real datasets, CCD algorithms run with RCIT or RCoT matched or exceeded the accuracy obtained with KCIT, with substantial reductions in run time.
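The first two evaluation criteria can be computed from collections of p-values. The sketch below, with hypothetical function names, assumes p-values from repeated tests on data simulated under the null (for type I error) or under the alternative (for power). It also measures calibration as the Kolmogorov-Smirnov distance of null p-values from Uniform(0, 1), one common way to summarize type I error control across all significance levels at once.

```python
import random


def rejection_rate(pvalues, alpha=0.05):
    """Fraction of p-values at or below alpha.

    Under the null this estimates the type I error rate; under the
    alternative it estimates power (1 - type II error rate).
    """
    return sum(p <= alpha for p in pvalues) / len(pvalues)


def ks_uniform_distance(pvalues):
    """Kolmogorov-Smirnov distance between the empirical CDF of the p-values and Uniform(0, 1)."""
    ps = sorted(pvalues)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


# Uniform draws stand in for the null p-values of a perfectly calibrated test.
rng = random.Random(0)
null_pvalues = [rng.random() for _ in range(1000)]
print(rejection_rate(null_pvalues, alpha=0.05))
print(ks_uniform_distance(null_pvalues))
```

In practice the p-values would come from running RCIT, RCoT, or KCIT on many simulated datasets; a well-calibrated test gives a rejection rate near alpha and a small KS distance under the null.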

Practical Implications and Future Directions

RCIT and RCoT extend non-parametric CCD algorithms to larger datasets, a practical gain for researchers and practitioners working on causal discovery. By removing the cubic time cost of kernel-based CI testing, these tests also make non-parametric causal discovery feasible in more fields where causal inference from observational data is critical.

Future research could further optimize how random features are sampled, explore alternative kernel approximations, or improve the approximation of the null distribution. Extending RCIT and RCoT to causal inference frameworks beyond CCD algorithms also merits attention.

In conclusion, RCIT and RCoT represent substantial progress in non-parametric CI testing, making causal discovery both fast and scalable without loss of accuracy. This development broadens access to non-parametric causal discovery on large and complex datasets and widens the range of domains where it can be applied.