High-recall causal discovery for autocorrelated time series with latent confounders (2007.01884v3)

Published 3 Jul 2020 in stat.ME, cs.LG, and stat.ML

Abstract: We present a new method for linear and nonlinear, lagged and contemporaneous constraint-based causal discovery from observational time series in the presence of latent confounders. We show that existing causal discovery methods such as FCI and variants suffer from low recall in the autocorrelated time series case and identify low effect size of conditional independence tests as the main reason. Information-theoretical arguments show that effect size can often be increased if causal parents are included in the conditioning sets. To identify parents early on, we suggest an iterative procedure that utilizes novel orientation rules to determine ancestral relationships already during the edge removal phase. We prove that the method is order-independent, and sound and complete in the oracle case. Extensive simulation studies for different numbers of variables, time lags, sample sizes, and further cases demonstrate that our method indeed achieves much higher recall than existing methods for the case of autocorrelated continuous variables while keeping false positives at the desired level. This performance gain grows with stronger autocorrelation. At https://github.com/jakobrunge/tigramite we provide Python code for all methods involved in the simulation studies.

Summary

The paper introduces a novel iterative approach that improves recall by incorporating causal parents in conditioning sets for autocorrelated time series.
It leverages comprehensive orientation rules to effectively address latent confounders and overcome the limitations of traditional FCI algorithms.
Extensive simulations validate the method’s robustness, demonstrating enhanced true positive rates without inflating false positives even under strong autocorrelation.

High-Recall Causal Discovery for Autocorrelated Time Series with Latent Confounders

The paper presents a constraint-based method for causal discovery from autocorrelated time series in the presence of latent confounders. The authors target a shortcoming of existing methods such as Fast Causal Inference (FCI) and its variants, which suffer from low recall in this setting.

The central contribution is an iterative procedure that uses novel orientation rules to determine ancestral relationships already during the edge removal phase. Identifying causal parents early allows them to be included in the conditioning sets, which counters the low effect size of the conditional independence tests and thereby raises recall.
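The effect-size argument can be illustrated with a small simulation. The sketch below is not the paper's code. It assumes a hypothetical two-variable linear model in which X and Y are both autocorrelated with coefficient a and X at lag 1 drives Y with coefficient c, and it uses partial correlation as the effect-size measure. The names simulate, partial_corr, effect_sizes, a, and c are illustrative. It compares the test of the link X(t-1) -> Y(t) given only X(t-2) with the same test when the parent Y(t-1) of Y(t) is added to the conditioning set.

```python
import math
import random


def simulate(n, a, c, seed=0):
    """Assumed toy model: X_t = a*X_{t-1} + e_t, Y_t = a*Y_{t-1} + c*X_{t-1} + n_t."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for t in range(1, n):
        x.append(a * x[t - 1] + rng.gauss(0.0, 1.0))
        y.append(a * y[t - 1] + c * x[t - 1] + rng.gauss(0.0, 1.0))
    return x, y


def corr(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def partial_corr(x, y, conds):
    """Partial correlation of x and y given the sequences in conds (recursive formula)."""
    if not conds:
        return corr(x, y)
    z, rest = conds[0], conds[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def effect_sizes(n=5000, a=0.95, c=0.3, burn_in=200, seed=0):
    """Return the partial correlation of X(t-1) and Y(t) without and with the parent Y(t-1)."""
    x, y = simulate(n + burn_in, a, c, seed)
    ts = range(burn_in + 2, n + burn_in)  # skip the burn-in and leave room for lag 2
    x_lag1 = [x[t - 1] for t in ts]
    x_lag2 = [x[t - 2] for t in ts]
    y_now = [y[t] for t in ts]
    y_lag1 = [y[t - 1] for t in ts]
    without_parent = partial_corr(x_lag1, y_now, [x_lag2])
    with_parent = partial_corr(x_lag1, y_now, [x_lag2, y_lag1])
    return without_parent, with_parent


without_parent, with_parent = effect_sizes()
print(f"given X(t-2) only:       {without_parent:.3f}")
print(f"given X(t-2) and Y(t-1): {with_parent:.3f}")
```

In this assumed model, the population partial correlation with the parent included is c / sqrt(c^2 + 1), roughly 0.29 for c = 0.3. Without the parent, the autocorrelated variance of Y remains in the denominator, so the value is considerably smaller and the true link is harder to detect.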

Key Contributions

Improved Recall: The proposed method demonstrates significantly higher recall compared to existing methods, especially as autocorrelation strengthens.
Iterative Procedure: The method iterates through preliminary and final phases so that orientation rules can establish ancestral relationships early, which improves the statistical power of subsequent tests.
Robustness to Autocorrelation: The technique maintains enhanced true positive rates without inflating false positive rates as autocorrelation increases.
Comprehensive Evaluation: Extensive simulations validate the method's superior performance across varying dimensions, time lags, and sample sizes.
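For readers less familiar with the metrics named above, recall and false positive rate for adjacencies can be computed as in the following sketch. It is a generic illustration, not the paper's evaluation code, and the paper's exact metric definitions may differ. The function name adjacency_scores and the toy link sets are hypothetical. Links are written as (cause, effect, lag) triples.

```python
def adjacency_scores(true_links, found_links, candidate_links):
    """Return (recall, false positive rate) for a set of detected links.

    recall = detected true links / all true links
    false positive rate = detected absent links / all absent links
    """
    true_links = set(true_links)
    found_links = set(found_links)
    absent_links = set(candidate_links) - true_links
    if not true_links or not absent_links:
        raise ValueError("need at least one true link and one absent link")
    recall = len(true_links & found_links) / len(true_links)
    fpr = len(found_links & absent_links) / len(absent_links)
    return recall, fpr


# Toy example (made-up links): three variables, lags 1 and 2, so 18 candidate links.
candidates = [(i, j, lag) for i in "XYZ" for j in "XYZ" for lag in (1, 2)]
truth = [("X", "X", 1), ("Y", "Y", 1), ("Z", "Z", 1), ("X", "Y", 1)]
found = [("X", "X", 1), ("Y", "Y", 1), ("X", "Y", 1), ("Z", "Y", 2)]

recall, fpr = adjacency_scores(truth, found, candidates)
print(f"recall = {recall:.2f}, false positive rate = {fpr:.3f}")
```

A method with low recall misses many true links even when its false positive rate stays at the nominal level, which is the failure mode the paper attributes to FCI on autocorrelated data.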

Methodological Insights

Entanglement of Phases: By removing and orienting edges simultaneously, the approach enhances the discovery of ancestral relationships amidst latent confounders.
Effect Size Optimization: Conditioning on known causal parents often increases effect size, which gives the conditional independence tests higher statistical power.
Order-Independence: The authors prove that the output does not depend on the order in which variables are processed, unlike many existing constraint-based methods.
Algorithm Design: The algorithm applies orientation rules such as the ancestor-parent-rule iteratively until all middle marks are resolved, progressively refining the inferred graph.
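The connection between effect size and detection power can be made concrete with the standard Fisher z approximation for a partial correlation test. This is a sketch under stated assumptions (Gaussian variables, independent samples, a two-sided test), not the test used in the paper. The function name fisher_z_power and the effect sizes 0.09 and 0.29 are illustrative choices, not results from the paper.

```python
import math
from statistics import NormalDist


def fisher_z_power(rho, n, n_cond, alpha=0.05):
    """Approximate power of a two-sided partial-correlation test (Fisher z).

    rho    : true partial correlation (the effect size)
    n      : sample size
    n_cond : number of conditioning variables
    """
    dof = n - n_cond - 3
    if dof <= 0:
        raise ValueError("sample size too small for this conditioning set")
    std_normal = NormalDist()
    shift = math.atanh(rho) * math.sqrt(dof)  # mean of the test statistic under rho
    crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)


# Illustrative comparison at n = 200: a weak versus a moderate effect size.
low = fisher_z_power(rho=0.09, n=200, n_cond=1)
high = fisher_z_power(rho=0.29, n=200, n_cond=2)
print(f"power at rho=0.09: {low:.2f}")
print(f"power at rho=0.29: {high:.2f}")
```

Adding one more conditioning variable costs a single degree of freedom, which is small next to the gain from a larger effect size. That trade-off is why including parents in the conditioning set can pay off.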

Practical Implications

Application in Real-World Scenarios: The method is especially useful where experimentation isn't feasible, such as in climate data analysis or epidemiology.
Open Source Implementation: Practical applications are facilitated by the availability of Python implementations through the tigramite package.

Theoretical and Future Perspectives

Completeness and Soundness: The authors prove that the method is sound and complete in the oracle case, assuming the absence of selection bias.
Potential Extensions: Extending the method to non-stationary data and exploring hybrid methods that incorporate SCM frameworks remain open research directions.
Discrete and Causally Sufficient Settings: Initial findings suggest varied performance in discrete settings, pointing toward a need for optimized strategies in non-continuous scenarios.

The paper addresses a pervasive challenge in causal discovery from complex time series, namely low recall when latent confounders are present. It opens avenues for improved causal analysis across scientific fields that rely on observational data.
