Recursive Causal Discovery

Published 14 Mar 2024 in cs.LG and stat.ML | (2403.09300v1)

Abstract: Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data results in errors in statistical testing and the computational complexity of the learning task is daunting. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. Presence and identification of removable variables allow recursive approaches for causal discovery, a promising solution that helps to address the aforementioned challenges by reducing the problem size successively. This reduction not only minimizes conditioning sets in each conditional independence (CI) test, leading to fewer errors but also significantly decreases the number of required CI tests. The worst-case performances of these methods nearly match the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. A comprehensive literature review is also included, comparing the computational complexity of our methods with existing approaches, showcasing their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com.

Summary

The paper presents a unified recursive framework that identifies and removes removable variables, cutting both computational complexity and the number of conditional independence (CI) tests.
It presents algorithms including MARVEL, L-MARVEL, RSL, and ROL, which cover settings with and without causal sufficiency.
The work establishes theoretical lower bounds for constraint-based methods and offers practical implementation through the RCD Python package.

The paper "Recursive Causal Discovery" addresses the problem of learning causal structures from data, a fundamental task across many scientific domains. The authors target two challenges inherent in causal graph learning: statistical tests become unreliable when data are limited, and the computational complexity of the learning task is high.

Causal discovery means identifying the causal relationships among variables, typically represented as a causal graph. Both challenges are most pressing in high-dimensional settings, where conditioning sets grow and many conditional independence tests are needed.

The core approach builds on the concept of "removable variables." Such a variable can be removed without changing the conditional independencies among the remaining variables, so the causal structure over those variables can still be learned from the reduced problem. Removing such variables one after another is the basis of the authors' recursive methods, which shrink the problem at every iteration.
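
The recursive loop described above can be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' implementation: the true DAG is known and stands in for the statistical tests, and the names `recursive_skeleton`, `find_sink`, and `true_neighbors` are hypothetical and not part of the RCD package. The toy oracle removes sinks (variables with no remaining children), which are one simple kind of removable variable; the paper's algorithms instead identify removable variables from CI tests.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = FrozenSet[str]


def recursive_skeleton(
    variables: Iterable[str],
    find_removable: Callable[[Set[str]], str],
    find_neighbors: Callable[[str, Set[str]], Set[str]],
) -> Tuple[Set[Edge], List[str]]:
    """Learn an undirected skeleton by repeatedly removing one variable."""
    remaining = set(variables)
    edges: Set[Edge] = set()
    order: List[str] = []
    while remaining:
        x = find_removable(remaining)          # pick a removable variable
        for y in find_neighbors(x, remaining):  # record its edges
            edges.add(frozenset((x, y)))
        remaining.remove(x)                     # shrink the problem
        order.append(x)
    return edges, order


# Toy oracle (assumption): the true DAG A->B, A->C, B->D, C->D is known.
PARENTS: Dict[str, Set[str]] = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def children_in(x: str, remaining: Set[str]) -> Set[str]:
    """Children of x that are still in the remaining variable set."""
    return {y for y in remaining if x in PARENTS[y]}


def find_sink(remaining: Set[str]) -> str:
    """Return the alphabetically first variable with no remaining children."""
    return next(x for x in sorted(remaining) if not children_in(x, remaining))


def true_neighbors(x: str, remaining: Set[str]) -> Set[str]:
    """Parents and children of x restricted to the remaining variables."""
    return (PARENTS[x] & remaining) | children_in(x, remaining)


edges, order = recursive_skeleton(PARENTS, find_sink, true_neighbors)
print(order)
print(sorted("".join(sorted(e)) for e in edges))
```

Each iteration only looks at the variables still in `remaining`, which is the point of the recursion: later steps work on a strictly smaller problem.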

This research consolidates and extends four of the authors' prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). Its main contribution is a unified framework for recursive causal discovery algorithms that reduces both testing errors and computational demands; according to the abstract, the worst-case performance of these methods nearly matches the lower bound.

Among the notable algorithms introduced are:

MARVEL (Markov boundary-based Recursive Variable ELimination): This approach operates under the assumption of causal sufficiency, efficiently learning the causal structure by focusing on variables with small Markov boundaries.
L-MARVEL: Extends MARVEL to scenarios where causal sufficiency does not hold, incorporating robustness for graphs with latent variables.
RSL (Recursive Structure Learning): This includes two algorithms, RSL$_{\omega}$ and RSL$_D$, which exploit structural side information like bounded clique numbers or diamond-free graphs to achieve polynomial time complexity in certain settings.
ROL (Removable-Order Learning): A method that learns removable orders efficiently, sometimes leveraging reinforcement learning to handle complexities beyond the capacity of traditional methods.
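
Since MARVEL is described as focusing on variables with small Markov boundaries, the following sketch shows what that quantity looks like on a toy graph. It assumes the DAG is known and uses the standard graphical characterization of a Markov boundary (parents, children, and the children's other parents); in practice the boundary has to be estimated from data. The function `markov_boundary` and the example graph are hypothetical and not taken from the paper or the RCD package.

```python
from typing import Dict, List, Set


def markov_boundary(parents: Dict[str, Set[str]], x: str) -> Set[str]:
    """Parents, children, and co-parents (spouses) of x in a known DAG."""
    children = {y for y, pa in parents.items() if x in pa}
    spouses = {p for c in children for p in parents[c]} - {x}
    return set(parents[x]) | children | spouses


# Toy DAG (assumption): A -> C, B -> C, C -> D.
parents: Dict[str, Set[str]] = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
}

boundaries = {v: markov_boundary(parents, v) for v in parents}

# Candidate order: smallest Markov boundary first, ties broken by name.
candidates: List[str] = sorted(parents, key=lambda v: (len(boundaries[v]), v))
print(boundaries)
print(candidates)
```

Here `D` has the smallest boundary, so a procedure that prefers small Markov boundaries would examine it first, and any test involving `D` needs to condition on at most one variable.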

The significance of these recursive methods lies not only in their theoretical soundness but also in their practical applicability: the authors release RCD, a Python package that implements these algorithms. The package is available at github.com/ban-epfl/rcd, with documentation at rcdpackage.com.

The implications of this research are both practical and theoretical. Practically, the algorithms reduce the number of conditional independence (CI) tests required and keep conditioning sets small, which improves both scalability and reliability when data are limited. Theoretically, the authors establish lower bounds on the complexity of constraint-based approaches, offering insight into the inherent difficulty of causal discovery.
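
One way to see why shrinking the variable set helps is to count candidate conditioning sets for a single pair of variables. The sketch below assumes conditioning sets of size at most `k` drawn from the other `n - 2` variables; `count_conditioning_sets` is a hypothetical helper, and the numbers are a combinatorial illustration, not complexity bounds from the paper.

```python
from math import comb


def count_conditioning_sets(n: int, k: int) -> int:
    """Number of subsets of size <= k among the n - 2 other variables."""
    return sum(comb(n - 2, size) for size in range(k + 1))


# Candidate conditioning sets per pair, before and after halving the problem.
full_problem = count_conditioning_sets(n=20, k=3)
reduced_problem = count_conditioning_sets(n=10, k=3)
print(full_problem, reduced_problem)
```

Halving the number of variables cuts the candidate sets per pair by roughly a factor of ten in this example, and the gap widens as `k` grows.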

Future work could extend the methods to more general settings that do not rely on an initial Markov boundary computation, which would further improve scalability on high-dimensional biological data such as gene regulatory networks. Parallelizing these methods and analyzing their sample complexity are further promising directions for making recursive causal discovery more robust and widely applicable.