Causal Discovery Algorithms
- Causal discovery algorithms are computational methods that infer causal structures, typically represented as directed acyclic graphs (DAGs), from observational data.
- They employ various paradigms—constraint-based, score-based, hybrid, and even quantum and meta-learning approaches—to address challenges like latent confounding and scalability.
- Empirical benchmarks show that modern techniques can enhance sample efficiency, reduce computational cost, and improve the accuracy of inferred causal models.
A causal discovery algorithm is a computational procedure designed to infer aspects of the underlying causal structure—often represented as a directed acyclic graph (DAG) or, more generally, an equivalence class of graphs—from empirical data. The field includes a range of algorithmic paradigms: constraint-based, score-based, hybrid, experimental-design, meta-learning, local, and quantum approaches. Causal discovery is central to empirical sciences for moving beyond associational inference, identifying the generative mechanisms responsible for observed systems, and facilitating robust predictions under interventions.
1. Formal Problem Statement and Assumptions
Causal discovery algorithms typically operate within the structural causal model (SCM) framework, where the data-generating process is modeled as a tuple $\mathcal{M} = \langle \mathbf{V}, \mathbf{U}, \mathbf{F}, P(\mathbf{U}) \rangle$:
- $\mathbf{V}$: observed variables;
- $\mathbf{U}$: latent (exogenous) variables;
- $\mathbf{F}$: structural assignments $V_i := f_i(\mathrm{PA}_i, U_i)$, $\mathrm{PA}_i$ being the set of parent variables of $V_i$;
- $P(\mathbf{U})$: independent noise distributions.
The canonical assumptions are:
- Causal sufficiency: all common causes of observed variables are observed;
- Acyclicity: the induced causal graph is a DAG (though certain methods relax this);
- Faithfulness: statistical conditional independences in $P(\mathbf{V})$ coincide with $d$-separation in the graph.
The learning objective is to recover the causal DAG over $\mathbf{V}$ or, in the presence of Markov equivalence, the corresponding equivalence class (e.g., CPDAG, PAG) (Huber, 2024, Sauter et al., 2022).
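To make these definitions concrete, the following sketch simulates a hypothetical three-variable linear-Gaussian SCM with chain structure $X \to Y \to Z$ (the structure, coefficients, and `partial_corr` helper are illustrative assumptions, not taken from the cited works) and checks that the $d$-separation statement $X \perp Z \mid Y$ surfaces as a vanishing partial correlation, as faithfulness requires:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Structural assignments of a toy SCM: each variable is a function of its
# parents plus an independent exogenous noise term.
u_x, u_y, u_z = rng.normal(size=(3, n))  # independent noise U
x = u_x                                  # X := U_X
y = 2.0 * x + u_y                        # Y := f_Y(X, U_Y)
z = -1.5 * y + u_z                       # Z := f_Z(Y, U_Z)

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing out `given`."""
    res_a = a - np.polyval(np.polyfit(given, a, 1), given)
    res_b = b - np.polyval(np.polyfit(given, b, 1), given)
    return np.corrcoef(res_a, res_b)[0, 1]

print(f"corr(X, Z)      = {np.corrcoef(x, z)[0, 1]:+.3f}")  # strongly nonzero
print(f"pcorr(X, Z | Y) = {partial_corr(x, z, y):+.3f}")    # approximately zero
```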
2. Principal Algorithmic Paradigms
Causal discovery algorithms fall into distinct paradigms, summarized below.
| Paradigm | Core Idea | Typical Output |
|---|---|---|
| Constraint-based | Conditional independence (CI) reasoning (PC, FCI, ICD, CCI) | CPDAG/PAG |
| Score-based | Graph search optimizing a likelihood + penalty (GES, NOTEARS) | CPDAG |
| Functional/SEM-based | Leverage functional form/independence (LiNGAM, ANM) | DAG |
| Hybrid | Combine CI and scoring locally/globally (HLCD, CLIMB) | (C)PDAG/MB |
| Experimental design | Adaptive queries/interventions (MCD, online design) | DAG |
| Meta-learning | Learn a policy for structure search/intervention | DAG |
| Foundation model | Aggregate local/estimated graphs via neural architectures | DAG/CPDAG |
| Quantum | Use process matrices and quantum operations | DAG/Markov Model |
Constraint-based: Algorithms like PC, IC, FCI, ICD, and CCI test CI relations among observed variables, removing edges and orienting colliders (v-structures) to produce a correct equivalence class under Markov and faithfulness assumptions (Huber, 2024, Rohekar et al., 2020, Strobl, 2018).
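As a concrete, heavily simplified illustration of the constraint-based recipe, the sketch below implements only the skeleton (edge-removal) phase of a PC-style search, assuming jointly Gaussian data so that a Fisher-z partial-correlation test can serve as the CI oracle; the collider-orientation rules that complete the algorithm are omitted, and all names are illustrative:

```python
import itertools
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, i, j, cond, alpha=0.01):
    """Partial-correlation CI test (Fisher z), valid for jointly Gaussian data.
    Returns True when i and j are judged independent given `cond`."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r))                 # Fisher z-transform
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return 2 * stats.norm.sf(stat) > alpha              # p-value above alpha

def pc_skeleton(data, alpha=0.01):
    """Edge-removal phase of a PC-style algorithm; orientation is omitted."""
    d = data.shape[1]
    adj = {v: set(range(d)) - {v} for v in range(d)}
    level = 0  # size of the conditioning set, grown incrementally
    while any(len(adj[v]) - 1 >= level for v in adj):
        for i in range(d):
            for j in list(adj[i]):
                for cond in itertools.combinations(adj[i] - {j}, level):
                    if fisher_z_ci_test(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return adj  # e.g., chain data X -> Y -> Z yields {0: {1}, 1: {0, 2}, 2: {1}}
```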
Score-based: Approaches such as GES or NOTEARS search the space of DAGs to maximize a decomposable score (e.g., BIC), usually regularized for sparsity. These algorithms are consistent under appropriate model classes and penalties but can be computationally intensive (Huber, 2024).
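The decomposability that score-based searches exploit is easy to exhibit: a BIC-style score is a sum of per-node terms, so a single-edge move only requires rescoring the affected node. The sketch below (a minimal linear-Gaussian version; the graph encoding is an illustrative assumption) also shows why purely observational scores cannot break Markov equivalence, since $X \to Y$ and $Y \to X$ tie:

```python
import numpy as np

def bic_score(data, dag):
    """Decomposable BIC for a linear-Gaussian DAG.

    `dag` maps each node index to a tuple of parent indices. The score sums
    per-node Gaussian log-likelihoods minus (log n)/2 per free parameter, so
    greedy searches (GES-style) can evaluate edge moves via local rescoring.
    """
    n = data.shape[0]
    total = 0.0
    for node, parents in dag.items():
        y = data[:, node]
        X = np.column_stack([data[:, list(parents)], np.ones(n)])
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        sigma2 = resid @ resid / n
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        k = len(parents) + 2  # regression weights + intercept + noise variance
        total += loglik - 0.5 * k * np.log(n)
    return total

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
data = np.column_stack([x, 0.8 * x + rng.normal(size=2000)])
print(bic_score(data, {0: (), 1: (0,)}))  # X -> Y
print(bic_score(data, {1: (), 0: (1,)}))  # Y -> X: essentially the same score,
                                          # as the two DAGs are Markov equivalent
```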
Functional/SEM-based: Methods such as LiNGAM exploit additional model constraints (e.g., linear non-Gaussianity); ANMs use additivity and independence of regression residuals to decide directionality—identifying the causal DAG under strong distributional assumptions (Ni, 2022).
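A minimal sketch of the ANM direction decision follows; polynomial regression stands in for the nonparametric regression used in the ANM literature, a biased HSIC statistic replaces a calibrated independence test, and all names and the cubic example are illustrative assumptions:

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC dependence estimate with Gaussian kernels
    (median-heuristic bandwidth); larger means more dependent."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / np.median(d2[d2 > 0]))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(gram(a) @ H @ gram(b) @ H) / n**2

def anm_direction(x, y, degree=4):
    """Prefer the direction whose regression residual is more independent
    of the putative cause (an additive-noise-model criterion)."""
    def resid(cause, effect):
        return effect - np.polyval(np.polyfit(cause, effect, degree), cause)
    fwd = hsic(x, resid(x, y))  # residual independence under x -> y
    rev = hsic(y, resid(y, x))  # residual independence under y -> x
    return "x->y" if fwd < rev else "y->x"

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 400)
y = x ** 3 + rng.uniform(-1, 1, 400)  # nonlinear mechanism, additive noise
print(anm_direction(x, y))            # expected: "x->y"
```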
Hybrid and Local: Recent advances combine the skeleton-recovery power of CI-based methods with local/decomposable scoring to improve skeleton precision and orientation accuracy (e.g., HLCD) (Ling et al., 2024). Local methods like LDECC exploit orientation rules and eager collider checks to efficiently bound the set of possible ATE values near a treatment node (Gupta et al., 2023). CLIMB leverages algorithmic Markov conditions and description length for directed Markov blanket discovery (Marx et al., 2018).
Meta-learning and RL-based: Emerging methods formulate causal discovery as a reinforcement learning problem. The meta-RL agent is meta-trained across a distribution over SCMs to learn intervention and structure-edit strategies that generalize to new graphs (Sauter et al., 2022).
Experimental Design: Algorithms such as adaptive track-and-stop online methods operate in an interventional setting, actively allocating interventions subject to statistical optimality criteria, minimizing the number of queries required to identify the ground-truth DAG at a specified confidence level (Elahi et al., 2024).
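The informational advantage of interventions over purely observational data is visible even with two variables, where $X \to Y$ and $Y \to X$ are Markov equivalent: intervening on the cause shifts the effect's distribution, while intervening on the effect leaves the cause untouched. The sketch below uses a hypothetical SCM and a two-sample test, not the cited track-and-stop procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 2000

def simulate(do_x=None, do_y=None):
    """Toy SCM X -> Y with optional hard interventions do(X=v), do(Y=v)."""
    x = rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    # do(Y) severs the X -> Y mechanism; X keeps its own mechanism either way.
    y = 1.5 * x + rng.normal(size=n) if do_y is None else np.full(n, float(do_y))
    return x, y

x_obs, y_obs = simulate()
_, y_do_x = simulate(do_x=2.0)  # intervene on the (true) cause
x_do_y, _ = simulate(do_y=2.0)  # intervene on the (true) effect

# Y responds to do(X): tiny p-value. X is invariant to do(Y): p-value not small.
print(stats.ks_2samp(y_obs, y_do_x).pvalue)
print(stats.ks_2samp(x_obs, x_do_y).pvalue)
```

An adaptive design automates exactly this comparison, choosing which node to intervene on next so that distributional contrasts disambiguate the remaining equivalence class with as few queries as possible.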
Foundation/Deep-aggregation: Foundation-model-style approaches (e.g., SEA) aggregate the outputs of classical causal discovery algorithms run on variable subsets using deep neural architectures (axial attention) to reconstruct the full graph, improving sample efficiency and scalability (Wu et al., 2024).
Quantum: Dedicated algorithms infer the quantum causal structure (DAGs, Markov models) from process matrices, generalizing classical CI and orientation concepts to quantum operations and signaling constraints (Giarmatzi et al., 2017).
3. Methodological Principles and Structural Guarantees
All sound causal discovery algorithms rely on the Markov and faithfulness conditions, on conditional-independence constraints (possibly relaxed for latent variables or cycles), and on appropriate functional or statistical model classes. When limited to observational data:
- Recovery is possible only up to a Markov equivalence class (unless intervention or functional-form assumptions hold).
- Conditional-independence-based methods output CPDAGs (complete partially directed acyclic graphs) or PAGs (partial ancestral graphs) in the presence of latent confounders.
The soundness and completeness of these methods are established in the following sense:
- Constraint-based: Sound and complete for the output equivalence class under faithfulness and Markov assumptions (Rohekar et al., 2020, Strobl, 2018, Huber, 2024).
- Score-based: Consistent for the true DAG/CPDAG in the infinite-sample limit, given a correct score and penalty and structural identifiability (Huber, 2024).
- Hybrid/Local: Under locally consistent scores and correct test selection, parent/child sets and their orientation around a target are correctly recovered as the sample size $n \to \infty$ (Ling et al., 2024, Marx et al., 2018, Gupta et al., 2023).
- Interventional/Meta-learning: Meta-trained RL agents efficiently recover the DAG from as few as tens of samples, and adaptive intervention schemes are asymptotically locally optimal (Sauter et al., 2022, Elahi et al., 2024).
4. Empirical Benchmarks and Performance Comparisons
Comprehensive benchmarking relies on realistic simulators (e.g., Neuropathic Pain Diagnosis Simulator (Tu et al., 2019)) or standard Bayesian network datasets (Huber, 2024). Principal performance metrics include:
- Structural Hamming Distance (SHD): Number of edge insertions, deletions, or reversals separating the estimated and ground-truth graphs (a minimal implementation is sketched after this list).
- Precision, Recall, F1: Precision/recall tradeoff on edge prediction.
- Causal Accuracy: Fraction of node pairs for which the presence/absence and direction of an edge are correctly recovered.
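The SHD referenced above admits a very short implementation; note that conventions differ on whether a reversed edge counts as one error or two, and this sketch counts it once:

```python
import numpy as np

def shd(true_adj, est_adj):
    """Structural Hamming Distance between DAG adjacency matrices
    (entry [i, j] = 1 encodes the directed edge i -> j)."""
    diff = 0
    d = true_adj.shape[0]
    for i in range(d):
        for j in range(i + 1, d):
            # Compare the pair's joint status: absent, i -> j, or j -> i.
            if (true_adj[i, j], true_adj[j, i]) != (est_adj[i, j], est_adj[j, i]):
                diff += 1
    return diff

# Hypothetical example: truth is 0 -> 1 -> 2; the estimate reverses 1 -> 2
# and adds a spurious 0 -> 2.
true_adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
est_adj  = np.array([[0, 1, 1], [0, 0, 0], [0, 1, 0]])
print(shd(true_adj, est_adj))  # 2: one reversal + one extra edge
```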
Recent advances yield:
- Meta-RL (MCD): Yields dSHD $1.28$ vs. $2.5$ for NOTEARS and $2.94$ for DCDI, with 20–30 samples versus thousands for baselines (Sauter et al., 2022).
- Constraint-based (ICD): Reduces CI test count by 3–10× vs. FCI, lowers SHD error, improves statistical power in finite samples (Rohekar et al., 2020).
- Recursive wrappers (HCCD): 20–40% fewer CI tests and 8–12% lower SHD vs. baseline PC (Nisimov et al., 2021).
- Local (HLCD, LDECC): HLCD achieves the highest F1 (13/14 datasets) and lowest SHD, particularly on small samples (Ling et al., 2024). LDECC matches the ATE bounds of full MEC recovery at a fraction of the computational cost (Gupta et al., 2023).
- Foundation/SEA: Attains higher mAP on synthetic graphs than classical and deep baselines, with a two-orders-of-magnitude speedup (Wu et al., 2024).
5. Extensions: Latent Confounding, Selection Bias, Cycles, and Quantum Setting
Advanced algorithms address core obstacles:
- Latent Confounders/Selection Bias: FCI recovers PAGs under latent variables; CCI generalizes to cycles and selection bias, outputting partially oriented MAAGs (maximal almost-ancestral graphs) with proven soundness (Strobl, 2018).
- Cycles: CCI, via constraint-based reasoning over linear SEM–IE models, allows for feedback by adapting orientation and separation steps. The full MAAG is correct (sound) under faithfulness (Strobl, 2018).
- Local/Hybrid Approaches: HLCD distinguishes between v-structures and Markov equivalence classes via local score comparisons, achieving superior performance in high-dimensional or low-sample regimes (Ling et al., 2024).
- Quantum Discovery: Quantum causal discovery algorithms recover a minimal DAG compatible with the process matrix using open-output and non-signaling tests, uniquely recovering underlying quantum causal structures in polynomial time (Giarmatzi et al., 2017).
6. Algorithm Design, Computational Aspects, and Best Practices
Key computational considerations:
- Complexity: Constraint-based algorithms scale polynomially in the number of variables for fixed maximum degree but exponentially in degree and conditioning-set size (Rohekar et al., 2020). Score-based global search is NP-hard (already for maximum in-degree $k \ge 2$), warranting hybrid and partition-based methods for large variable counts (Shah et al., 2024).
- Divide-and-Conquer: Causal graph partitioning leverages superstructures and block expansion to enable parallel local learning across overlapping blocks; theoretical guarantees (Theorem 1 of Shah et al., 2024) ensure full MEC recovery under mild overlap and separation conditions, and empirical results show multi-fold speedups on large graphs (a toy block-merge rule is sketched after this list).
- Sample-Efficiency: Meta-RL and online experimental-design (track-and-stop) schemes are asymptotically optimal in sample complexity, allocating interventions adaptively until the statistical evidence suffices for confident recovery; lower and upper bounds match as the error tolerance $\delta \to 0$ (Sauter et al., 2022, Elahi et al., 2024).
- Neural Aggregation: Sample–Estimate–Aggregate models decouple global inference into rapid local estimation (via classical learners on subsets) and neural aggregation, generalizing across novel data regimes and graph sizes (Wu et al., 2024).
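To make the divide-and-conquer pattern concrete, the toy merge rule below combines skeletons learned independently on overlapping variable blocks, keeping an edge only when every covering block reports it. This conservative rule is an illustrative assumption; the cited partitioned methods use more refined merge procedures with MEC-recovery guarantees:

```python
import itertools

def merge_block_skeletons(blocks, block_edges):
    """Merge per-block skeletons into a global one.

    `blocks` is a list of variable-index sets; `block_edges[k]` holds the
    undirected edges (frozensets) learned on blocks[k]. An i-j edge survives
    only if every block containing both i and j reports it."""
    all_vars = set().union(*blocks)
    merged = set()
    for i, j in itertools.combinations(sorted(all_vars), 2):
        covering = [k for k, b in enumerate(blocks) if i in b and j in b]
        if covering and all(frozenset((i, j)) in block_edges[k] for k in covering):
            merged.add(frozenset((i, j)))
    return merged

# Hypothetical example: two overlapping blocks over {0, 1, 2} and {1, 2, 3}.
blocks = [{0, 1, 2}, {1, 2, 3}]
block_edges = [
    {frozenset((0, 1)), frozenset((1, 2))},  # learned on block 0
    {frozenset((2, 3))},                     # learned on block 1 (no 1-2 edge)
]
print(merge_block_skeletons(blocks, block_edges))
# {frozenset({0, 1}), frozenset({2, 3})} -- the disputed 1-2 edge is dropped
```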
Practitioners should match algorithmic choice to data structure, underlying assumptions (e.g., acyclicity, confounding, intervention), sample size, and computational constraints. Hybrid and divide-and-conquer approaches are recommended for high-dimensional settings, while meta-learning and experimental design approaches offer order-of-magnitude gains in sample-limited or interventional regimes.
7. Limitations, Open Problems, and Future Directions
Current limitations include:
- Assumption Violations: Most algorithms depend on strong Markov and faithfulness assumptions; real-world data often violate these through faithfulness failures or hidden confounding.
- Scalability: Quadratic or worse scaling in variables or conditioning set size remains challenging; partition and hybrid methods partially address but do not eliminate this (Nisimov et al., 2021, Shah et al., 2024).
- High-Dimensional Regimes: Robustness to finite samples and dense graphs is not universal; foundation-model and local approaches demonstrate some improvements (Wu et al., 2024, Ling et al., 2024).
- Intervention Design: Optimal adaptive intervention schemes are computationally intensive if full DAG enumeration is needed (Elahi et al., 2024).
- Generalizability: Extension to nonlinear, mixed, time-series, and quantum domains remains an active area; robust foundations for score-based and functional approaches with non-Gaussianity, cycles, and selection bias are evolving (Strobl, 2018, Giarmatzi et al., 2017).
Ongoing research directions include scalable meta-learning for larger SCMs, foundation models robust to additional forms of misspecification, divide-and-conquer frameworks for heterogeneous data, and optimized adaptive experimentation.