Causal Discovery Algorithms Overview
- Causal discovery algorithms are a suite of methods that uncover causal structures among variables using graphical models, conditional independence tests, and optimization techniques.
- These methods are categorized into constraint-based, score-based, and functional causal models, each balancing scalability with specific statistical assumptions.
- Recent advancements address challenges like latent confounders, nonlinearity, and high-dimensional data by incorporating differentiable, supervised, and relational approaches.
Causal discovery algorithms encompass a diverse suite of statistical and computational tools designed to recover the underlying causal relationships among variables from observational or interventional data. Central to their methodology is the search for graphical structures—directed acyclic graphs (DAGs), partial ancestral graphs (PAGs), or other generalized graphical forms—whose d-separation properties encode the joint and conditional independence relations present in the data. While early approaches relied on restrictive assumptions such as acyclicity, causal sufficiency, and strict faithfulness, recent decades have witnessed generalizations to latent confounders, mixtures of mechanisms, nonlinearity, time series, relational domains, high-dimensional settings, and model misspecification. Algorithmic solutions span constraint-based, score-based, functional-model-based, differentiable, and supervised-learning paradigms, each with unique assumptions, trade-offs, and computational properties.
1. Graphical Foundations and Assumptions
Most causal discovery methods are built on probabilistic graphical models, notably DAGs and their generalizations (MAGs, PAGs) (Huber, 11 Jul 2024, Singh et al., 2017). Under the causal Markov condition, the joint distribution factorizes as a product of conditional distributions governed by the DAG G: P(X_1, ..., X_p) = ∏_i P(X_i | Pa_G(X_i)), where Pa_G(X_i) denotes the parents of X_i in G.
d-Separation provides a graphical criterion for conditional independence: two nodes X and Y are d-separated by a conditioning set Z if every path between them is blocked according to specific collider/non-collider rules. The faithfulness assumption requires that all and only those conditional independencies implied by d-separation appear in the data. A Markov equivalence class is the set of DAGs entailing the same d-separations, often represented as a CPDAG (a partially oriented graph capturing invariant arrowheads and ambiguous edges).
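As a concrete illustration of the Markov factorization, the joint probability of a configuration can be computed as a product of per-node conditionals. A minimal sketch with a hypothetical three-node chain X → Y → Z and hand-specified binary CPDs (all numbers illustrative):

```python
# Markov factorization over a DAG: P(x, y, z) = P(x) * P(y | x) * P(z | y).
# Hypothetical three-node chain X -> Y -> Z with binary variables.

parents = {"X": [], "Y": ["X"], "Z": ["Y"]}

# Conditional probability tables: cpd[node][parent_values] = P(node = 1 | parents).
cpd = {
    "X": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.9},
    "Z": {(0,): 0.1, (1,): 0.7},
}

def joint_probability(assignment):
    """P(assignment) as the product of node-wise conditionals (causal Markov condition)."""
    prob = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[p] for p in pa)
        p_one = cpd[node][pa_vals]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# The factorized probabilities over all configurations sum to one.
total = sum(
    joint_probability({"X": x, "Y": y, "Z": z})
    for x in (0, 1) for y in (0, 1) for z in (0, 1)
)
print(round(total, 10))  # -> 1.0
```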
Latent variable models (MAGs, PAGs) extend DAG-based semantics to hidden confounders, with PAGs incorporating "circle" marks for unresolved orientations. The problem of mixture distributions—several mechanisms combining over a latent mixing variable—necessitates further graphical generalizations, such as the mixture DAG, union MAG, and algorithms that explicitly encode CI structure of mixture data (Saeed et al., 2020).
2. Taxonomy of Algorithmic Classes
Causal discovery algorithms fall into several categories (Singh et al., 2017, Zheng et al., 2023):
Constraint-Based Methods: These infer structure by conducting a series of conditional independence (CI) tests and applying orientation rules.
- PC algorithm: Starts from a fully connected undirected graph, prunes edges where conditional independence is detected, orients v-structures, and propagates edge directions via Meek’s rules.
- FCI algorithm: Generalizes PC to allow for latent confounders, orienting colliders and additional marks using a superset of rules, ultimately yielding a PAG.
- RFCI/FCI+: Faster variants that reduce complexity, but at the cost of some orientation information or under additional restrictions such as sparsity.
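The constraint-based recipe above can be sketched compactly. Below is a simplified skeleton phase of PC with a Fisher-z partial-correlation CI test — an illustration only: v-structure orientation, Meek's rules, and the order-stable refinement are omitted, and the toy chain used for the demonstration is hypothetical.

```python
import math
from itertools import combinations
import numpy as np

def fisher_z(data, i, j, cond):
    """Fisher-z test of the partial correlation between columns i and j
    given cond. Returns a p-value; small values indicate dependence."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(0.999999, max(-0.999999, r))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def pc_skeleton(data, alpha=0.05):
    """Skeleton phase of PC: start from the complete graph and delete edges
    whose endpoints test independent given subsets of current neighbours,
    growing the conditioning-set size."""
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                if j not in adj[i]:
                    continue  # already removed in this sweep
                for cond in combinations(adj[i] - {j}, size):
                    if fisher_z(data, i, j, cond) > alpha:  # cannot reject independence
                        adj[i].discard(j); adj[j].discard(i)
                        break
        size += 1
    return adj

# Toy chain X0 -> X1 -> X2: the X0--X2 edge should be removed given {X1}.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
skel = pc_skeleton(np.column_stack([x0, x1, x2]))
print(skel)  # expected: edges 0--1 and 1--2 remain, 0--2 is pruned
```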
Score-Based Methods: These search the space of DAGs or equivalence classes to maximize a decomposable score (e.g. BIC, likelihood).
- GES (Greedy Equivalence Search): Employs forward and backward edge addition/deletion on CPDAGs to locally optimize the score.
- Exact search (A*, Dynamic Programming): Seeks global optima but at exponential cost.
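The decomposable score that drives such searches is easy to compute node by node. A minimal sketch of a linear-Gaussian BIC (OLS fit per node, one penalty term per free parameter; the data-generating example is hypothetical):

```python
import numpy as np

def bic_score(data, parents):
    """Decomposable BIC of a linear-Gaussian DAG: per-node Gaussian
    log-likelihood of an OLS fit on the node's parents, penalized by
    (log n)/2 per free parameter. Higher is better."""
    n, p = data.shape
    score = 0.0
    for j in range(p):
        X = np.column_stack([np.ones(n)] + [data[:, k] for k in parents[j]])
        beta, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
        resid = data[:, j] - X @ beta
        sigma2 = resid @ resid / n                   # MLE of the noise variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        n_params = len(parents[j]) + 2               # coefficients + intercept + variance
        score += loglik - 0.5 * n_params * np.log(n)
    return score

# Data from X0 -> X1: the true structure outscores the empty graph, while the
# Markov-equivalent reversal X1 -> X0 scores (numerically) the same.
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = 0.9 * x0 + rng.normal(size=2000)
data = np.column_stack([x0, x1])
print(bic_score(data, {0: [], 1: [0]}) > bic_score(data, {0: [], 1: []}))  # -> True
```

The score equivalence of Markov-equivalent DAGs visible in this sketch is why GES can search over CPDAGs rather than individual DAGs.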
Functional Causal Model-Based Methods: Rely on functional forms and/or distributional assumptions to identify the true DAG.
- LiNGAM: Assumes linear non-Gaussianity, using ICA to recover a unique causal ordering.
- ANM, CAM-UV, PNL: Extend to nonlinear/heteroskedastic mechanisms.
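The identifiability argument behind LiNGAM can be illustrated in the bivariate case: with non-Gaussian disturbances, only regression in the causal direction yields a residual independent of the regressor. A minimal sketch using correlation of absolute values as a crude independence proxy (a stand-in for the ICA/mutual-information machinery of the actual algorithm; the simulated model is hypothetical):

```python
import numpy as np

def abs_corr(u, v):
    """Crude dependence measure: |corr(|u|, |v|)|. Near zero for independent
    symmetric variables, clearly nonzero under higher-order dependence."""
    return abs(np.corrcoef(np.abs(u), np.abs(v))[0, 1])

def residual(y, x):
    """OLS residual of y regressed on x (both assumed centred)."""
    b = np.dot(x, y) / np.dot(x, x)
    return y - b * x

# Ground truth x -> y with Laplace (non-Gaussian) disturbances.
rng = np.random.default_rng(2)
n = 20000
x = rng.laplace(size=n)
y = 1.0 * x + rng.laplace(size=n)
x, y = x - x.mean(), y - y.mean()

forward = abs_corr(x, residual(y, x))    # causal direction: residual ~ pure noise
backward = abs_corr(y, residual(x, y))   # anti-causal: residual entangled with y
print(forward < backward)  # -> True: non-Gaussianity reveals the direction
```

With Gaussian disturbances both residuals would be fully independent of the regressor, which is exactly why the linear-Gaussian case is unidentifiable and LiNGAM requires non-Gaussianity.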
Differentiable and Continuous Optimization Methods: Frame the discovery as a parameterized, constrained optimization:
- NOTEARS, GOLEM, NoCurl, DAGMA: Formulate learning as optimizing a reconstruction or likelihood loss over weighted adjacency matrices subject to continuous acyclicity constraints, usually via augmented Lagrangian or log-determinant functions (Yi et al., 14 Oct 2025).
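The continuous acyclicity constraints these methods rely on fit in a few lines. A sketch of the log-determinant characterization used by DAGMA, h(W) = −log det(sI − W∘W) + d·log s, which is zero iff W is acyclic for suitable s; NOTEARS' trace-of-matrix-exponential constraint, tr(exp(W∘W)) − d, plays the same role:

```python
import numpy as np

def h_logdet(W, s=1.0):
    """DAGMA-style acyclicity function: h(W) = -log det(sI - W*W) + d*log(s),
    where * is the elementwise (Hadamard) product. h == 0 iff the weighted
    graph W is acyclic (for s above the spectral radius of W*W); h > 0
    whenever directed cycles are present."""
    d = W.shape[0]
    M = s * np.eye(d) - W * W          # Hadamard square suppresses edge signs
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "s must dominate the spectral radius of W*W"
    return -logdet + d * np.log(s)

W_dag = np.array([[0.0, 0.8], [0.0, 0.0]])   # 0 -> 1, acyclic
W_cyc = np.array([[0.0, 0.5], [0.5, 0.0]])   # 0 <-> 1, a 2-cycle
print(abs(h_logdet(W_dag)) < 1e-9)  # -> True: zero for a DAG
print(h_logdet(W_cyc) > 0)          # -> True: positive for a cycle
```

Because h is differentiable in W, it can be folded into an augmented Lagrangian or penalty term and minimized jointly with the reconstruction loss by gradient methods.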
Supervised Learning/Neural Aggregators: Train deep models (CNNs, transformer-style axial attention) on synthetic graphs and summary statistics, achieving rapid inference and robustness to data irregularity and sample size (Petersen et al., 2022, Wu et al., 2 Feb 2024).
Relational and Structured Algorithms: Generalize to non-i.i.d. settings, relational schemas, and graphs with entities/relations/attributes (Piras et al., 2 Jul 2025).
Local and Hierarchical Schemes: Target efficient discovery in complex or high-dimensional domains via graph partitioning, hierarchical clustering, or restriction to local neighborhoods around treatment variables (Shah et al., 10 Jun 2024, Nisimov et al., 2021, Gupta et al., 2023).
3. Core Algorithms: Procedures, Complexity, and Assumptions
The following table summarizes paradigmatic methods, their assumptions and computational characteristics:
| Algorithm | Key Assumptions | Output Structure | Worst-Case Complexity |
|---|---|---|---|
| PC | i.i.d., Markov, faithfulness, sufficiency | CPDAG | O(p^s_max) CI tests |
| FCI | i.i.d., Markov, faithfulness, latents allowed | PAG | > O(p^s_max); additional orientation rules |
| GES | i.i.d., Markov, faithfulness, sufficiency | CPDAG | O(s · p²) score-based search |
| LiNGAM | Linear, non-Gaussian, sufficiency | DAG | O(p³) (ICA) |
| NOTEARS/DAGMA | Linear/nonlinear SEM, Markov, acyclicity | Weighted DAGs | Polynomial in p per iteration |
| RelFCI | Relational d-faithfulness | PAAGG/PARM | O(p² · 2^d) CI tests |
| Partition/HCCD | Any base learner; cluster quality | CPDAG/PAG | Sum of base-learner costs over clusters |
| SLdisco/SEA | Markov/faithfulness, SEM prior | CPDAG (joint) | O(p²) per forward pass (neural); pretraining expensive |
| LDECC | Local faithfulness, collider characterization | Local DAG info | O(n² + n · 2^d) |
Complexity varies dramatically with conditioning set size, sparsity, and algorithmic partitioning. Constraint-based methods are exponential in the maximal node degree; global score-based search is exponential in p, but clustering and partitioning can render it feasible at scale. Differentiable approaches require only polynomial time per optimization pass. Pretrained neural methods incur an expensive offline training phase but perform joint inference in a single forward pass.
4. Generalizations: Latent Variables, Mixtures, Time Series, Relational Data
Latent Variables and Mixtures: FCI and its variants discover latent confounders by leveraging PAG orientation rules. Mixture distributions across DAGs necessitate union graphs and mixture DAGs—FCI can reliably recover a consistent Markov equivalence class when mixture faithfulness holds and certain ordering assumptions are satisfied (Saeed et al., 2020).
Relational Causal Models: Algorithms such as RelFCI lift constraint-based discovery to abstract ground graphs representing relational dependencies, providing completeness and soundness guarantees in relational d-separation even with latent confounders. These approaches require new graphical constructs (AGG, LAGG, MAAGG, PAAGG/PARM), orientation rules, and lifted CI tests (Piras et al., 2 Jul 2025).
Time Series Models: VAR-LiNGAM, tsFCI, and TiMINo adapt structure learning to time-lagged dependence, employing sliding windows, VAR identification, or nonlinear additive-noise frameworks. tsFCI extends FCI to time series via lag augmentation; VAR-LiNGAM specializes ICA to VAR coefficients (Tang, 28 Aug 2024).
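Lag augmentation, the device tsFCI-style methods use to reduce time-series discovery to the static case, simply stacks time-shifted copies of each series as extra columns. A minimal sketch (function name and toy data are illustrative):

```python
import numpy as np

def lag_augment(data, max_lag):
    """Stack lagged copies of each column: row t of the result is
    [x_t, x_{t-1}, ..., x_{t-max_lag}], so a static causal discovery
    algorithm can be run on the augmented matrix."""
    T, p = data.shape
    rows = T - max_lag
    blocks = [data[max_lag - k : max_lag - k + rows, :] for k in range(max_lag + 1)]
    return np.hstack(blocks)   # shape (T - max_lag, p * (max_lag + 1))

series = np.arange(10.0).reshape(-1, 1)   # one toy series: 0, 1, ..., 9
aug = lag_augment(series, max_lag=2)
print(aug.shape)  # -> (8, 3)
print(aug[0])     # -> [2. 1. 0.]: value at t, t-1, t-2
```

Edge orientations between lagged columns can then exploit the temporal ordering: an arrow from x_{t} into x_{t-1} is forbidden a priori.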
Hierarchical and High-Dimensional Partitioning: Divide-and-conquer approaches use graph partitions derived from a superstructure to localize subgraph learning, enabling scalability to large numbers of variables with theoretical consistency guarantees for MEC recovery (Shah et al., 10 Jun 2024). Hierarchical clustering via normalized cuts followed by recursive learning preserves completeness while reducing statistical (and computational) error (Nisimov et al., 2021).
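The partitioning step behind divide-and-conquer schemes can be sketched in pure Python: split the variables along the connected components of a given superstructure and hand each block to a base learner. This is a simplified illustration only — real methods also cut connected superstructures and merge overlapping subgraphs, which is omitted here:

```python
def connected_components(superstructure):
    """Partition variables into connected components of an undirected
    superstructure (adjacency dict: node -> set of neighbours); each
    component can be given to a base causal learner independently."""
    seen, components = set(), []
    for start in superstructure:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(superstructure[v] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components

# Hypothetical 6-variable superstructure that splits into two blocks.
superstructure = {
    0: {1, 2}, 1: {0}, 2: {0},
    3: {4}, 4: {3, 5}, 5: {4},
}
print(connected_components(superstructure))  # -> [[0, 1, 2], [3, 4, 5]]
```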
5. Model Misspecification, Robustness, and Evaluation Metrics
Differentiable causal discovery methods—NOTEARS, DAGMA, GOLEM, and nonlinear extensions—display robustness to wide-ranging violations: latent confounding, measurement error, autoregressive dependence, domain shift, unfaithfulness, MCAR missingness, and mechanism misspecification. The key exception is scale variation (arbitrary renormalization of variables), under which scale-invariant methods (PC/GES) outperform differentiable learners unless the latter are adapted (Yi et al., 14 Oct 2025). This robustness is explained theoretically via log-likelihood score minimization with bounded variance ratios, and has been validated empirically across synthetic and real datasets.
Core structural evaluation metrics include:
- Structural Hamming Distance (SHD): Edge edit distance between estimated and ground-truth graphs.
- Structural Intervention Distance (SID): Number of ordered node pairs (i, j) whose predicted and true post-intervention distributions differ.
- Predictive and counterfactual accuracy: Performance in downstream tasks on fitted Bayesian networks.
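SHD can be computed directly from adjacency matrices by comparing the edge status of each unordered pair of nodes. A minimal sketch counting a reversed edge as a single error (one common convention; some tools count it as two):

```python
import numpy as np

def shd(est, truth):
    """Structural Hamming Distance between two adjacency matrices
    (A[i, j] == 1 means edge i -> j). Each missing or extra edge counts 1;
    a reversed edge counts 1."""
    est, truth = np.asarray(est), np.asarray(truth)
    errors = 0
    for i in range(est.shape[0]):
        for j in range(i + 1, est.shape[0]):
            # Compare the joint status of the unordered pair {i, j}:
            # no edge, i -> j, or j -> i.
            if (est[i, j], est[j, i]) != (truth[i, j], truth[j, i]):
                errors += 1
    return errors

A_true = np.array([[0, 1, 0],   # ground truth: 0 -> 1 -> 2
                   [0, 0, 1],
                   [0, 0, 0]])
A_est = np.array([[0, 1, 1],    # keeps 0 -> 1, reverses 1 -> 2, adds 0 -> 2
                  [0, 0, 0],
                  [0, 1, 0]])
print(shd(A_est, A_true))  # -> 2 (one reversal + one extra edge)
```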
Empirical benchmarks highlight the trade-off between structural accuracy, inferential utility, and computational cost; structural accuracy does not always correlate with prediction or counterfactual quality (Singh et al., 2017).
6. Impact, Limitations, and Open Research Directions
Causal discovery algorithms are essential for high-dimensional scientific problems (genomics, economics, networks), with practical applications in gene regulatory inference, observational epidemiology, social networks, and algorithmic trading (Shah et al., 10 Jun 2024, Tang, 28 Aug 2024). Recent progress enables scalable, robust inference, but limitations persist:
- Statistical limit: CI test power decays as conditioning set size grows; sample complexity can become intractable absent sparsity or prior knowledge (Wadhwa et al., 2021).
- Identifiability: Faithfulness, causal sufficiency, and correct functional form may not hold in practice, necessitating conservative interpretation or augmented models.
- Latent structure: Fully recovering true DAGs in the presence of latent confounders remains challenging; partial identification (equivalence classes, union graphs) is typical.
- Scalability: Partitioning, hierarchical wrappers, and neural aggregation methods expand feasible dimensions, but high-dimensional statistical inference retains bottlenecks.
- Evaluation: Robust, trustable metrics and simulation studies are needed, particularly in real-world low-data or distribution-shift regimes.
Open questions include the development of scale-invariant differentiable objectives, extension to arbitrary nonlinear and cyclic models, adaptive graph partitioning, full integration of relational and interventional data, and principled uncertainty quantification for discovered structures (Yi et al., 14 Oct 2025, Shah et al., 10 Jun 2024, Piras et al., 2 Jul 2025).
7. Software Tools and Implementations
Modern causal discovery packages provide modular, extensible frameworks supporting major algorithms, independence tests, and evaluation metrics. For example, causal-learn in Python implements constraint-based (PC, FCI, CD-NOD), score-based (GES, exact search), functional-model-based (LiNGAM, ANM, PNL), latent-variable (GIN, RCD, CAM-UV), and time-series (VAR-LiNGAM, Granger) methods, with flexible independence tests (Fisher-z, kernel CI) and utilities for conversion, visualization, and benchmarking (Zheng et al., 2023). Standard open-source datasets and evaluation suites enable reproducible, scalable research.
Causal discovery algorithms are a foundational pillar of modern empirical science and engineering, providing a diverse arsenal of structure-learning methodologies adaptable to domain constraints, dimensionality, and scientific questions. Ongoing innovations in generalization, robustness, and computational scaling continue to enhance their utility, yet critical limitations and challenges remain for statistical inference, model selection, and downstream causal effect identification.