Causal Graph Construction Methods

Updated 29 May 2026

Causal graph construction is the process of inferring directed acyclic graphs (DAGs) that capture direct cause-effect relationships among variables.
Methods utilize constraint-based, score-based, and hybrid algorithms to test conditional independence and optimize penalized likelihoods for structure recovery.
Recent advances employ scalable neural models, LLM-driven prompting, and expert-guided co-design to enhance causal inference in high-dimensional and complex domains.

Causal graph construction is the process of inferring a directed graph that encodes the direct cause-effect relationships among a set of variables, with the goal of supporting estimation of intervention effects, structural reasoning, and robust decision-making. The underlying object is typically a directed acyclic graph (DAG) representing a structural equation model (SEM), though extensions encompass mixed graphs (with bidirected edges for unmeasured confounding), partially directed acyclic graphs (PDAGs), and causal knowledge graphs for unstructured domains. Research in this area spans methodologies using constraint-based tests, Bayesian and greedy searches, LLMs, graph neural networks, expert-guided co-design, and integration with database technologies.

1. Formal Problem Settings and Graphical Models

Let $X = \{X_1, \dots, X_n\}$ denote $n$ random variables, and let $G = (V, E)$ be the unknown causal graph on $V = X$ with $E \subset \{(X_i \to X_j): i \neq j\}$, typically required to be a directed acyclic graph (DAG). The aim is to recover the edge set $E$, that is, to decide for each ordered pair $(i,j)$ whether a direct causal link $X_i \to X_j$ exists, on the basis of a dataset $D = \{x^{(k)}\}_{k=1}^N$ of observational or interventional samples, and/or other sources such as metadata or domain expertise.
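As a concrete reading of this setup, a candidate graph can be stored as a set of ordered pairs and checked for acyclicity before it is accepted as a DAG. The sketch below is illustrative only: the function name `is_dag` and the three-variable example are assumptions made here, not part of any cited method, and the check uses Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque


def is_dag(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains no directed cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with in-degree zero.
    If every node can be removed, the graph is acyclic.
    """
    indegree = {v: 0 for v in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(nodes)


# Hypothetical three-variable example: each pair (a, b) encodes the edge a -> b.
nodes = ["X1", "X2", "X3"]
candidate = {("X1", "X2"), ("X2", "X3"), ("X1", "X3")}
cyclic = candidate | {("X3", "X1")}  # adding X3 -> X1 closes a directed cycle

print(is_dag(nodes, candidate))
print(is_dag(nodes, cyclic))
```

Storing $E$ as a set of ordered pairs keeps the correspondence with the definition direct; an adjacency matrix would serve equally well for dense graphs.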

Alternative structural assumptions accommodate unmeasured confounding by introducing mixed graphs (directed and bidirected edges) as in bow-free acyclic path diagrams (BAPs) and permit the possibility of non-Gaussian errors, latent variables, or heterogeneous data types (continuous and discrete) (Wang et al., 2020, Sedgewick et al., 2017). In distributed or knowledge-graph contexts, nodes may correspond to event concepts, structural mechanisms, or hypernodes grounded in property graphs (Pachera et al., 2024, Hassanzadeh, 2024, Lu et al., 2013).

The objective is to produce a graph estimator $\hat G = (V, \hat E)$ that closely approximates the true underlying structure $G$, measured according to metrics such as precision, recall, F1-score, Hamming loss, normalized structural Hamming distance (SHD), true positive rate (TPR), or decision-theoretic utility (Jiralerspong et al., 2024, Rashid et al., 2022, Gonzalez-Soto et al., 2020).

2. Constraint-Based, Score-Based, and Hybrid Algorithms

Traditional approaches predominantly fall into three classes:

Constraint-based methods: These rely on conditional independence tests (e.g., the PC and CPC algorithms), leveraging the Markov property that a variable is independent of its non-descendants given its parents. Conditional-independence testing is executed recursively, and the number of tests can grow exponentially in the worst case for $n$ variables, but recent advances utilize hybrid schemes to restrict the skeleton search space (Dash et al., 2013, Sedgewick et al., 2017).
Score-based search: Here, algorithms search over the space of possible DAGs (or their equivalence classes, essential graphs/CPDAGs) to maximize a penalized likelihood or marginal likelihood score, such as Bayesian Dirichlet equivalent uniform (BDeu) scores. DAG-search is NP-hard, so practical systems employ greedy hill-climbing, random restarts, or continuous score relaxations (e.g., NOTEARS) (Dash et al., 2013).
Hybrid anytime algorithms: These interleave the above, employing constraint-based heuristics to rapidly generate candidate equivalence classes, followed by Bayesian or likelihood-based scoring and greedy refinement within the Markov equivalence class. An example is the EGS algorithm, which alternates runs of PC-search under randomized thresholds with structure scoring, leading to better performance in the sparse or small-data regime (Dash et al., 2013).
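The adjacency (skeleton) phase of a PC-style constraint-based search can be sketched as follows. This is a minimal illustration, not the implementation from the cited works: `pc_skeleton` and `chain_oracle` are hypothetical names, and the conditional-independence test is a hand-written oracle for the chain A -> B -> C rather than a statistical test on data.

```python
from itertools import combinations


def pc_skeleton(nodes, ci_test):
    """Skeleton phase of a PC-style search.

    Starts from the complete undirected graph and removes the edge i - j
    whenever ci_test(i, j, cond) reports independence for some conditioning
    set cond drawn from the current neighbours of i (excluding j).
    Conditioning-set size grows by one per pass.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    level = 0
    # Continue while some node still has enough neighbours to form a set of this size.
    while any(len(adj[i]) - 1 >= level for i in nodes):
        for i in nodes:
            for j in sorted(adj[i]):          # sorted() takes a snapshot of the neighbours
                if j not in adj[i]:
                    continue                  # edge already removed earlier in this pass
                candidates = sorted(adj[i] - {j})
                if len(candidates) < level:
                    continue
                for cond in combinations(candidates, level):
                    if ci_test(i, j, set(cond)):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {frozenset((i, j)) for i in nodes for j in adj[i]}
    return edges, sepset


def chain_oracle(i, j, cond):
    """Assumed oracle for A -> B -> C: the only independence is A _||_ C given B."""
    return {i, j} == {"A", "C"} and "B" in cond


skeleton, sepsets = pc_skeleton(["A", "B", "C"], chain_oracle)
print(sorted(tuple(sorted(e)) for e in skeleton))
print(sepsets[frozenset(("A", "C"))])
```

In practice the oracle would be replaced by a statistical test with a significance threshold, and the recorded separating sets would feed the subsequent collider-orientation step.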

For mixed-data domains, graphical model construction often proceeds by first fitting an undirected mixed graphical model (MGM) via penalized pseudolikelihood, producing a sparse skeleton, followed by a constraint-based causal search (e.g., PC or CPC) restricted to the MGM-defined superstructure. This hybrid reduces computational cost and improves direction recovery, while supporting both continuous and categorical variables (Sedgewick et al., 2017).

3. Advances in Scalability, Neural, and LLM-Based Methods

Scalability to high-dimensional settings is achieved through divide-and-conquer frameworks, neural architectures, and LLM-powered prompting:

Causal graph partitioning decomposes the global search into local learning on overlapping variable blocks, using a "superstructure" (an undirected graph that covers the skeleton of the true DAG) to define partitions, then applies consistent PAG estimators (e.g., FCI) locally and merges the results by screening for consensus adjacencies and collider orientations. Under stated conditions, this process is theoretically consistent (i.e., it reconstructs the Markov equivalence class in the infinite-data limit), while affording substantial acceleration on large graphs (Shah et al., 2024).
NN-based methods (e.g., CSIvA) treat the mapping from observational/interventional data matrices to adjacency matrices as a supervised end-to-end problem, often modeled by transformer architectures that alternate attention over variables and samples. These models are trained on synthetic graphs and interventions, achieving strong performance in both in-distribution and OOD settings, with substantially faster inference than optimization-based baselines (Ke et al., 2022).
LLM-based causal graph construction exploits the LLM's knowledge base to infer plausible causal directions based on variable names and descriptions. A naive approach makes $O(n^2)$ pairwise queries; however, recent work proposes a breadth-first search (BFS) prompting protocol, reducing the number of LLM calls to $O(n)$ by exploiting the DAG property and querying the "children" of each node iteratively. Observational data, if available, can be incorporated as summary statistics within the prompts or via Bayesian post-processing, further improving accuracy (Jiralerspong et al., 2024).

Table: Comparative Query Complexity in LLM Causal Graph Construction

Method	LLM Queries	Scalability	Typical F1 (Asia, n=8)
Pairwise	$O(n^2)$	Unscalable (large $n$ infeasible)	0.50
BFS-based	$O(n)$	Hundreds of nodes	0.93

Based on results from (Jiralerspong et al., 2024).
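The BFS prompting protocol can be sketched with a stub in place of the LLM. Everything named here is an assumption for illustration: `bfs_causal_graph`, `ask_roots`, and `ask_children` are hypothetical interfaces, and the four-variable ground truth is made up so the stub has something to answer from. The point is only that each node is expanded once, so the number of calls grows linearly with the number of variables.

```python
from collections import deque


def reaches(edges, start, target):
    """Return True if target is reachable from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        u = stack.pop()
        if u == target:
            return True
        if u in seen:
            continue
        seen.add(u)
        stack.extend(child for (parent, child) in edges if parent == u)
    return False


def bfs_causal_graph(variables, ask_roots, ask_children):
    """Build a DAG by expanding each node once in breadth-first order.

    ask_roots and ask_children stand in for LLM calls (hypothetical interface).
    Returns the edge set and the number of calls made.
    """
    edges = set()
    roots = ask_roots(variables)
    queries = 1                      # one call to identify root causes
    queue = deque(roots)
    visited = set(roots)
    while queue:
        node = queue.popleft()
        queries += 1                 # one call per expanded node
        for child in ask_children(node, variables):
            # Skip answers that would violate acyclicity.
            if child == node or reaches(edges, child, node):
                continue
            edges.add((node, child))
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return edges, queries


# Made-up ground truth used only to stub the LLM's answers.
truth = {
    "smoking": ["cancer", "bronchitis"],
    "cancer": ["dyspnoea"],
    "bronchitis": ["dyspnoea"],
    "dyspnoea": [],
}
variables = list(truth)

edges, queries = bfs_causal_graph(
    variables,
    ask_roots=lambda vs: ["smoking"],
    ask_children=lambda node, vs: truth[node],
)
pairwise_queries = len(variables) * (len(variables) - 1)
print(sorted(edges), queries, pairwise_queries)
```

With four variables the stubbed BFS run issues five calls, against twelve ordered-pair queries for the pairwise protocol; the gap widens quadratically as the variable count grows.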

4. Integration with Domain Knowledge, Expert Input, and Knowledge Graphs

Incorporation of expert knowledge, semantic information, and structured metadata is a hallmark of several advanced approaches:

Expert-guided iterative co-design employs an iterative loop where domain experts score the plausibility of candidate edges, incorporate weighted FDR corrections for multiple testing on various hypothesis families (edge strengths, noise assumptions, covariance fit), and update the graph based on FDCR-adjusted p-values. This allows for a transparent integration of human prior beliefs with statistical evidence, with convergence criteria based on graph stability or expert satisfaction (Kling, 2024).
Causal mechanism-based construction defines structural knowledge as locally specified equations, organized into a mechanism knowledge base, supporting interactive model-building. The approach uses structure matrices and Simon's causal ordering, extended to mixed graphs (with undirected/relevance and bidirected/feedback arcs) and provides live graphical feedback via systems such as ImaGeNIe within GeNIe/SMILE (Lu et al., 2013).
Causal knowledge graphs and graph database integration generalize the causal graph notion to ontologies and property graphs. Notable here is the extension of the property graph model to support hypernodes (abstracting subgraphs as variables), causal edges, structural equations tied to nodes, and explicit support for interventional and conditional probability distributions as first-class properties. Declarative operators (e.g., EXTRACT, PROBABILITY, DO-CALCULUS) and incremental view maintenance allow for real-time alignment of causality with underlying data and cross-source integration (Pachera et al., 2024).
Human-in-the-loop and AR interfaces facilitate interactive graph creation, especially in domains where variables and relationships are not pre-enumerated. Operators select, label, and connect candidate variables using augmented reality, perform on-the-fly interventions, and iteratively refine causal graphs, with current limitations including absence of score- or constraint-based automated discovery (Tram et al., 2024).

5. Evaluation Metrics, Empirical Results, and Practical Considerations

Performance of causal graph construction algorithms is assessed using metrics such as:

Precision, recall, F1-score: Fraction of correctly identified edges (with direction) among all predicted edges (precision) and among all true edges (recall); F1 is the harmonic mean of the two.
Structural Hamming Distance (SHD) or Normalized Hamming Distance Ratio (NHD Ratio): Minimum number of edge insertions/deletions/reorientations to match the true graph, possibly normalized by the number of nodes.
True Positive/False Positive Rates: Especially useful for sparse graphs.
Decision-theoretic utility: Expected value of interventions computed from the learned model (Gonzalez-Soto et al., 2020).
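The edge-level metrics above can be computed directly from two sets of directed edges. The sketch below is a minimal illustration under two assumptions: both graphs are DAGs over the same variables (so no pair is connected in both directions), and a reversed edge counts as a single SHD operation. The function names and the example graphs are hypothetical.

```python
def edge_metrics(true_edges, pred_edges):
    """Directed-edge precision, recall, and F1 (an edge counts only if its direction matches)."""
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def shd(true_edges, pred_edges):
    """Structural Hamming distance: insertions + deletions + reorientations."""
    true_skel = {frozenset(e) for e in true_edges}
    pred_skel = {frozenset(e) for e in pred_edges}
    # Adjacencies present in exactly one graph need an insertion or a deletion.
    add_or_delete = len(true_skel ^ pred_skel)
    # Shared adjacencies whose predicted direction is wrong need a reorientation.
    shared = true_skel & pred_skel
    reorient = sum(
        1 for e in pred_edges if frozenset(e) in shared and e not in true_edges
    )
    return add_or_delete + reorient


# Hypothetical example graphs.
true_edges = {("A", "B"), ("B", "C"), ("A", "C")}
pred_edges = {("A", "B"), ("C", "B"), ("B", "D")}

precision, recall, f1 = edge_metrics(true_edges, pred_edges)
distance = shd(true_edges, pred_edges)
print(precision, recall, f1, distance)
```

Here only A -> B is recovered with the right direction, so precision and recall are both 1/3, while the SHD of 3 reflects one missing edge, one extra edge, and one reversal.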

Empirical results demonstrate that BFS-based LLM discovery achieves F1 scores up to 0.93 on standard clinical benchmarks (Asia, $n=8$), whereas pairwise protocols are both less accurate and more costly (F1 = 0.50 at a quadratic number of queries). On larger graphs (e.g., 221 nodes), only scalable protocols remain tractable. In high-dimensional gene regulatory settings, divide-and-conquer partitioning provides a substantial speedup with accuracy comparable to single-block methods (Jiralerspong et al., 2024, Shah et al., 2024). For mixed data types, hybrid MGM/PC methods outperform pure directed search in speed and in adjacency and direction recovery (Sedgewick et al., 2017).

Expert-guided FDCR co-design allows principled control of Type I error rate across multiple edge-addition/removal decisions, with iterative graph updates guided by p-value-adjusted feedback (Kling, 2024).
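The source does not spell out the exact correction used in the co-design loop, so the sketch below substitutes a standard technique: the weighted Benjamini-Hochberg procedure, in which each p-value is divided by a weight normalized to mean one before the usual step-up rule is applied. The function name `weighted_bh`, the candidate edges, the p-values, and the expert weights are all hypothetical, and the result should be read as one plausible way to combine expert plausibility scores with multiple-testing control, not as the cited method.

```python
def weighted_bh(pvalues, weights=None, alpha=0.05):
    """Return the hypotheses rejected by (weighted) Benjamini-Hochberg at level alpha.

    pvalues: dict mapping hypothesis -> p-value.
    weights: dict mapping hypothesis -> positive weight (larger = more plausible a priori).
             Weights are rescaled to have mean one; None means unweighted BH.
    """
    keys = list(pvalues)
    m = len(keys)
    if m == 0:
        return set()
    if weights is None:
        weights = {k: 1.0 for k in keys}
    mean_w = sum(weights[k] for k in keys) / m
    adjusted = {k: pvalues[k] / (weights[k] / mean_w) for k in keys}

    # Step-up rule: find the largest rank whose adjusted p-value is under its threshold.
    ranked = sorted(keys, key=lambda k: adjusted[k])
    cutoff = 0
    for rank, k in enumerate(ranked, start=1):
        if adjusted[k] <= alpha * rank / m:
            cutoff = rank
    return set(ranked[:cutoff])


# Hypothetical candidate edges with test p-values and expert plausibility weights.
edge_pvalues = {("A", "B"): 0.001, ("B", "C"): 0.008, ("A", "C"): 0.2, ("C", "D"): 0.04}
expert_weights = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 0.5, ("C", "D"): 1.5}

kept_unweighted = weighted_bh(edge_pvalues)
kept_weighted = weighted_bh(edge_pvalues, expert_weights)
print(sorted(kept_unweighted))
print(sorted(kept_weighted))
```

In this made-up example the edge C -> D misses the unweighted threshold but is retained once the expert's higher plausibility weight is applied, which is the kind of interaction between prior belief and statistical evidence that the co-design loop is meant to expose.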

6. Extensions: GNNs, Counterfactuals, and Causal Explanation

Recent methods extend causal graph construction to the context of graph neural networks (GNNs), explaining predictions by reconstructing causal subgraphs via neural SCMs or perturbation. Notably, CNL-GNN learns edge masks and node embeddings invariant to spurious connections by simulating counterfactual neighborhoods and attention-based causal scoring, enabling robust node classification and generalized OOD inference (Job et al., 2026). CXGNN constructs SCMs with neural parameterizations for subgraphs, enabling explicit cause–effect calculation and selection of maximally causal subgraphs that explain GNN predictions (Behnam et al., 2024).

Automatic and semi-automatic construction of causal knowledge graphs is active in NLP. WikiCausal exemplifies pipeline-based extraction of event-event causal relations from Wikipedia articles (QA-based, CauseNet-based), with systematic LLM-based precision and recall evaluation against Wikidata (Hassanzadeh, 2024).

7. Future Directions and Open Challenges

Current challenges include improving robustness to sample size and domain shift, achieving formal consistency guarantees in LLM-driven and neural approaches, integrating metadata and intervention protocols for hybrid discovery, and aligning causal DAG discovery with representation-learning paradigms. Research on query-efficient algorithms, probabilistic and Bayesian model selection under massive hypothesis spaces, and automated maintenance/integration in database contexts remains critical.

Potential extensions identified include advanced LLM prompting such as tree-of-thought or beam search to explore subgraphs, leveraging LLM-generated priors in combination with statistical structure learning (PC, GES, NOTEARS), adaptation to interventional data and ontologies in specific domains, and refined explainability frameworks for graph models under distribution shifts (Jiralerspong et al., 2024, Pachera et al., 2024, Job et al., 2026).

Overall, causal graph construction continues to evolve as a multi-faceted research area, drawing on statistical, algorithmic, neural, and human–computer interaction foundations to enable interpretable, scalable, and domain-aware causal inference.