PC Algorithm: Causal DAG Discovery

Updated 18 May 2026

The PC algorithm is a constraint-based method that systematically tests conditional independencies to infer the structure of a causal DAG.
It recovers a CPDAG by first pruning a complete graph and then orienting edges based on CI tests, ensuring consistency under key assumptions.
Extensions address high-dimensional, non-Gaussian, and time-dependent data with innovations in parallelization, error control, and efficient conditioning.

The PC algorithm is a constraint-based method for learning the structure of a causal directed acyclic graph (DAG) from observational data, grounded in the global Markov property and d-separation. Originating in work by Spirtes and Glymour, it performs a series of conditional independence (CI) tests that progressively prune edges from a complete undirected graph, then orients the remaining skeleton to recover as much of the true DAG's equivalence class as observational data allow. Under causal sufficiency, acyclicity, faithfulness, and a correctly specified CI test, the PC algorithm yields a completed partially directed acyclic graph (CPDAG) representing the Markov equivalence class of the true data-generating DAG. Recent research has examined PC’s statistical consistency under dependent samples, efficient variants for high-dimensional settings, false discovery rate (FDR) and error control, scalability improvements, and extensions to non-standard graphical models and time series.

1. Formal Specification and Workflow

The PC algorithm is formalized as a two-phase constraint-based estimator for the Markov equivalence class of a (possibly high-dimensional) DAG (Améndola et al., 19 Aug 2025, Zarebavani et al., 2018, Strobl et al., 2016). The procedure assumes as input either i.i.d. samples or, more generally, samples from a stationary process under suitable mixing conditions (Biswas et al., 2022).

Skeleton Discovery:

Initialize with the complete undirected graph on the variable set $V = \{X_{1}, \ldots, X_{p}\}$.
For conditioning set size $\ell = 0, 1, \dots,$ up to the maximum observed degree:
- For each adjacent pair $(i, j)$, perform CI tests $X_i \perp\!\!\!\perp X_j \mid K$ for all $K \subseteq (\mathrm{Adj}(i) \cup \mathrm{Adj}(j)) \setminus \{i, j\}$ with $|K| = \ell$.
- If any $K$ renders $X_i \perp\!\!\!\perp X_j \mid K$, remove edge $i$ – $j$ and record $K$ as the separating set $S_{ij}$.
Proceed to higher $\ell$ only while at least one remaining edge has enough adjacent nodes to form a conditioning set of that size.
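The skeleton phase can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes jointly Gaussian data and a Fisher z-test on partial correlations, draws conditioning sets from $\mathrm{Adj}(i) \setminus \{j\}$ for each ordered pair (a common formulation), and freezes adjacencies within each level. The function names `fisher_z_pvalue` and `pc_skeleton` and the default `alpha` are illustrative choices.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(corr, i, j, cond, n):
    """Two-sided p-value for zero partial correlation of i and j given cond."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    # Partial correlation read off the precision matrix of the sub-block.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def pc_skeleton(data, alpha=0.01):
    """Return (adjacency dict, separating sets) for an (n, p) data matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}  # complete graph
    sepset = {}
    level = 0
    # Continue while some node has enough neighbours for a set of this size.
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        # Freeze adjacencies for this level (assumption: order-independent style).
        frozen = {i: set(nbrs) for i, nbrs in adj.items()}
        for i in range(p):
            for j in sorted(frozen[i]):
                if j not in adj[i]:
                    continue  # edge already removed at this level
                candidates = sorted(frozen[i] - {j})
                for cond in itertools.combinations(candidates, level):
                    if fisher_z_pvalue(corr, i, j, cond, n) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(cond)
                        break
        level += 1
    return adj, sepset


# Usage on a simulated chain X0 -> X1 -> X2 (hypothetical data).
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(size=2000)
x2 = 0.8 * x1 + rng.normal(size=2000)
adjacency, separating_sets = pc_skeleton(np.column_stack([x0, x1, x2]))
print(adjacency, separating_sets)
```

Reading the partial correlation from the inverse of the relevant correlation sub-block keeps the test compact; other CI tests can be substituted by replacing `fisher_z_pvalue`.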

Edge Orientation:

For every unshielded triple $i$ – $k$ – $j$ in the skeleton (with $i$ and $j$ nonadjacent):
- If $k$ is not in the separating set $S_{ij}$ recorded for $(i, j)$, orient as $i \to k \leftarrow j$ (collider).
- Apply Meek’s rules iteratively to orient compelled edges, enforcing acyclicity and avoidance of new v-structures (Améndola et al., 19 Aug 2025, Zarebavani et al., 2018).
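The orientation phase can be illustrated with a short Python sketch. It is deliberately incomplete: after orienting colliders it applies only the first of Meek's rules (if $a \to b$, $b$ – $c$, and $a$, $c$ are nonadjacent, orient $b \to c$), and it does not resolve conflicting orientations. The function name `orient_edges` and the input layout (an adjacency dict plus separating sets keyed by unordered pairs) are assumptions made for this example.

```python
import itertools


def orient_edges(adj, sepset):
    """Orient colliders, then apply Meek's rule 1 until nothing changes.

    adj: dict node -> set of neighbours in the undirected skeleton.
    sepset: dict frozenset({i, j}) -> separating set of a removed edge i - j.
    Returns (directed, undirected): a set of (tail, head) arcs and a set of
    frozenset({a, b}) edges that remain unoriented.
    """
    undirected = {frozenset((i, j)) for i in adj for j in adj[i]}
    directed = set()

    # Collider step: i - k - j unshielded and k outside the separating set.
    for k in adj:
        for i, j in itertools.combinations(sorted(adj[k]), 2):
            if j in adj[i]:
                continue  # shielded triple
            if k not in sepset.get(frozenset((i, j)), set()):
                for tail in (i, j):
                    undirected.discard(frozenset((tail, k)))
                    directed.add((tail, k))

    # Meek rule 1 only (simplification): a -> b, b - c, a and c nonadjacent.
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for c in adj[b]:
                edge = frozenset((b, c))
                if c != a and edge in undirected and c not in adj[a]:
                    undirected.discard(edge)
                    directed.add((b, c))
                    changed = True
    return directed, undirected


# Usage: skeleton 0 - 2, 1 - 2, 2 - 3 with 0 and 1 marginally independent.
skeleton = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2}}
separators = {
    frozenset((0, 1)): set(),
    frozenset((0, 3)): {2},
    frozenset((1, 3)): {2},
}
arcs, remaining = orient_edges(skeleton, separators)
print(sorted(arcs), remaining)
```

A full implementation would add the remaining Meek rules and a policy for edges that two colliders try to orient in opposite directions.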

The output is a CPDAG representing the equivalence class of DAGs consistent with the observed CI relations under faithfulness (Améndola et al., 19 Aug 2025).

2. Statistical Foundations and Consistency

Correctness and consistency of PC rest critically on:

Global Markov Property: Any CI implied by d-separation in the true DAG is reflected in the data-generating distribution.
Faithfulness: No CI exists in the true distribution beyond those entailed by d-separation in the DAG.
Causal Sufficiency: All common causes are observed.
Acyclicity: No cycles in the underlying causal structure.

Under these assumptions, if the CI oracle (or a statistically powerful test) reliably detects true (conditional) independencies, the PC algorithm provably recovers the correct skeleton and all compelled edge orientations (Améndola et al., 19 Aug 2025). Consistency extends to data from stationary time series under suitable mixing assumptions (Biswas et al., 2022); for max-linear Bayesian networks (MLBNs), faithfulness to *-separation or $C^*$-separation is required (Améndola et al., 19 Aug 2025). For finite samples, statistical error rates are controlled by suitable multiple-testing corrections or FDR procedures, e.g., using the Benjamini-Yekutieli correction as in PC-p (Strobl et al., 2016).
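For reference, the generic Benjamini-Yekutieli step-up procedure can be sketched as follows. This shows only the multiple-testing correction applied to a list of p-values; it is not the PC-p algorithm, and the function name `benjamini_yekutieli` and the level `q` are illustrative.

```python
import numpy as np


def benjamini_yekutieli(pvalues, q=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level q."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Harmonic correction c(m) = sum_{k=1}^{m} 1/k handles arbitrary dependence.
    harmonic = np.sum(1.0 / np.arange(1, m + 1))
    thresholds = np.arange(1, m + 1) * q / (m * harmonic)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject everything up to the largest rank meeting its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


# Usage with hypothetical edge-level p-values.
print(benjamini_yekutieli([0.5, 0.001, 0.03, 0.02], q=0.05))
```

In an edge-discovery setting, the null hypothesis for each pair is the absence of an edge, so a rejection corresponds to retaining that edge.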

3. Extensions for Model Robustness and Nonstandard Data

Non-Gaussian and Copula Models:

Rank-based variants (RPC) replace Pearson partial correlations with transformed Spearman/Kendall rank correlations, maintaining high-dimensional consistency in Gaussian copula or nonparanormal settings (Harris et al., 2012). The nonparanormal transform (rank-based Gaussianization) exerts little effect on PC’s performance except in moderately nonlinear, non-Gaussian cases (Ramsey, 2015).
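A rank-based correlation estimate of this kind can be sketched as below. The example assumes continuous data without ties and uses the standard Gaussian-copula relation between Spearman's $\rho_S$ and the latent Pearson correlation, $\rho = 2 \sin(\pi \rho_S / 6)$; the function name `rank_based_correlation` is illustrative, and the resulting matrix would be passed to a partial-correlation CI test in place of the sample correlation.

```python
import numpy as np


def rank_based_correlation(data):
    """Estimate the latent Gaussian-copula correlation matrix from ranks.

    data: (n, p) array of continuous observations (assumed free of ties).
    """
    # Double argsort yields the rank of each entry within its column.
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    spearman = np.corrcoef(ranks, rowvar=False)
    latent = 2.0 * np.sin(np.pi * spearman / 6.0)
    np.fill_diagonal(latent, 1.0)
    return latent


# Usage: the estimate is unchanged by monotone marginal transformations.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 3))
z[:, 1] += 0.7 * z[:, 0]
print(np.allclose(rank_based_correlation(z), rank_based_correlation(np.exp(z))))
```

Because only ranks enter the estimate, any strictly increasing transformation of a column leaves the result unchanged, which is what makes the approach suitable for nonparanormal data.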

Max-Linear Bayesian Networks:

Max-linear models with heavy-tailed noise violate d-separation faithfulness; the PC algorithm remains consistent if the CI tests are based on *-separation or $C^*$-separation, which account for the extra CI relations created by heavy tails (Améndola et al., 19 Aug 2025). The corresponding star-separation extension of PC is necessary for structure recovery in weighted MLBNs, as standard PC may recover only a reduced skeleton.

Time Series and Dynamic Networks:

Standard PC is not designed for dependent samples. The Time-Aware PC (TPC) algorithm “unrolls” temporal variables, applying PC to an expanded time-lagged graph, then “rolls back” to a time-collapsed directed network. Consistency holds provided the process is stationary and satisfies suitable mixing conditions. Applications include causal functional connectomics from fMRI and neural data, where TPC and Neuro-PC enable consistent inference of time-directed edges in brain or neural networks (Biswas et al., 2022, Biswas et al., 2022, Biswas et al., 2023, Biswas et al., 2020).
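The unrolling step can be illustrated with a small helper that turns a multivariate series into a matrix of time-lagged copies, to which a PC-style procedure could then be applied. This sketch covers only the construction of the lagged design matrix; the function names `unroll_time_series` and `column_labels`, and the choice of a single `max_lag` parameter, are assumptions for illustration rather than the TPC implementation.

```python
import numpy as np


def unroll_time_series(series, max_lag):
    """Stack lagged copies of a (T, p) series into a (T - max_lag, p * (max_lag + 1)) matrix.

    Column block b (b = 0..max_lag) holds the variables at time offset
    b - max_lag relative to the "current" time of each row.
    """
    T, _ = series.shape
    blocks = [series[b : T - max_lag + b] for b in range(max_lag + 1)]
    return np.hstack(blocks)


def column_labels(p, max_lag):
    """Label each unrolled column as (variable index, time offset)."""
    return [(v, b - max_lag) for b in range(max_lag + 1) for v in range(p)]


# Usage on a toy series with 5 time points and 2 variables.
toy = np.arange(10).reshape(5, 2)
print(unroll_time_series(toy, max_lag=2))
print(column_labels(p=2, max_lag=2))
```

An edge found between columns labelled $(u, s)$ and $(v, t)$ with $s < t$ can then be read as a time-directed influence of variable $u$ on variable $v$ when collapsing back to the original variables.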

4. Algorithmic and Computational Innovations

Parallelization:

The number of CI tests in serial PC grows exponentially with the maximum degree $d$ of the graph (and polynomially in the number of variables $p$ for fixed $d$), which becomes intractable for high-dimensional, dense graphs. Several parallel and GPU-based implementations, e.g., parallel-PC and cuPC, demonstrate near-linear speedup on multi-core CPUs and up to 1300$\times$ acceleration on CUDA GPUs by batch-evaluating CI tests and sharing matrix-inversion results (Zarebavani et al., 2018, Le et al., 2015).
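The idea of evaluating many CI tests in one batch can be illustrated on the CPU with NumPy broadcasting. The sketch below computes every first-order partial correlation $\rho_{ij \cdot k}$ at once from a correlation matrix using the standard recursion $\rho_{ij \cdot k} = (\rho_{ij} - \rho_{ik}\rho_{jk}) / \sqrt{(1 - \rho_{ik}^2)(1 - \rho_{jk}^2)}$. It is an analogy to the batching strategy, not the kernel design of cuPC or parallel-PC, and the function name is illustrative.

```python
import numpy as np


def first_order_partial_correlations(corr):
    """Return R with R[i, j, k] = partial correlation of i and j given k.

    Entries where k equals i or j are undefined and returned as NaN.
    """
    r_ij = corr[:, :, None]   # broadcast over k
    r_ik = corr[:, None, :]   # broadcast over j
    r_jk = corr[None, :, :]   # broadcast over i
    num = r_ij - r_ik * r_jk
    den = np.sqrt((1.0 - r_ik**2) * (1.0 - r_jk**2))
    with np.errstate(divide="ignore", invalid="ignore"):
        out = num / den
    # Mark conditioning on an endpoint as undefined.
    p = corr.shape[0]
    idx = np.arange(p)
    out[idx, :, idx] = np.nan
    out[:, idx, idx] = np.nan
    return out


# Usage: compare one entry with the precision-matrix formula.
rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4)), rowvar=False)
batch = first_order_partial_correlations(corr)
prec = np.linalg.inv(corr[np.ix_([0, 1, 2], [0, 1, 2])])
print(batch[0, 1, 2], -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1]))
```

All $p^3$ entries are produced by a handful of array operations, which is the same trade of redundant arithmetic for throughput that GPU implementations exploit at much larger scale.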

Order-Independence:

PC-Stable orders CI tests purely by conditioning set size, freezing adjacency at each level and thereby eliminating ordering effects seen in the original, sequential PC.

Efficiency Enhancements:

Reduced PC (rPC) restricts the maximum conditioning set size to a small fixed constant, motivated by properties of random and hub-structured graphs, achieving polynomial runtime and requiring only a weaker (“path”) faithfulness condition (Sondhi et al., 2018). Model-based pre-pruning (P3PC) uses randomized CI tests with large conditioning sets to eliminate most edges before standard PC, exploiting the robustness of modern penalized regression–based CI tests even with high-dimensional conditioning sets (Cai et al., 2022).

Statistical Error Control:

PC-p computes edge-specific $p$-values by logical bounding over multiple CI tests, providing FDR control over the set of inferred edges in the output CPDAG (Strobl et al., 2016).

Hyperparameter Selection:

The significance threshold $\alpha$ strongly affects performance; AutoPC and Bayesian optimization approaches select $\alpha$ (and test type) automatically by optimizing an unsupervised stability or reconstruction criterion, yielding superior empirical accuracy and stability over static expert choices (Strobl, 2020, Córdoba et al., 2018).

5. Structural Learning in Challenging Regimes

Robustification and Ambiguity Handling:

PC variants address structural ambiguities due to finite-sample errors via strategies such as maximizing $p$-values across separating sets (PC-Max), voting or consensus over possible separating sets (CPC/MPC), and Shapley value–based attribution for v-structures (Shapley-PC), which use all available CI information to mitigate the impact of any single erroneous test (Russo et al., 2023, Ramsey, 2016). Shapley-PC leverages the Shapley value to determine which variable in a triple most reliably explains observed conditional independence, improving accuracy in sparse regimes.
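The consensus and voting ideas can be made concrete with a small classifier for an unshielded triple $i$ – $k$ – $j$. The sketch assumes that all conditioning sets found to separate $i$ and $j$ have already been collected; the "conservative" branch requires unanimity and the "majority" branch uses a 50% cut-off, following the usual descriptions of these rules. The function name `classify_triple` and its return labels are illustrative.

```python
def classify_triple(k, separating_sets, rule="conservative"):
    """Classify the unshielded triple i - k - j as collider or not.

    k: the middle node.
    separating_sets: iterable of sets, each of which separated i and j.
    rule: "conservative" (unanimity) or "majority" (50% threshold).
    """
    sets = list(separating_sets)
    if not sets:
        return "ambiguous"  # no evidence either way
    frac = sum(k in s for s in sets) / len(sets)
    if rule == "conservative":
        if frac == 0:
            return "collider"
        if frac == 1:
            return "noncollider"
        return "ambiguous"
    if rule == "majority":
        if frac < 0.5:
            return "collider"
        if frac > 0.5:
            return "noncollider"
        return "ambiguous"
    raise ValueError(f"unknown rule: {rule}")


# Usage: node 2 appears in one of three separating sets.
evidence = [{2}, set(), {3}]
print(classify_triple(2, evidence, rule="conservative"))
print(classify_triple(2, evidence, rule="majority"))
```

Triples marked ambiguous are left unoriented, which trades some orientation recall for protection against a single erroneous CI test.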

Adaptations for Heavy-Tailed Structures:

For MLBNs, $d$-separation faithfulness is non-generic due to dominating common-cause paths, requiring new theoretical criteria (*-separation, $C^*$-separation) and corresponding modifications of the PC algorithm (Améndola et al., 19 Aug 2025).

Dual PC:

The Dual PC algorithm leverages block-matrix inversion relationships to test conditional independence over both direct and complementary (“dual”) conditioning sets, substantially reducing the total number of necessary CI tests and improving runtime, especially in nearly Gaussian (or Gaussian copula) settings (Giudice et al., 2021).

6. Practical Impact and Empirical Performance

The PC algorithm and its extensions are foundational tools for causal discovery in high-dimensional biological, clinical, neural, and social science domains. Empirical studies confirm that advanced implementations can recover true DAG structure efficiently and with controlled error even in challenging regimes (high-dimensional, mixed data, complex network topology, time series). In comparative benchmarks, PC variants dominate or match alternative constraint-based and score-based approaches in precision–recall, SHD, runtime, and FDR, provided that underlying model assumptions and sufficient sample size are satisfied (Zarebavani et al., 2018, Strobl et al., 2016, Russo et al., 2023).

Modern research emphasizes robust CI testing, automatic tuning, error and ambiguity control, scalability enhancement, and statistically valid structure recovery beyond the i.i.d. and Gaussian limitations of the original algorithm. In heavy-tailed, time-varying, or nonlinear regimes, appropriately adapted PC-type algorithms remain state-of-the-art for nonparametric constraint-based causal structure learning.