
Pearl's Causal Hierarchy

Updated 18 November 2025
  • Pearl’s Causal Hierarchy is a structured framework categorizing causal inference into three levels—observational, interventional, and counterfactual—using Structural Causal Models.
  • It outlines specific methodologies, such as the abduction-action-prediction sequence and do-calculus, to transition from associations to true causal insights.
  • The hierarchy emphasizes that observational data alone cannot infer causal effects, thereby necessitating explicit interventions and counterfactual reasoning for robust analysis.

Pearl’s Causal Hierarchy (PCH) is a foundational stratification of causal reasoning that organizes distributions and queries into three distinct levels of inferential power: observational (associational), interventional, and counterfactual. Each level encapsulates an escalating ability to answer causal questions about complex systems, typically represented by Structural Causal Models (SCMs). PCH underpins theoretical and applied work spanning statistics, machine learning, philosophy of science, and empirical disciplines requiring explicit modeling of cause-effect relationships.

1. Formal Structure of Pearl’s Causal Hierarchy

The Causal Hierarchy, as formulated in the SCM framework, partitions causal queries into three rungs:

  • Level 1: Observational (Associational, $\mathcal{L}_1$)
    • Quantities: Joint, marginal, and conditional probabilities from passive observation, such as $P(Y \mid X = x)$.
    • SCM Operation: No intervention; data reflect the Markov factorization $P(X_1, \ldots, X_d) = \prod_{i=1}^d P(X_i \mid \mathrm{pa}(X_i))$, with $\mathrm{pa}(X_i)$ denoting the parents of $X_i$ in the DAG (Sick et al., 20 Mar 2025, Xia et al., 5 Jan 2024).
  • Level 2: Interventional (Causal, $\mathcal{L}_2$)
    • Quantities: Interventional distributions, $P(Y \mid \mathrm{do}(X = x))$, expressing the effect of "doing" $X := x$ (the do-operator severs incoming edges into $X$).
    • SCM Operation: Replace the assignment for $X$ with the constant $x$, modifying the functional relationships to encode the effect of the intervention (Sick et al., 20 Mar 2025, Jalaldoust et al., 10 Nov 2025).
  • Level 3: Counterfactual ($\mathcal{L}_3$)
    • Quantities: Counterfactuals, e.g., $P(Y_{X=x} = y' \mid X = x', Y = y)$, reasoning retrospectively about what would have happened under alternate interventions, conditioning on observed realizations.
    • SCM Operation: Abduction–Action–Prediction: infer the exogenous noise from the observed data (abduction), modify the SCM by intervention (action), and propagate the inferred noise through the modified model (prediction) (Sick et al., 20 Mar 2025, Xia et al., 5 Jan 2024).

Each level strictly generalizes the previous: Level 2 answers cannot, in general, be deduced from Level 1 information alone, and Level 3 further exceeds Levels 1 and 2 (Causal Hierarchy Theorem) (Jalaldoust et al., 10 Nov 2025, Plecko et al., 4 Nov 2025).
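The three rungs can be illustrated concretely. The following is a minimal sketch (not taken from the cited papers) on an assumed toy binary SCM with mechanisms $X := U_X$ and $Y := X \oplus U_Y$, where the exogenous bits are independent Bernoulli variables; each rung is computed by exact enumeration over the noise:

```python
# Toy binary SCM: X := U_X, Y := X XOR U_Y, with independent Bernoulli noise.
from itertools import product

P_UX, P_UY = 0.5, 0.1  # P(U_X = 1), P(U_Y = 1)

def weight(ux, uy):
    return (P_UX if ux else 1 - P_UX) * (P_UY if uy else 1 - P_UY)

def mechanism(ux, uy, do_x=None):
    x = ux if do_x is None else do_x   # do() replaces X's assignment
    y = x ^ uy
    return x, y

# L1 (observational): P(Y = 1 | X = 1) from passive observation.
num = sum(weight(ux, uy) for ux, uy in product([0, 1], repeat=2)
          if mechanism(ux, uy) == (1, 1))
den = sum(weight(ux, uy) for ux, uy in product([0, 1], repeat=2)
          if mechanism(ux, uy)[0] == 1)
p_obs = num / den

# L2 (interventional): P(Y = 1 | do(X = 1)) -- incoming edges to X are cut.
p_do = sum(weight(ux, uy) for ux, uy in product([0, 1], repeat=2)
           if mechanism(ux, uy, do_x=1)[1] == 1)

# L3 (counterfactual): having observed (X = 1, Y = 1), what if X had been 0?
# Abduction: condition the noise on the evidence; Action + Prediction: rerun.
post = [(ux, uy, weight(ux, uy)) for ux, uy in product([0, 1], repeat=2)
        if mechanism(ux, uy) == (1, 1)]
z = sum(w for *_, w in post)
p_cf = sum(w for ux, uy, w in post if mechanism(ux, uy, do_x=0)[1] == 1) / z

print(p_obs, p_do, p_cf)
```

Because this toy model is unconfounded, the L1 and L2 answers coincide here; the counterfactual query nevertheless requires the full abduction step, which neither lower rung provides.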

2. SCM Formalism and Query Semantics

A recursive, semi-Markovian SCM is defined by

$\mathcal{M} = (U, P, \{F_i\}_{i=1}^n, X_1, \ldots, X_n)$

where $U$ are exogenous variables with joint distribution $P$, the $X_i$ are endogenous variables, and each $F_i$ is a deterministic mapping from parental variables and exogenous inputs to $X_i$ (Sick et al., 20 Mar 2025, Dörfler et al., 12 May 2024).

Interventions: The action $\mathrm{do}(X_k = \alpha)$ replaces $F_k$ with the constant function $\alpha$, severing all incoming edges into $X_k$; all other mechanisms $F_i$ remain unchanged.

Counterfactuals: For an observed instance $x$, infer the exogenous noise $u$ consistent with $x$ under the mechanisms (abduction), apply the intervention as above (action), and predict by propagating $u$ through the altered SCM (Sick et al., 20 Mar 2025).
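The abduction–action–prediction sequence is easiest to see when the mechanisms are invertible in the noise. A hedged, minimal illustration on an assumed additive-noise SCM (the model $X := U_X$, $Y := 2X + U_Y$ and the function name are hypothetical, not from the cited papers):

```python
# Assumed additive-noise SCM: X := U_X, Y := 2*X + U_Y. With an invertible
# mechanism for Y, abduction recovers U_Y exactly from a single observation.

def counterfactual_y(x_obs, y_obs, x_new):
    u_y = y_obs - 2 * x_obs      # Abduction: invert F_Y to recover U_Y
    return 2 * x_new + u_y       # Action (do(X = x_new)) + Prediction

print(counterfactual_y(1.0, 3.0, 0.0))  # Y_{X=0} given (X=1, Y=3)
```

Having observed $(X = 1, Y = 3)$, abduction yields $u_Y = 1$, so the counterfactual $Y_{X=0}$ evaluates to $1$ rather than the observational prediction for $X = 0$.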

3. Implications: The Causal Hierarchy Theorem

The Causal Hierarchy Theorem formalizes the independence of levels:

  • Level $i$ knowledge does not entail Level $j > i$ quantities for "almost all" SCMs.
  • Consequence: Observational data ($P(Y \mid X)$) cannot identify interventional effects ($P(Y \mid \mathrm{do}(X))$) or counterfactuals in the absence of additional causal assumptions (Jalaldoust et al., 10 Nov 2025, Plecko et al., 4 Nov 2025, Xia et al., 5 Jan 2024).
  • No amount of statistical association suffices for causal effect inference without explicit structural input (e.g., DAG), nor does knowledge of all interventional distributions suffice for counterfactual inference.

4. Language, Satisfiability, and Complexity Landscapes

Causal reasoning in PCH is operationalized by formal probabilistic languages distinguished by both the level of the hierarchy and allowed arithmetic operations:

  • Atomic, Linear, and Polynomial Terms: the base language admits atomic terms $P\{\delta\}$; the linear language adds sums thereof; the polynomial language adds sums and products. These may be augmented by a compact marginalization operator $\Sigma$, yielding succinct encodings (Dörfler et al., 12 May 2024, Bläser et al., 28 Apr 2025).
  • Satisfiability (SAT) Problem: Given a formula composed of (in)equalities over such terms, is there an SCM at the relevant PCH level that satisfies it?
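For intuition (not an algorithm from the cited papers), satisfiability over very small model classes can be decided by brute force. The hypothetical sketch below enumerates all deterministic binary SCMs over a confounder $u_c$ and noise bits $u_x, u_y$, and asks whether some SCM makes $X$ and $Y$ observationally independent while $X$ still has a nonzero interventional effect on $Y$:

```python
# Brute-force satisfiability over a tiny family of binary SCMs with uniform
# exogenous bits (u_c, u_x, u_y) and deterministic mechanisms f (for X) and
# g (for Y). This is a toy decision procedure, not a general SAT algorithm.
from itertools import product

BITS = list(product([0, 1], repeat=3))  # (u_c, u_x, u_y), each weight 1/8

def dists(f, g):
    """Observational P(X, Y) and interventional P(Y = 1 | do(X = x))."""
    obs, do = {}, {0: 0.0, 1: 0.0}
    for u_c, u_x, u_y in BITS:
        x = f[(u_c, u_x)]
        y = g[(x, u_c, u_y)]
        obs[(x, y)] = obs.get((x, y), 0.0) + 1 / 8
        for x_do in (0, 1):
            do[x_do] += g[(x_do, u_c, u_y)] / 8
    return obs, do

def satisfiable():
    keys_f = list(product([0, 1], repeat=2))
    keys_g = list(product([0, 1], repeat=3))
    for f_vals in product([0, 1], repeat=4):          # all mechanisms for X
        f = dict(zip(keys_f, f_vals))
        for g_vals in product([0, 1], repeat=8):      # all mechanisms for Y
            g = dict(zip(keys_g, g_vals))
            obs, do = dists(f, g)
            px1 = obs.get((1, 0), 0) + obs.get((1, 1), 0)
            py1 = obs.get((0, 1), 0) + obs.get((1, 1), 0)
            # For binary variables, zero covariance implies independence.
            indep = abs(obs.get((1, 1), 0) - px1 * py1) < 1e-12
            if indep and abs(do[1] - do[0]) > 1e-12:
                return True
    return False

print(satisfiable())
```

The search succeeds, which is exactly the hierarchy-theorem phenomenon in miniature: an observational constraint (independence) does not pin down the interventional behavior.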

Complexity of the satisfiability problem (with marginalization operator $\Sigma$):

| Level | Complexity (Additive, $\Sigma$) | Complexity (Polynomial, $\Sigma$) |
| --- | --- | --- |
| Observational ($\mathcal{L}_1$) | NP$^{\rm PP}$-complete | $\exists\mathbb{R}$-complete |
| Interventional ($\mathcal{L}_2$) | PSPACE-complete | $\exists\mathbb{R}$-complete |
| Counterfactual ($\mathcal{L}_3$) | NEXP-complete | $\exists\mathbb{R}$-complete |
  • Enriching the language with polynomial operations does not increase complexity at counterfactual and interventional levels, as proven in (Dörfler et al., 12 May 2024).
  • Adding compact marginalization (Σ\Sigma), as in (Bläser et al., 28 Apr 2025), can raise complexity dramatically compared to explicit enumeration.

Fixing the causal DAG ($G$) or restricting to small models alters the complexity class, often increasing complexity due to the additional constraints.

5. Causal Reasoning in Practice: Algorithms and Abstractions

Cluster DAGs:

  • Enables partial specification of causal knowledge by grouping variables into clusters, specifying only cross-cluster relationships, and abstracting from micro-level details (Anand et al., 2022).
  • At all PCH levels, soundness and completeness of d-separation, do-calculus, and identification algorithms (ID algorithm) are preserved for cluster-level queries, given compatibility with a micro-level SCM.
  • Classical do-calculus rules and truncated factorization generalize directly to clusters, allowing rigorous reasoning when only partial structure is known.
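The truncated factorization mentioned above can be made concrete on a micro-level example. A minimal sketch with hypothetical CPT values for the confounded DAG $Z \to X$, $Z \to Y$, $X \to Y$ (the numbers are illustrative assumptions, not from the cited work): under $\mathrm{do}(X = x)$, the factor $P(X \mid Z)$ is simply dropped, whereas observational conditioning retains it and tilts the distribution of $Z$.

```python
# Truncated factorization on Z -> X, Z -> Y, X -> Y (hypothetical CPTs).
pZ1 = 0.4                                  # P(Z = 1)
pX1 = {0: 0.2, 1: 0.7}                     # P(X = 1 | Z = z)
pY1 = {(0, 0): 0.1, (0, 1): 0.6,
       (1, 0): 0.5, (1, 1): 0.9}           # P(Y = 1 | X = x, Z = z)

def p_z(z):
    return pZ1 if z else 1 - pZ1

def p_x_given_z(x, z):
    return pX1[z] if x else 1 - pX1[z]

def p_y1_do_x(x_do):
    # Truncated factorization: P(Y=1 | do(x)) = sum_z P(z) P(Y=1 | x, z);
    # X's own factor P(X | Z) has been removed.
    return sum(p_z(z) * pY1[(x_do, z)] for z in (0, 1))

def p_y1_given_x(x):
    # Observational conditioning keeps X's factor, so Z is reweighted.
    num = sum(p_z(z) * p_x_given_z(x, z) * pY1[(x, z)] for z in (0, 1))
    den = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return num / den

print(p_y1_do_x(1), p_y1_given_x(1))
```

With these numbers the two quantities differ ($0.66$ vs. $0.78$), showing how confounding separates the interventional and observational rungs.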

Neural and Transformation-Based SCMs:

  • Transformation models (TRAM-DAGs) allow flexible, interpretable modeling of conditional distributions at all PCH levels, supporting continuous, ordinal, and binary targets, and enabling explicit counterfactual queries (for continuous data) (Sick et al., 20 Mar 2025).
  • Neural Causal Abstractions employ clustering in representation learning to define high-level SCMs that are PCH-consistent with low-level processes, enabling scalable causal inference and learning of abstract causal relations (Xia et al., 5 Jan 2024).

6. Quantifying and Evaluating Causal Models: Metrics and Empirical Results

Recent work introduces hierarchical metrics to compare causal models across PCH levels:

  • Observational Distance (OD): Difference in observed distributions.
  • Interventional Distance (ID): Expectation of OD over all interventions.
  • Counterfactual Distance (CD): Expectation of ID over all evidences (conditioning) (Peyrard et al., 2020).

These pseudometrics respect the hierarchy: $\mathrm{CD} = 0 \implies \mathrm{ID} = 0 \implies \mathrm{OD} = 0$. They permit fine-grained benchmarking of causal discovery and inference systems.
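The following is a hedged sketch of simplified variants of OD and ID (total-variation based, interventions on $X$ only; the full definitions in (Peyrard et al., 2020) are more general) for two assumed Markovian models over $X \to Y$, each parameterized by $(P(X{=}1),\, P(Y{=}1 \mid X{=}0),\, P(Y{=}1 \mid X{=}1))$:

```python
# Simplified observational (OD) and interventional (ID) distances between
# two Markovian models over X -> Y, using total variation distance.
from itertools import product

def joint(m, do_x=None):
    px1, py1_x0, py1_x1 = m
    if do_x is not None:
        px1 = float(do_x)          # do(X) replaces X's distribution
    out = {}
    for x, y in product([0, 1], repeat=2):
        px = px1 if x else 1 - px1
        py1 = py1_x1 if x else py1_x0
        out[(x, y)] = px * (py1 if y else 1 - py1)
    return out

def tv(p, q):
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def od(m1, m2):
    return tv(joint(m1), joint(m2))

def id_(m1, m2):
    # Expectation of OD over the (here: two) atomic interventions on X.
    return 0.5 * sum(tv(joint(m1, do_x=v), joint(m2, do_x=v)) for v in (0, 1))

m_a = (0.5, 0.2, 0.8)
m_b = (0.5, 0.8, 0.2)   # same X marginal, opposite mechanism for Y
print(od(m_a, m_b), id_(m_a, m_b))
```

Both distances are nonzero for this pair, consistent with the implication chain: a vanishing interventional distance would force the observational distance to vanish as well.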

Empirical evaluation (e.g., TRAM-DAGs vs. normalizing flows) demonstrates that interpretable, flexible causal models can achieve state-of-the-art performance on all three PCH levels while yielding interpretable mechanisms (Sick et al., 20 Mar 2025).

7. Applications, Limitations, and Ongoing Challenges

Applications:

  • Image data: Encoding a content–style SCM enables associational, interventional, and counterfactual pixel-level reasoning, though ambiguity, identifiability, and scalability remain substantial challenges (Zečević et al., 2022).
  • Neuroscience: PCH clarifies the limitations of observational data in resolving mechanistic questions (e.g., spike–wave duality), underscoring the necessity of interventions and structural knowledge (Jalaldoust et al., 10 Nov 2025).
  • LLMs: Benchmarks show that current LLMs, despite their scale, fail to internalize even basic observational statistics, rendering higher-rung (causal/counterfactual) queries unsupported in the absence of explicit SCMs and structure (Plecko et al., 4 Nov 2025).

Limitations:

  • Structural knowledge is essential for all nontrivial causal inference; identification is generally impossible from associational data alone.
  • Computational complexity for reasoning and satisfiability grows strictly across the hierarchy (from NP$^{\rm PP}$-completeness up to $\exists\mathbb{R}$- and NEXP-completeness), especially under model constraints (Dörfler et al., 12 May 2024, Bläser et al., 28 Apr 2025).

Ongoing Research:

  • Developing scalable algorithms that retain interpretability across the hierarchy, new benchmarks for causal inference in complex domains, and tightly characterizing the trade-offs between model flexibility, identifiability, and computational tractability (Anand et al., 2022, Zečević et al., 2022, Xia et al., 5 Jan 2024).

Summary Table: Key Features of PCH Levels

| Level | Main Query | SCM Operation | Required Structure |
| --- | --- | --- | --- |
| $\mathcal{L}_1$ (Observational) | $P(Y \mid X)$ | Passive sampling | None beyond probabilistic dependencies |
| $\mathcal{L}_2$ (Interventional) | $P(Y \mid \mathrm{do}(X = x))$ | Replace assignment, cut incoming edges | DAG, clear intervention target |
| $\mathcal{L}_3$ (Counterfactual) | $P(Y_{X=x} = y' \mid X = x', Y = y)$ | Abduction, action, prediction sequence | Full SCM with unique noise assignments |

Pearl’s Causal Hierarchy provides the definitive conceptual and mathematical stratification of reasoning under uncertainty, dictating what can and cannot be inferred from data and structure, and remains central to the development, evaluation, and interpretation of causal models in complex domains.
