Causal Hierarchy Theorem
- The Causal Hierarchy Theorem stratifies causal queries into observational, interventional, and counterfactual levels and establishes strict non-reducibility between these levels.
- It underpins logical, topological, and algorithmic foundations, showing that lower-level data cannot universally determine higher-order causal effects.
- Applications span from mathematical physics in Lorentzian length spaces to challenges in data-driven machine learning, highlighting the need for substantive model-based assumptions.
The Causal Hierarchy Theorem encapsulates the principled stratification of causal questions and structures, formalizing the inherent separation between different categories of probabilistic and causal queries. Across mathematical physics and machine learning, this theorem articulates, with rigor, what can and cannot be inferred at each level of the causal ladder—spanning from observational to interventional and counterfactual domains, and extending to global causality structures on Lorentzian length spaces. The theorem’s variants underpin the logical, topological, and syntactic foundations of contemporary causal reasoning and are pivotal in both theoretical and applied frameworks.
1. Formal Definitions of the Causal Ladder
The causal hierarchy is delineated by stratified levels of probabilistic queries over a system—most canonically via Structural Causal Models (SCMs) and, in spacetime physics, via causality conditions for Lorentzian length spaces.
In the SCM Framework
- Level 1 (Association/Observation): Queries solely about the observational joint $P(\mathbf{V})$, e.g., $P(y \mid x)$, reflecting mere correlations.
- Level 2 (Intervention): Queries concerning $P(\mathbf{V} \mid do(\mathbf{x}))$, corresponding to the distribution under explicit (hypothetical) interventions (the $do$-operator).
- Level 3 (Counterfactual): Queries such as $P(y_x \mid x', y')$ or fully fledged joint counterfactuals $P(y_x, y'_{x'})$, combining actual outcomes with hypothetical manipulations in the same exogenous realization.
Causal Hierarchy Theorem (SCM):
- Every SCM naturally defines these three levels.
- Strict separation: In general, queries at higher levels are not reducible to those at lower levels. Formally, there does not exist a universal map $f$ such that $\mathcal{L}_2 = f(\mathcal{L}_1)$ or $\mathcal{L}_3 = f(\mathcal{L}_2)$ (Zečević et al., 2022). A minimal numerical illustration follows.
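To make the $\mathcal{L}_1$/$\mathcal{L}_2$ gap concrete, here is a minimal Monte-Carlo sketch with two hypothetical binary SCMs (the names and mechanisms are illustrative, not taken from the cited papers): $M_1$ sets $Y := X$, $M_2$ sets $X := Y$, and both induce the same observational joint while disagreeing under $do(X=1)$.

```python
import random

def sample_m1():
    # M1: X ~ Bernoulli(1/2) exogenous, mechanism Y := X (X causes Y).
    x = random.randint(0, 1)
    return x, x

def sample_m2():
    # M2: Y ~ Bernoulli(1/2) exogenous, mechanism X := Y (Y causes X).
    y = random.randint(0, 1)
    return y, y

def sample_m1_do_x1():
    # do(X=1) in M1: the mechanism Y := X still fires, so Y = 1 surely.
    return 1, 1

def sample_m2_do_x1():
    # do(X=1) in M2: the edge Y -> X is cut; Y remains Bernoulli(1/2).
    return 1, random.randint(0, 1)

def p_y1(sampler, n=100_000):
    # Monte-Carlo estimate of P(Y = 1) under the given sampling regime.
    return sum(sampler()[1] for _ in range(n)) / n

random.seed(0)
# Level 1: indistinguishable -- both give P(Y=1) = 1/2 (and X = Y a.s.).
print(p_y1(sample_m1), p_y1(sample_m2))              # ~0.5  ~0.5
# Level 2: P(Y=1 | do(X=1)) separates them: 1.0 in M1 vs ~0.5 in M2.
print(p_y1(sample_m1_do_x1), p_y1(sample_m2_do_x1))  # 1.0  ~0.5
```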
In Lorentzian Length Spaces
- Chronological and Causal Relations: For $x \in X$, the chronological future is $I^+(x) = \{y \in X : x \ll y\}$ and the causal future is $J^+(x) = \{y \in X : x \le y\}$, where $\ll$ and $\le$ denote the chronological and causal relations of the Lorentzian length space.
- Key conditions (standard formulations recalled after the ladder below):
- Global Hyperbolicity (GH)
- Causal Simplicity (CS)
- Causal Continuity (CC)
- Stable Causality (SC)
- Strong Causality (Strong)
Hierarchy (Physical Causality Ladder):
- $\mathrm{GH} \Rightarrow \mathrm{CS} \Rightarrow \mathrm{CC} \Rightarrow \mathrm{SC} \Rightarrow \mathrm{Strong}$, with each implication strict, and each level strictly above the classical “causal” and “chronological” conditions (Hau et al., 2020).
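For orientation, a compact reminder of how these conditions are standardly formulated in the smooth spacetime setting; the Lorentzian-length-space versions in (Hau et al., 2020) are the natural analogues, and this summary is standard background rather than a quotation from that paper:

```latex
\begin{itemize}
  \item \textbf{Strong causality:} every point admits arbitrarily small
        causally convex neighborhoods (no ``almost closed'' causal curves).
  \item \textbf{Stable causality (SC):} causality survives small widenings of
        the light cones; equivalently, a time function exists.
  \item \textbf{Causal continuity (CC):} the space is distinguishing and the
        set-valued maps $x \mapsto I^{\pm}(x)$ vary continuously.
  \item \textbf{Causal simplicity (CS):} causality holds and $J^{\pm}(x)$ is
        closed for every $x$.
  \item \textbf{Global hyperbolicity (GH):} non-total imprisonment together
        with compactness of the causal diamonds $J^{+}(x) \cap J^{-}(y)$.
\end{itemize}
```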
2. Hierarchy Separation: Expressivity and Non-Identifiability
The theorem asserts strict inclusions at every level, establishing non-identifiability results that prohibit the universal determination of higher-level causal or counterfactual quantities solely from lower-level information.
- Separation Propositions: There exist pairs of SCMs (e.g., one with mechanism $Y := X$ and one with $X := Y$) that agree on all observational ($\mathcal{L}_1$) queries but not on interventional ($\mathcal{L}_2$) queries; similarly, models can agree on all interventional ($\mathcal{L}_2$) queries yet disagree on counterfactual ($\mathcal{L}_3$) queries such as $P(y_x \mid x', y')$ (Ibeling et al., 2020, Zečević et al., 2022). A numerical sketch of the latter separation follows this list.
- Physical Causality Implications: In Lorentzian length spaces, each stricter causality property guarantees, but is not implied by, those below it. For example, stable causality ensures that no time-travel paradoxes exist, but does not guarantee the topological properties of the spacetime necessary for global hyperbolicity (Hau et al., 2020).
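Before the summary table, a concrete sketch of the $\mathcal{L}_2$/$\mathcal{L}_3$ gap on the SCM side (a standard XOR-style construction; the mechanisms and names are illustrative): both models below share all observational and interventional distributions, yet assign probability 1 versus 0 to the counterfactual $P(Y_{X=1}=1 \mid X=0, Y=0)$.

```python
import random

def scm_a(u, x):
    # SCM A: mechanism Y := X XOR U, with exogenous U ~ Bernoulli(1/2).
    return x ^ u

def scm_b(u, x):
    # SCM B: mechanism Y := U; the value of X is ignored entirely.
    return u

def p_y1_do(scm, x, n=100_000):
    """Estimate the interventional quantity P(Y = 1 | do(X = x))."""
    return sum(scm(random.randint(0, 1), x) for _ in range(n)) / n

def p_counterfactual(scm, n=100_000):
    """Estimate P(Y_{X=1} = 1 | X = 0, Y = 0) by abduction-action-prediction:
    keep only exogenous draws consistent with the evidence (X=0, Y=0), then
    rerun the same mechanism under the intervention do(X = 1)."""
    hits = total = 0
    for _ in range(n):
        u = random.randint(0, 1)
        x = random.randint(0, 1)
        if x == 0 and scm(u, x) == 0:   # abduction: condition on the evidence
            total += 1
            hits += scm(u, 1)           # action + prediction with X set to 1
    return hits / total

random.seed(0)
# Level 2 agreement: both report P(Y=1 | do(X=1)) ~ 0.5 (likewise for do(X=0)).
print(p_y1_do(scm_a, 1), p_y1_do(scm_b, 1))
# Level 3 separation: 1.0 under SCM A versus 0.0 under SCM B.
print(p_counterfactual(scm_a), p_counterfactual(scm_b))
```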
Table: Hierarchical Relations in SCMs
| Level | Query Type | Example Query |
|---|---|---|
| $\mathcal{L}_1$ | Observational | $P(y \mid x)$ |
| $\mathcal{L}_2$ | Interventional | $P(y \mid do(x))$ |
| $\mathcal{L}_3$ | Counterfactual | $P(y_x \mid x', y')$ |
3. Topological and Measure-Theoretic Foundations
A topological refinement of the hierarchy reveals that the set of cases where lower-level information determines higher-level quantities is not just measure zero, but meager—topologically “small.”
- Topological Causal Hierarchy Theorem: In the weak topology on the space of interventional ($\mathcal{L}_2$) distributions, the set of points admitting a unique counterfactual lift (the collapse set) is meager (of first category), i.e., topologically negligible (Ibeling et al., 2021).
- Implication: Any assumption ensuring uniqueness of counterfactuals from interventions carves out a non-open, statistically unverifiable subset of the space of interventional distributions (see the bounding sketch after this list).
- Comparison to Measure Theory: The measure-zero result (probabilistically negligible) is sharpened, as meagerness implies non-verifiability through experimental data—a learning-theoretic “no free lunch” for assumption-free causal inference.
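A concrete way to see the non-collapse: for a binary cause-effect pair, the interventional quantities $a = P(Y=1 \mid do(X=0))$ and $b = P(Y=1 \mid do(X=1))$ only bound, and generically do not determine, a joint counterfactual such as $P(Y_{X=1}=1,\, Y_{X=0}=0)$. The sketch below computes the standard Fréchet-style bounds on this quantity, a textbook fact rather than a construction from the cited papers.

```python
def counterfactual_bounds(a: float, b: float) -> tuple[float, float]:
    """Bounds on P(Y_{X=1}=1, Y_{X=0}=0) given only the interventional
    marginals a = P(Y=1 | do(X=0)) and b = P(Y=1 | do(X=1)).

    The joint over the counterfactual pair (Y_{X=0}, Y_{X=1}) is constrained
    only through its two marginals, so the target probability can lie
    anywhere in [max(0, b - a), min(b, 1 - a)].
    """
    return max(0.0, b - a), min(b, 1.0 - a)

lo, hi = counterfactual_bounds(a=0.3, b=0.6)
print(lo, hi)  # 0.3 0.6 -> a whole interval of counterfactuals fits the L2 data
```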
4. Logical Formalization and Algorithmic Properties
The stratification admits exact logical and computational characterizations.
- Probabilistic Languages: Three fragments $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ precisely capture association, intervention, and counterfactual reasoning; each is built from polynomial (in)equalities over probability terms whose arguments are purely observational, interventional, or counterfactual events, respectively (Ibeling et al., 2020). Representative formulas appear after this list.
- Hilbert-Style Axiomatizations: For each fragment there exists a finitary Hilbert-style axiomatization (including Boolean, probability, polynomial, and causality axioms) that is sound and weakly complete over recursive SCMs and equivalence classes of probabilistic simulation programs.
- Complexity: For each of the three languages, deciding satisfiability or validity is PSPACE-complete. Each language admits a small-model property, so that every satisfiable formula is witnessed by a small model.
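As an illustration of the three fragments, here are representative formulas in the spirit of (Ibeling et al., 2020); the exact syntax there is via polynomial inequalities over such probability terms, so these particular examples are indicative rather than quoted:

```latex
\begin{align*}
\mathcal{L}_1:\;& P(y \wedge x) \ge P(y)\,P(x)
   && \text{(an associational, polynomial constraint)}\\
\mathcal{L}_2:\;& P\big(y \mid do(x)\big) > P\big(y \mid do(x')\big)
   && \text{(comparing interventional regimes)}\\
\mathcal{L}_3:\;& P\big(y_{x} \wedge y'_{x'}\big) > 0
   && \text{(a joint counterfactual event)}
\end{align*}
```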
5. Preservation and Invariance under Mappings
In Lorentzian geometry, upper levels of the causal ladder exhibit invariance under specific types of metric-preserving maps.
- Distance-Homothetic Invariants: For a surjective, distance-homothetic, locally causally Lipschitz map $f : X \to \tilde{X}$ between Lorentzian length spaces, all five upper causality levels (GH, CS, CC, SC, Strong) are preserved. Furthermore, $f$ is a homeomorphism, sending maximal causal curves to maximal causal curves (Hau et al., 2020). A schematic rendering follows this list.
- Consequence: This allows transport of causal-theoretic results from the smooth-metric setting to continuous and singular (non-manifold) spaces—a significant extension for mathematical relativity.
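Schematically, writing $\tau$ for the time-separation function, the shape of the invariance result can be rendered as follows; this is a paraphrase under the assumption that "distance-homothetic" means rescaling $\tau$ by a fixed constant, with the precise hypotheses being those of (Hau et al., 2020):

```latex
% f : X -> X~ surjective, locally causally Lipschitz, distance-homothetic:
%   tau_{X~}(f(x), f(y)) = lambda * tau_X(x, y) for some fixed lambda > 0.
% Then each upper-ladder property transfers in both directions:
\tau_{\tilde{X}}\bigl(f(x), f(y)\bigr) = \lambda\,\tau_{X}(x, y)
\;\;\Longrightarrow\;\;
\Bigl( X \text{ satisfies } \mathcal{P} \iff \tilde{X} \text{ satisfies } \mathcal{P} \Bigr),
\quad \mathcal{P} \in \{\mathrm{GH},\, \mathrm{CS},\, \mathrm{CC},\, \mathrm{SC},\, \mathrm{Strong}\}.
```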
6. Applications, Challenges, and Implications in Machine Learning
Embedding the hierarchy in high-dimensional domains (e.g., image-based SCMs) exposes major complexities.
- Ambiguity of Intervention: In domains such as vision, interventions may lack a unique operationalization. For text or image interventions (“make the bird spread its wings”), multiple plausible effect distributions exist, each consistent with an admissible SCM and exogenous realization (Zečević et al., 2022).
- Nature of Interventions: Object placements vs. state changes highlight the heterogeneity of $do$-operator semantics in practical generative modeling.
- Limits of Purely Data-Driven Causal Inference: The meagerness of the collapse set in the weak topology rigorously demonstrates that data, even interventional data, are generically insufficient for learning full counterfactuals. Substantive assumptions (faithfulness, functional constraints) are inherently statistically unverifiable; they correspond to non-open subsets in the relevant topology (Ibeling et al., 2021).
- Benchmarking and Practical Generation: The absence of large-scale benchmarks for interventional/counterfactual image datasets is acutely felt. Current generation methods (inpainting, fine-tuning diffusion models) do not scale or guarantee genuine L2/L3 fidelity. Performance assessment often relies on human judgment or structural correctness of proposed interventions rather than standard metrics.
7. Schematic Representation and Concluding Remarks
The causal hierarchy may be succinctly rendered as follows. For SCMs:

$$\mathcal{L}_1 \ (\text{observational}) \;\prec\; \mathcal{L}_2 \ (\text{interventional}) \;\prec\; \mathcal{L}_3 \ (\text{counterfactual}),$$

with each level strictly more expressive than the one below; and for Lorentzian length spaces:

$$\mathrm{GH} \;\Rightarrow\; \mathrm{CS} \;\Rightarrow\; \mathrm{CC} \;\Rightarrow\; \mathrm{SC} \;\Rightarrow\; \mathrm{Strong} \;\Rightarrow\; \text{Causal} \;\Rightarrow\; \text{Chronological},$$

with each implication strict.
The Causal Hierarchy Theorem unifies developments across causal inference, logic, computational complexity, and mathematical physics. It formalizes sharp expressivity gaps between levels of causal knowledge and codifies the impossibility of universal identification of higher-order causal properties without further structural or functional assumptions. This framework is foundational to global causality theory in General Relativity (even for continuous or singular metrics (Hau et al., 2020)), provides the syntactic and complexity-theoretic underpinnings of causal inference (Ibeling et al., 2020), and rigorously pinpoints the topological obstacles to assumption-free causal induction (Ibeling et al., 2021).
A plausible implication is that further advances in practical and automated causal reasoning—especially in high-dimensional or nonparametric settings—will continue to rely not only on learning algorithms or data collection, but fundamentally on the articulation and explicit acceptance of substantive model-based assumptions.