Staged Causal Graphs

Updated 28 December 2025
  • Staged causal graphs are a graphical framework that extends DAGs by using staged trees to capture context-specific conditional independencies in multivariate categorical data.
  • They support both observational and interventional analyses by encoding how dependency structures vary with specific variable instantiations.
  • This methodology facilitates robust Bayesian inference and model comparison, making it highly effective for high-dimensional categorical domains.

A staged causal graph, formally known as a staged tree, provides a graphical framework for representing and reasoning about context-specific conditional independencies and causal relationships in multivariate categorical data. Whereas standard Bayesian networks tie the independence structure to a fixed directed acyclic graph (DAG), staged causal graphs allow for context-specific dependencies, in which the independence structure may change as a function of variable instantiations. The staged causal graph is rooted in an event tree representation, whose nodes (apart from the root and leaves) are assigned to equivalence classes called “stages.” These stages encode that all partial histories (i.e., paths in the tree corresponding to particular variable assignments up to a point) in the same class share identical conditional probability distributions for the next variable. This modeling paradigm supports both observational and interventional analysis, encompassing context-specific interventions and facilitating nonparametric Bayesian causal inference for categorical data (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).

1. Formal Definition and Factorization

Let $X = (X_1, \dots, X_p)$ be a vector of categorical variables with respective finite state spaces $\mathcal{X}_i$. The construction of a staged causal graph starts with a fixed variable ordering (causal order), denoted $\pi = (\pi_1, \dots, \pi_p)$. The event tree is a rooted, directed tree where:

  • Each node at depth $i-1$ represents a partial history $(x_{\pi_1}, \dots, x_{\pi_{i-1}})$.
  • Outgoing edges correspond to possible values of $X_{\pi_i}$, leading to nodes at depth $i$.
  • Internal nodes at the same depth are partitioned into “stages.” Nodes in the same stage share the same transition probabilities for $X_{\pi_i}$.

The joint distribution factorizes as
$$P(x) = \prod_{i=1}^p P\bigl(x_{\pi_i} \mid x_{S_i(x_{\pi_1:\pi_{i-1}})}\bigr),$$
where $S_i(x_{\pi_1:\pi_{i-1}})$ denotes the (possibly context-dependent) subset of conditioning variables determined by the unique stage containing $x_{\pi_1:\pi_{i-1}}$. Parameterization across the tree is achieved by assigning a parameter vector $\theta_s$ to each stage $s$, giving the transition probabilities for each possible next value, so the factorization can be re-expressed as
$$P(x) = \prod_{i=1}^p \theta_{\pi_i,\, s_i(x_{\pi_1:\pi_{i-1}})}(x_{\pi_i}).$$
This context-specific parameter sharing can encode conditional independencies that hold only in certain regions of the joint state space, inducing context-specific independence (CSI) structure not capturable with fixed-parent DAGs (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025).
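
As a concrete illustration, the following minimal Python sketch encodes the factorization above for three binary variables, using a hand-chosen staging in which X3 ignores X2 whenever X1 = 0. All names, stage labels, and probability values are illustrative assumptions, not taken from the cited papers or any library.

```python
# Minimal sketch of the staged-tree factorization described above.
from itertools import product

# Variable order pi = (X1, X2, X3), all binary.
order = ["X1", "X2", "X3"]
states = {v: [0, 1] for v in order}

# Stage map: assign every partial history (tuple of earlier values) to a stage
# label.  Histories in the same stage share the same transition distribution
# for the next variable.
def stage(var, history):
    if var == "X1":
        return "root"
    if var == "X2":
        return f"x1={history[0]}"          # X2 depends on X1
    # Context-specific sharing: when X1 = 0, X3 ignores X2.
    if history[0] == 0:
        return "x1=0"                      # one stage for histories (0,0) and (0,1)
    return f"x1=1,x2={history[1]}"         # separate stages when X1 = 1

# Stage-indexed transition probabilities theta[(var, stage)][value].
theta = {
    ("X1", "root"): {0: 0.6, 1: 0.4},
    ("X2", "x1=0"): {0: 0.7, 1: 0.3},
    ("X2", "x1=1"): {0: 0.2, 1: 0.8},
    ("X3", "x1=0"): {0: 0.5, 1: 0.5},
    ("X3", "x1=1,x2=0"): {0: 0.9, 1: 0.1},
    ("X3", "x1=1,x2=1"): {0: 0.1, 1: 0.9},
}

def joint_prob(x):
    """P(x) = product over i of theta_{pi_i, s_i(history)}(x_i)."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

# The joint distribution sums to one over the full state space.
print(sum(joint_prob(x) for x in product([0, 1], repeat=3)))  # 1.0 (up to floating point)
```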

2. Graphical Equivalence and Model Inclusion Hierarchy

Statistical equivalence of two staged causal graphs is defined by equality of their induced sets of distributions. For a staged tree $T$, there exists a unique minimal DAG $G_T$ that captures the symmetric (global) conditional independencies present in the set of distributions represented by $T$, while asymmetric dependencies are encoded via colored (e.g., "red") edges classified as non-total.

Staged causal graphs admit the following inclusion hierarchy:
$$\mathbb{D} \subsetneq \mathbb{C} \subsetneq \mathbb{L} \subsetneq \mathbb{S},$$
where $\mathbb{D}$ is the class of DAG models, $\mathbb{C}$ of CStree models, $\mathbb{L}$ of labeled-DAG (LDAG) models, and $\mathbb{S}$ of staged-tree models. Each inclusion is strict; for example, staged trees can encode CSI patterns (e.g., $X_3 \perp X_2 \mid X_1 = 0$) not representable in any DAG, and they generalize even LDAGs by allowing arbitrary stage colorings (Duarte et al., 2021, Leonelli et al., 2021). Equivalence theorems characterize when two staged causal graphs are observationally or causally identical in the sense of their implied (interventional) distributions.
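
To illustrate why the inclusion of DAG models in staged-tree models is strict, note that a DAG with parent set S for the next variable corresponds to the staging in which two histories share a stage exactly when they agree on S. The hypothetical helper below (a sketch over toy binary state spaces; the function name and setup are assumptions, not any library's API) searches for such an S: the context-specific staging from the previous sketch admits none, while a symmetric staging does.

```python
from itertools import combinations, product

def dag_representable(prev_vars, states, stage_of):
    """Return a parent set (tuple of variable names) whose induced partition of
    partial histories equals the stage partition, or None if no such set exists
    (i.e. the staging is genuinely context-specific)."""
    histories = list(product(*[states[v] for v in prev_vars]))
    for k in range(len(prev_vars) + 1):
        for idx in combinations(range(len(prev_vars)), k):
            key = lambda h: tuple(h[i] for i in idx)
            if all((stage_of(h1) == stage_of(h2)) == (key(h1) == key(h2))
                   for h1 in histories for h2 in histories):
                return tuple(prev_vars[i] for i in idx)
    return None

states = {"X1": [0, 1], "X2": [0, 1]}

# Context-specific staging for X3 (as in the Section 1 sketch): X3 ignores X2
# only in the context X1 = 0.  No fixed parent set reproduces this partition.
csi_staging = lambda h: "x1=0" if h[0] == 0 else f"x1=1,x2={h[1]}"
print(dag_representable(["X1", "X2"], states, csi_staging))             # None

# A symmetric staging (X3 depends on X1 in every context) is DAG-representable.
print(dag_representable(["X1", "X2"], states, lambda h: f"x1={h[0]}"))  # ('X1',)
```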

3. Causal Semantics and Interventions

A staged causal graph obtains causal interpretation by associating interventions (following the do-calculus framework) with modifications to the event tree. An intervention $do(X_i = x)$ replaces the stochastic mechanism for $X_i$ with a deterministic assignment, formally by restricting the event tree at depth $i$ to only the outgoing edge corresponding to $x$. For a set of intervention targets $I$ over the set of stages, a soft intervention rewrites only the affected conditional distributions, while a hard intervention replaces all mechanisms for $X_{\pi_i}$ so as to lose dependence on context.

The induced post-intervention distribution is
$$P^I(x) = \prod_{\substack{i=1,\dots,p\,:\, \mathcal{S}_{\pi,i}(x_S)\in I}} P^I(x_{\pi_i}\mid x_S) \;\times\; \prod_{\substack{i=1,\dots,p\,:\, \mathcal{S}_{\pi,i}(x_S)\notin I}} P^0(x_{\pi_i}\mid x_S).$$
This allows for the modeling of context-specific, mechanism-targeted interventions, and supports the generalization of interventional calculus for arbitrary CSI structures (Duarte et al., 2021).
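
The following sketch implements a hard intervention do(X2 = 1) on the toy staged tree from the Section 1 sketch (repeated here so the example is self-contained): the intervened variable becomes deterministic while every other mechanism keeps its observational stage parameters. All names and probability values are illustrative assumptions.

```python
from itertools import product

order = ["X1", "X2", "X3"]

def stage(var, history):
    if var == "X1":
        return "root"
    if var == "X2":
        return f"x1={history[0]}"
    return "x1=0" if history[0] == 0 else f"x1=1,x2={history[1]}"

theta = {
    ("X1", "root"): {0: 0.6, 1: 0.4},
    ("X2", "x1=0"): {0: 0.7, 1: 0.3},
    ("X2", "x1=1"): {0: 0.2, 1: 0.8},
    ("X3", "x1=0"): {0: 0.5, 1: 0.5},
    ("X3", "x1=1,x2=0"): {0: 0.9, 1: 0.1},
    ("X3", "x1=1,x2=1"): {0: 0.1, 1: 0.9},
}

def do_prob(x, do):
    """Post-intervention probability P^I(x): each intervened variable keeps only
    the edge fixed by the intervention; all other mechanisms keep their
    observational stage parameters."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        if var in do:
            p *= 1.0 if value == do[var] else 0.0
        else:
            p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

# Interventional distribution of X3 under do(X2 = 1).
for x3 in (0, 1):
    p = sum(do_prob((x1, x2, x3), {"X2": 1}) for x1, x2 in product([0, 1], repeat=2))
    print(f"P(X3={x3} | do(X2=1)) = {p:.2f}")   # 0.34 and 0.66
```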

4. Inference, Learning, and Causal Effect Estimation

Bayesian inference for staged causal graphs proceeds by placing priors on both the stage-assignment partitions and the associated Dirichlet parameters. Parsimony-encouraging product-partition priors or distance-based (penalized) priors regularize the number and assignment of stages. Posterior inference employs MCMC with split-and-merge moves, drawing posterior samples over partitions and conditional probability vectors.
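
A minimal sketch of the conjugate ingredient inside such a sampler is shown below: for a fixed stage assignment, each stage's transition vector has a Dirichlet posterior given the multinomial counts pooled over the histories in that stage. The counts, the hyperparameter alpha, and the omission of the split-and-merge moves over partitions are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled counts of the next variable's values, one entry per stage (illustrative).
stage_counts = {
    "x1=0":      np.array([41, 39]),   # histories (0,0) and (0,1) pooled
    "x1=1,x2=0": np.array([18, 2]),
    "x1=1,x2=1": np.array([3, 27]),
}
alpha = 1.0  # symmetric Dirichlet hyperparameter

for s, counts in stage_counts.items():
    # Posterior for theta_s is Dirichlet(alpha + counts); summarize 1000 draws.
    draws = rng.dirichlet(alpha + counts, size=1000)
    mean = draws[:, 1].mean()
    lo, hi = np.quantile(draws[:, 1], [0.025, 0.975])
    print(f"stage {s}: P(next = 1) mean {mean:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```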

Average treatment effects (ATE) are identified through tree pruning and summing over compatible histories in the interventional regime, leveraging the fact that, after intervention, confounding is blocked by construction of the staged tree. For instance, with binary treatment $T$, outcome $Y$, and intermediate covariates, the staged tree enables the computation
$$\mathrm{ATE} = E[Y \mid do(T=1)] - E[Y \mid do(T=0)].$$
The posterior mean, credible intervals, and the full uncertainty distribution for the ATE are estimated by summarizing the MCMC draws for the underlying tree and stage parameters, requiring no further nonparametric adjustment (Cremaschi et al., 5 Nov 2025).
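
The sketch below illustrates this identification on a toy staged tree in which a binary confounder Z precedes a binary treatment T and outcome Y in the causal order: do(T = t) removes T's mechanism, and E[Y | do(T = t)] is a sum over the pruned tree. Variable names, stagings, and probabilities are illustrative assumptions; in the Bayesian workflow described above, the same computation would be repeated for each MCMC draw of the stage parameters to obtain a posterior over the ATE.

```python
from itertools import product

order = ["Z", "T", "Y"]

def stage(var, history):
    if var == "Z":
        return "root"
    if var == "T":
        return f"z={history[0]}"
    return f"z={history[0]},t={history[1]}"

theta = {
    ("Z", "root"):    {0: 0.5, 1: 0.5},
    ("T", "z=0"):     {0: 0.8, 1: 0.2},
    ("T", "z=1"):     {0: 0.3, 1: 0.7},
    ("Y", "z=0,t=0"): {0: 0.9, 1: 0.1},
    ("Y", "z=0,t=1"): {0: 0.6, 1: 0.4},
    ("Y", "z=1,t=0"): {0: 0.7, 1: 0.3},
    ("Y", "z=1,t=1"): {0: 0.2, 1: 0.8},
}

def do_prob(x, do):
    """Intervened variables become deterministic; all other mechanisms keep
    their observational stage parameters."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        if var in do:
            p *= 1.0 if value == do[var] else 0.0
        else:
            p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

def expected_y(t):
    # E[Y | do(T = t)]: sum over all histories compatible with the intervention.
    return sum(do_prob((z, tt, y), {"T": t}) * y
               for z, tt, y in product([0, 1], repeat=3))

print(f"ATE = {expected_y(1) - expected_y(0):.2f}")   # 0.5*(0.4-0.1) + 0.5*(0.8-0.3) = 0.40
```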

Learning staged tree structure from data can be performed via backward hill-climbing (stage merges under BIC/Bayesian score), k-means clustering on conditional probability table (CPT) vectors, and dynamic programming for variable ordering selection (Leonelli et al., 2021).
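
A hedged sketch of the backward hill-climbing idea at a single tree depth is given below: starting from the finest staging (one stage per partial history), the pair of stages whose merge most improves a BIC-style score is merged, until no merge helps. The counts and the exact BIC bookkeeping are illustrative simplifications, not the scoring of any specific package.

```python
import math

K = 2  # number of values of the next variable
# Observed counts of the next variable for each partial history (illustrative).
counts = {
    (0, 0): [40, 10],
    (0, 1): [38, 12],   # similar to (0, 0): a natural merge candidate
    (1, 0): [5, 45],
    (1, 1): [25, 25],
}
N = sum(sum(c) for c in counts.values())

def loglik(pooled):
    # Multinomial log-likelihood at the MLE for one stage's pooled counts.
    n = sum(pooled)
    return sum(c * math.log(c / n) for c in pooled if c > 0)

def bic(stages):
    """stages: list of lists of histories.  BIC = loglik - 0.5 * dim * log N."""
    ll = sum(loglik([sum(counts[h][k] for h in s) for k in range(K)]) for s in stages)
    return ll - 0.5 * len(stages) * (K - 1) * math.log(N)

stages = [[h] for h in counts]           # finest staging: one stage per history
current = bic(stages)
improved = True
while improved:
    improved = False
    best = None
    for i in range(len(stages)):
        for j in range(i + 1, len(stages)):
            merged = [s for k, s in enumerate(stages) if k not in (i, j)]
            merged.append(stages[i] + stages[j])
            score = bic(merged)
            if score > current and (best is None or score > best[0]):
                best = (score, merged)
    if best:
        current, stages = best
        improved = True

print(stages)   # [[(1, 0)], [(1, 1)], [(0, 0), (0, 1)]]: (0,0) and (0,1) share a stage
```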

5. Model Comparison and Interventional Metrics

To quantify model similarity and equivalence beyond the observational distribution, the context-specific interventional discrepancy (CID) is used. For two staged trees $T$ and $S$, the CID is defined as
$$d_{\mathrm{CID}}(T, S) = \sum_{j=1}^p \sum_{x_{[j-1]}} \bigl| P_T\bigl(X_j \mid do(X_{[j-1]}=x_{[j-1]})\bigr) - P_S\bigl(X_j \mid do(X_{[j-1]}=x_{[j-1]})\bigr) \bigr|.$$
The CID equals zero if and only if all interventional distributions of $T$ and $S$ agree. This generalizes the structural intervention distance (SID) for DAGs to the context-specific, asymmetric settings staged trees uniquely model (Leonelli et al., 2021).
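
Because intervening on all of $X_{[j-1]}$ fixes the history, each interventional conditional in the displayed definition reduces to the tree's transition distribution at that history, so the CID can be computed by summing L1 gaps over depths and histories. The sketch below does this for two toy trees that share the mechanisms of X1 and X2 but stage X3 differently; all names and numbers are illustrative assumptions.

```python
from itertools import product

order = ["X1", "X2", "X3"]
states = {v: [0, 1] for v in order}

def tree_T(j, h):
    # Context-specific staging from the earlier sketches.
    if j == 0:
        return {0: 0.6, 1: 0.4}
    if j == 1:
        return {0: 0.7, 1: 0.3} if h[0] == 0 else {0: 0.2, 1: 0.8}
    if h[0] == 0:
        return {0: 0.5, 1: 0.5}
    return {0: 0.9, 1: 0.1} if h[1] == 0 else {0: 0.1, 1: 0.9}

def tree_S(j, h):
    # Same mechanisms for X1 and X2, but all of X3's stages collapsed into one.
    if j == 0:
        return {0: 0.6, 1: 0.4}
    if j == 1:
        return {0: 0.7, 1: 0.3} if h[0] == 0 else {0: 0.2, 1: 0.8}
    return {0: 0.5, 1: 0.5}

def cid(tree_a, tree_b):
    # Sum of L1 gaps between interventional conditionals over all depths and histories.
    total = 0.0
    for j, var in enumerate(order):
        for h in product(*[states[v] for v in order[:j]]):
            pa, pb = tree_a(j, h), tree_b(j, h)
            total += sum(abs(pa[x] - pb[x]) for x in states[var])
    return total

print(cid(tree_T, tree_S))   # 1.6 > 0: T and S are not interventionally equivalent
```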

6. Illustrative Examples and Applications

Worked examples illustrate the encoding of CSI in real phenomena. For instance, in a chicken-pox model with four binary variables (income, previous diagnosis, exposure, carrier status), context-specific independencies such as $X_3 \perp X_1 \mid X_2 = \text{yes}$ are encoded as stage colorings at the relevant tree depths, and the model supports nuanced intervention analysis: e.g., a targeted subsidy (intervening on $X_1$) or a school program (intervening on $X_2$) changes mechanisms context-specifically (Duarte et al., 2021).

Other applications include learning staged trees from the ISTAT survey, COVID-19 patient data, and climatological data, where context-specific dependencies, asymmetric causal structure, and intervention effects are recovered more effectively than with classic DAGs (Leonelli et al., 2021). An open-source R package, stagedtrees, implements algorithms for estimation, visualization, causal effect computation, and model comparison using staged causal graphs.

7. Significance and Relationships to Other Models

Staged causal graphs, together with subclasses such as CStrees, generalize the DAG framework, admitting more expressive representation of context-specific causal structure and interventions. They offer a concise yet flexible modeling language that strictly subsumes DAG, CStree, and LDAG models, while providing tractable criteria for model equivalence, parameterization, and learning. Their canonical factorization and model equivalence criteria, as well as their suitability for Bayesian nonparametric estimation, establish staged causal graphs as central tools for context-specific causal discovery and inference in high-dimensional categorical domains (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).
