Staged Causal Graphs

Updated 28 December 2025
  • Staged causal graphs are a graphical framework that extends DAGs by using staged trees to capture context-specific conditional independencies in multivariate categorical data.
  • They support both observational and interventional analyses by encoding how dependency structures vary with specific variable instantiations.
  • This methodology facilitates robust Bayesian inference and model comparison, making it highly effective for high-dimensional categorical domains.

A staged causal graph, formally known as a staged tree, provides a graphical framework for representing and reasoning about context-specific conditional independencies and causal relationships in multivariate categorical data. Whereas standard Bayesian networks tie the independence structure to a fixed directed acyclic graph (DAG), staged causal graphs allow for context-specific dependencies, in which the independence structure may change as a function of variable instantiations. The staged causal graph is rooted in an event tree representation, whose nodes (apart from the root and leaves) are assigned to equivalence classes called “stages.” These stages encode that all partial histories (i.e., paths in the tree corresponding to particular variable assignments up to a point) in the same class share identical conditional probability distributions for the next variable. This modeling paradigm supports both observational and interventional analysis, encompassing context-specific interventions and facilitating nonparametric Bayesian causal inference for categorical data (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).

1. Formal Definition and Factorization

Let $X = (X_1, \dots, X_p)$ be a vector of categorical variables with respective finite state spaces $\mathcal{X}_i$. The construction of a staged causal graph starts with a fixed variable ordering (causal order), denoted $\pi = (\pi_1, \dots, \pi_p)$. The event tree is a rooted, directed tree where:

  • Each node at depth $i-1$ represents a partial history $(x_{\pi_1}, \dots, x_{\pi_{i-1}})$.
  • Outgoing edges correspond to possible values of $X_{\pi_i}$, leading to nodes at depth $i$.
  • Internal nodes at the same depth are partitioned into “stages.” Nodes in the same stage share the same transition probabilities for $X_{\pi_i}$.

The joint distribution factorizes as
$$P(x) = \prod_{i=1}^p P\bigl(x_{\pi_i} \mid x_{S_i(x_{\pi_1:\pi_{i-1}})}\bigr),$$
where $S_i(x_{\pi_1:\pi_{i-1}})$ denotes the (possibly context-dependent) subset of conditioning variables determined by the unique stage containing $x_{\pi_1:\pi_{i-1}}$. Parameterization across the tree is achieved by assigning a parameter vector $\theta_s$ to each stage $s$, giving the transition probabilities for each possible next value, so the factorization can be re-expressed as
$$P(x) = \prod_{i=1}^p \theta_{\pi_i,\, s_i(x_{\pi_1:\pi_{i-1}})}(x_{\pi_i}).$$
This context-specific parameter sharing can encode conditional independencies that hold only in certain regions of the joint state space, inducing context-specific independence (CSI) structure not capturable with fixed-parent DAGs (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025).
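
As a concrete illustration, the following minimal Python sketch encodes the factorization above for three binary variables, using a hand-chosen staging in which X3 ignores X2 whenever X1 = 0. All names, stage labels, and probability values are illustrative assumptions, not taken from the cited papers or any library.

```python
# Minimal sketch of the staged-tree factorization described above.
from itertools import product

# Variable order pi = (X1, X2, X3), all binary.
order = ["X1", "X2", "X3"]
states = {v: [0, 1] for v in order}

# Stage map: assign every partial history (tuple of earlier values) to a stage
# label.  Histories in the same stage share the same transition distribution
# for the next variable.
def stage(var, history):
    if var == "X1":
        return "root"
    if var == "X2":
        return f"x1={history[0]}"          # X2 depends on X1
    # Context-specific sharing: when X1 = 0, X3 ignores X2.
    if history[0] == 0:
        return "x1=0"                      # one stage for histories (0,0) and (0,1)
    return f"x1=1,x2={history[1]}"         # separate stages when X1 = 1

# Stage-indexed transition probabilities theta[(var, stage)][value].
theta = {
    ("X1", "root"): {0: 0.6, 1: 0.4},
    ("X2", "x1=0"): {0: 0.7, 1: 0.3},
    ("X2", "x1=1"): {0: 0.2, 1: 0.8},
    ("X3", "x1=0"): {0: 0.5, 1: 0.5},
    ("X3", "x1=1,x2=0"): {0: 0.9, 1: 0.1},
    ("X3", "x1=1,x2=1"): {0: 0.1, 1: 0.9},
}

def joint_prob(x):
    """P(x) = product over i of theta_{pi_i, s_i(history)}(x_i)."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

# The joint distribution sums to one over the full state space.
print(sum(joint_prob(x) for x in product([0, 1], repeat=3)))  # 1.0 (up to floating point)
```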

2. Graphical Equivalence and Model Inclusion Hierarchy

Statistical equivalence of two staged causal graphs is defined by equality of their induced sets of distributions. For a staged tree $T$, there exists a unique minimal DAG $G_T$ that captures the symmetric (global) conditional independencies present in the set of distributions represented by $T$, while asymmetric dependencies are encoded via colored (e.g., "red") edges classified as non-total.

Staged causal graphs admit the following inclusion hierarchy:
$$\mathbb{D} \subsetneq \mathbb{C} \subsetneq \mathbb{L} \subsetneq \mathbb{S},$$
where $\mathbb{D}$ is the class of DAG models, $\mathbb{C}$ of CStree models, $\mathbb{L}$ of labeled-DAG (LDAG) models, and $\mathbb{S}$ of staged-tree models. Each inclusion is strict; for example, staged trees can encode CSI patterns (e.g., $X_3 \perp X_2 \mid X_1 = 0$) not representable in any DAG, and they generalize even LDAGs by allowing arbitrary stage colorings (Duarte et al., 2021, Leonelli et al., 2021). Equivalence theorems characterize when two staged causal graphs are observationally or causally identical in the sense of their implied (interventional) distributions.
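
To illustrate why the inclusion of DAG models in staged-tree models is strict, note that a DAG with parent set S for the next variable corresponds to the staging in which two histories share a stage exactly when they agree on S. The hypothetical helper below (a sketch over toy binary state spaces; the function name and setup are assumptions, not any library's API) searches for such an S: the context-specific staging from the previous sketch admits none, while a symmetric staging does.

```python
from itertools import combinations, product

def dag_representable(prev_vars, states, stage_of):
    """Return a parent set (tuple of variable names) whose induced partition of
    partial histories equals the stage partition, or None if no such set exists
    (i.e. the staging is genuinely context-specific)."""
    histories = list(product(*[states[v] for v in prev_vars]))
    for k in range(len(prev_vars) + 1):
        for idx in combinations(range(len(prev_vars)), k):
            key = lambda h: tuple(h[i] for i in idx)
            if all((stage_of(h1) == stage_of(h2)) == (key(h1) == key(h2))
                   for h1 in histories for h2 in histories):
                return tuple(prev_vars[i] for i in idx)
    return None

states = {"X1": [0, 1], "X2": [0, 1]}

# Context-specific staging for X3 (as in the Section 1 sketch): X3 ignores X2
# only in the context X1 = 0.  No fixed parent set reproduces this partition.
csi_staging = lambda h: "x1=0" if h[0] == 0 else f"x1=1,x2={h[1]}"
print(dag_representable(["X1", "X2"], states, csi_staging))             # None

# A symmetric staging (X3 depends on X1 in every context) is DAG-representable.
print(dag_representable(["X1", "X2"], states, lambda h: f"x1={h[0]}"))  # ('X1',)
```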

3. Causal Semantics and Interventions

A staged causal graph obtains causal interpretation by associating interventions (following the do-calculus framework) with modifications to the event tree. An intervention $do(X_i = x)$ replaces the stochastic mechanism for $X_i$ with a deterministic assignment, formally by restricting the event tree at depth $i$ to only the outgoing edge corresponding to $x$. For a set of intervention targets $I$ over the set of stages, a soft intervention rewrites only the affected conditional distributions, while a hard intervention replaces all mechanisms for $X_{\pi_i}$ so as to lose dependence on context.

The induced post-intervention distribution is
$$P^I(x) = \prod_{\substack{i=1,\dots,p\,:\, \mathcal{S}_{\pi,i}(x_S)\in I}} P^I(x_{\pi_i}\mid x_S) \;\times\; \prod_{\substack{i=1,\dots,p\,:\, \mathcal{S}_{\pi,i}(x_S)\notin I}} P^0(x_{\pi_i}\mid x_S).$$
This allows for the modeling of context-specific, mechanism-targeted interventions, and supports the generalization of interventional calculus for arbitrary CSI structures (Duarte et al., 2021).
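
The following sketch implements a hard intervention do(X2 = 1) on the toy staged tree from the Section 1 sketch (repeated here so the example is self-contained): the intervened variable becomes deterministic while every other mechanism keeps its observational stage parameters. All names and probability values are illustrative assumptions.

```python
from itertools import product

order = ["X1", "X2", "X3"]

def stage(var, history):
    if var == "X1":
        return "root"
    if var == "X2":
        return f"x1={history[0]}"
    return "x1=0" if history[0] == 0 else f"x1=1,x2={history[1]}"

theta = {
    ("X1", "root"): {0: 0.6, 1: 0.4},
    ("X2", "x1=0"): {0: 0.7, 1: 0.3},
    ("X2", "x1=1"): {0: 0.2, 1: 0.8},
    ("X3", "x1=0"): {0: 0.5, 1: 0.5},
    ("X3", "x1=1,x2=0"): {0: 0.9, 1: 0.1},
    ("X3", "x1=1,x2=1"): {0: 0.1, 1: 0.9},
}

def do_prob(x, do):
    """Post-intervention probability P^I(x): each intervened variable keeps only
    the edge fixed by the intervention; all other mechanisms keep their
    observational stage parameters."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        if var in do:
            p *= 1.0 if value == do[var] else 0.0
        else:
            p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

# Interventional distribution of X3 under do(X2 = 1).
for x3 in (0, 1):
    p = sum(do_prob((x1, x2, x3), {"X2": 1}) for x1, x2 in product([0, 1], repeat=2))
    print(f"P(X3={x3} | do(X2=1)) = {p:.2f}")   # 0.34 and 0.66
```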

4. Inference, Learning, and Causal Effect Estimation

Bayesian inference for staged causal graphs proceeds by placing priors on both the stage-assignment partitions and the associated Dirichlet parameters. Parsimony-encouraging product-partition priors or distance-based (penalized) priors regularize the number and assignment of stages. Posterior inference employs MCMC with split-and-merge moves, drawing posterior samples over partitions and conditional probability vectors.
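
A minimal sketch of the conjugate ingredient inside such a sampler is shown below: for a fixed stage assignment, each stage's transition vector has a Dirichlet posterior given the multinomial counts pooled over the histories in that stage. The counts, the hyperparameter alpha, and the omission of the split-and-merge moves over partitions are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled counts of the next variable's values, one entry per stage (illustrative).
stage_counts = {
    "x1=0":      np.array([41, 39]),   # histories (0,0) and (0,1) pooled
    "x1=1,x2=0": np.array([18, 2]),
    "x1=1,x2=1": np.array([3, 27]),
}
alpha = 1.0  # symmetric Dirichlet hyperparameter

for s, counts in stage_counts.items():
    # Posterior for theta_s is Dirichlet(alpha + counts); summarize 1000 draws.
    draws = rng.dirichlet(alpha + counts, size=1000)
    mean = draws[:, 1].mean()
    lo, hi = np.quantile(draws[:, 1], [0.025, 0.975])
    print(f"stage {s}: P(next = 1) mean {mean:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```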

Average treatment effects (ATE) are identified through tree pruning and summing over compatible histories in the interventional regime, leveraging the fact that, after intervention, confounding is blocked by construction of the staged tree. For instance, with binary treatment $T$, outcome $Y$, and intermediate covariates, the staged tree enables the computation
$$\mathrm{ATE} = E[Y \mid do(T=1)] - E[Y \mid do(T=0)].$$
The posterior mean, credible intervals, and the full uncertainty distribution for the ATE are estimated by summarizing the MCMC draws for the underlying tree and stage parameters, requiring no further nonparametric adjustment (Cremaschi et al., 5 Nov 2025).
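
The sketch below illustrates this identification on a toy staged tree in which a binary confounder Z precedes a binary treatment T and outcome Y in the causal order: do(T = t) removes T's mechanism, and E[Y | do(T = t)] is a sum over the pruned tree. Variable names, stagings, and probabilities are illustrative assumptions; in the Bayesian workflow described above, the same computation would be repeated for each MCMC draw of the stage parameters to obtain a posterior over the ATE.

```python
from itertools import product

order = ["Z", "T", "Y"]

def stage(var, history):
    if var == "Z":
        return "root"
    if var == "T":
        return f"z={history[0]}"
    return f"z={history[0]},t={history[1]}"

theta = {
    ("Z", "root"):    {0: 0.5, 1: 0.5},
    ("T", "z=0"):     {0: 0.8, 1: 0.2},
    ("T", "z=1"):     {0: 0.3, 1: 0.7},
    ("Y", "z=0,t=0"): {0: 0.9, 1: 0.1},
    ("Y", "z=0,t=1"): {0: 0.6, 1: 0.4},
    ("Y", "z=1,t=0"): {0: 0.7, 1: 0.3},
    ("Y", "z=1,t=1"): {0: 0.2, 1: 0.8},
}

def do_prob(x, do):
    """Intervened variables become deterministic; all other mechanisms keep
    their observational stage parameters."""
    p, history = 1.0, ()
    for var, value in zip(order, x):
        if var in do:
            p *= 1.0 if value == do[var] else 0.0
        else:
            p *= theta[(var, stage(var, history))][value]
        history += (value,)
    return p

def expected_y(t):
    # E[Y | do(T = t)]: sum over all histories compatible with the intervention.
    return sum(do_prob((z, tt, y), {"T": t}) * y
               for z, tt, y in product([0, 1], repeat=3))

print(f"ATE = {expected_y(1) - expected_y(0):.2f}")   # 0.5*(0.4-0.1) + 0.5*(0.8-0.3) = 0.40
```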

Learning staged tree structure from data can be performed via backward hill-climbing (stage merges under BIC/Bayesian score), k-means clustering on conditional probability table (CPT) vectors, and dynamic programming for variable ordering selection (Leonelli et al., 2021).
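
A hedged sketch of the backward hill-climbing idea at a single tree depth is given below: starting from the finest staging (one stage per partial history), the pair of stages whose merge most improves a BIC-style score is merged, until no merge helps. The counts and the exact BIC bookkeeping are illustrative simplifications, not the scoring of any specific package.

```python
import math

K = 2  # number of values of the next variable
# Observed counts of the next variable for each partial history (illustrative).
counts = {
    (0, 0): [40, 10],
    (0, 1): [38, 12],   # similar to (0, 0): a natural merge candidate
    (1, 0): [5, 45],
    (1, 1): [25, 25],
}
N = sum(sum(c) for c in counts.values())

def loglik(pooled):
    # Multinomial log-likelihood at the MLE for one stage's pooled counts.
    n = sum(pooled)
    return sum(c * math.log(c / n) for c in pooled if c > 0)

def bic(stages):
    """stages: list of lists of histories.  BIC = loglik - 0.5 * dim * log N."""
    ll = sum(loglik([sum(counts[h][k] for h in s) for k in range(K)]) for s in stages)
    return ll - 0.5 * len(stages) * (K - 1) * math.log(N)

stages = [[h] for h in counts]           # finest staging: one stage per history
current = bic(stages)
improved = True
while improved:
    improved = False
    best = None
    for i in range(len(stages)):
        for j in range(i + 1, len(stages)):
            merged = [s for k, s in enumerate(stages) if k not in (i, j)]
            merged.append(stages[i] + stages[j])
            score = bic(merged)
            if score > current and (best is None or score > best[0]):
                best = (score, merged)
    if best:
        current, stages = best
        improved = True

print(stages)   # [[(1, 0)], [(1, 1)], [(0, 0), (0, 1)]]: (0,0) and (0,1) share a stage
```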

5. Model Comparison and Interventional Metrics

To quantify model similarity and equivalence beyond the observational distribution, the context-specific interventional discrepancy (CID) is used. For two staged trees $T$ and $S$, the CID is defined as
$$d_{\mathrm{CID}}(T, S) = \sum_{j=1}^p \sum_{x_{[j-1]}} \bigl| P_T\bigl(X_j \mid do(X_{[j-1]}=x_{[j-1]})\bigr) - P_S\bigl(X_j \mid do(X_{[j-1]}=x_{[j-1]})\bigr) \bigr|.$$
The CID equals zero if and only if all interventional distributions of $T$ and $S$ agree. This generalizes the structural intervention distance (SID) for DAGs to the context-specific, asymmetric settings staged trees uniquely model (Leonelli et al., 2021).
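
Because intervening on all of $X_{[j-1]}$ fixes the history, each interventional conditional in the displayed definition reduces to the tree's transition distribution at that history, so the CID can be computed by summing L1 gaps over depths and histories. The sketch below does this for two toy trees that share the mechanisms of X1 and X2 but stage X3 differently; all names and numbers are illustrative assumptions.

```python
from itertools import product

order = ["X1", "X2", "X3"]
states = {v: [0, 1] for v in order}

def tree_T(j, h):
    # Context-specific staging from the earlier sketches.
    if j == 0:
        return {0: 0.6, 1: 0.4}
    if j == 1:
        return {0: 0.7, 1: 0.3} if h[0] == 0 else {0: 0.2, 1: 0.8}
    if h[0] == 0:
        return {0: 0.5, 1: 0.5}
    return {0: 0.9, 1: 0.1} if h[1] == 0 else {0: 0.1, 1: 0.9}

def tree_S(j, h):
    # Same mechanisms for X1 and X2, but all of X3's stages collapsed into one.
    if j == 0:
        return {0: 0.6, 1: 0.4}
    if j == 1:
        return {0: 0.7, 1: 0.3} if h[0] == 0 else {0: 0.2, 1: 0.8}
    return {0: 0.5, 1: 0.5}

def cid(tree_a, tree_b):
    # Sum of L1 gaps between interventional conditionals over all depths and histories.
    total = 0.0
    for j, var in enumerate(order):
        for h in product(*[states[v] for v in order[:j]]):
            pa, pb = tree_a(j, h), tree_b(j, h)
            total += sum(abs(pa[x] - pb[x]) for x in states[var])
    return total

print(cid(tree_T, tree_S))   # 1.6 > 0: T and S are not interventionally equivalent
```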

6. Illustrative Examples and Applications

Worked examples illustrate the encoding of CSI in real phenomena. For instance, in a chicken-pox model with four binary variables (income, previous diagnosis, exposure, carrier status), context-specific independencies such as $X_3 \perp X_1 \mid X_2 = \text{yes}$ are encoded as stage colorings at the relevant tree depths, and the model supports nuanced intervention analysis: e.g., a targeted subsidy (intervening on $X_1$) or a school program (intervening on $X_2$) changes mechanisms context-specifically (Duarte et al., 2021).

Other applications include learning staged trees from the ISTAT survey, COVID-19 patient data, and climatological data, where context-specific dependencies, asymmetric causal structure, and intervention effects are recovered more effectively than with classic DAGs (Leonelli et al., 2021). An open-source R package, stagedtrees, implements algorithms for estimation, visualization, causal effect computation, and model comparison using staged causal graphs.

7. Significance and Relationships to Other Models

Staged causal graphs, together with subclasses such as CStrees, generalize the DAG framework, admitting more expressive representation of context-specific causal structure and interventions. They offer a concise yet flexible modeling language that strictly subsumes DAG, CStree, and LDAG models, while providing tractable criteria for model equivalence, parameterization, and learning. Their canonical factorization and model equivalence criteria, as well as their suitability for Bayesian nonparametric estimation, establish staged causal graphs as central tools for context-specific causal discovery and inference in high-dimensional categorical domains (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).
