Staged Causal Graphs
- Staged causal graphs are a graphical framework that extends DAGs by using staged trees to capture context-specific conditional independencies in multivariate categorical data.
- They support both observational and interventional analyses by encoding how dependency structures vary with specific variable instantiations.
- The framework supports Bayesian inference and model comparison, which makes it well suited to high-dimensional categorical domains.
A staged causal graph, formally known as a staged tree, provides a graphical framework for representing and reasoning about context-specific conditional independencies and causal relationships in multivariate categorical data. Whereas a Bayesian network over a directed acyclic graph (DAG) fixes one parent set per variable, staged causal graphs generalize Bayesian networks to allow context-specific dependencies, where the independence structure may change as a function of variable instantiations. The staged causal graph is rooted in an event tree representation, whose nodes (apart from the root and leaves) are assigned to equivalence classes called “stages.” These stages encode that all partial histories (i.e., paths in the tree corresponding to particular variable assignments up to a point) in the same class share identical conditional probability distributions for the next variable. This modeling paradigm supports both observational and interventional analysis, including context-specific interventions and nonparametric Bayesian causal inference for categorical data (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).
1. Formal Definition and Factorization
Let $X = (X_1, \dots, X_p)$ be a vector of categorical variables with respective finite state spaces $\mathcal{X}_1, \dots, \mathcal{X}_p$. The construction of a staged causal graph starts with a fixed variable ordering (causal order), denoted $X_1 \prec X_2 \prec \cdots \prec X_p$. The event tree is a rooted, directed tree where:
- Each node at depth $k$ represents a partial history $x_{1:k} = (x_1, \dots, x_k)$.
- Outgoing edges correspond to possible values of $X_{k+1}$, leading to nodes at depth $k+1$.
- Internal nodes at the same depth are partitioned into “stages.” Nodes in the same stage share the same transition probabilities for $X_{k+1}$.
The joint distribution factorizes as:
$$P(x) = \prod_{k=1}^{p} P\big(x_k \mid x_{\pi_k(x_{1:k-1})}\big),$$
where $\pi_k(x_{1:k-1}) \subseteq \{1, \dots, k-1\}$ denotes the (possibly context-dependent) subset of conditioning variables determined by the unique stage containing $x_{1:k-1}$. Parameterization across the tree is achieved by assigning a parameter vector $\theta_s = (\theta_{s,x})_{x \in \mathcal{X}_k}$ to each stage $s$, indicating transition probabilities for each possible next value. Writing $s(x_{1:k-1})$ for the stage of the node $x_{1:k-1}$, the factorization can be re-expressed as:
$$P(x) = \prod_{k=1}^{p} \theta_{s(x_{1:k-1}),\, x_k}.$$
This context-specific parameter sharing can encode conditional independencies that are only valid in certain regions of the joint state space, inducing a context-specific independence (CSI) structure that fixed-parent DAGs cannot capture (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025).
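The stage-based factorization can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: three binary variables in the order X1, X2, X3, hypothetical stage labels `s0` to `s5`, and made-up transition probabilities. The root is given its own singleton stage for convenience. The two depth-2 nodes with X1 = 0 share stage `s3`, which encodes that X3 is independent of X2 only in the context X1 = 0.

```python
from itertools import product

# Hypothetical staging: each partial history (a tuple) maps to a stage label.
stage_of = {
    (): "s0",
    (0,): "s1", (1,): "s2",
    (0, 0): "s3", (0, 1): "s3",   # X1 = 0: X3 ignores X2 (shared stage)
    (1, 0): "s4", (1, 1): "s5",   # X1 = 1: X3 depends on X2
}

# Illustrative transition probabilities, one vector per stage.
theta = {
    "s0": [0.6, 0.4],
    "s1": [0.7, 0.3],
    "s2": [0.2, 0.8],
    "s3": [0.9, 0.1],
    "s4": [0.5, 0.5],
    "s5": [0.1, 0.9],
}

def joint(x):
    """Probability of a full assignment as a product of stage parameters."""
    p = 1.0
    for k in range(len(x)):
        p *= theta[stage_of[x[:k]]][x[k]]
    return p

def conditional_x3(x1, x2):
    """P(X3 = 1 | X1 = x1, X2 = x2), computed from the joint."""
    num = joint((x1, x2, 1))
    return num / (joint((x1, x2, 0)) + num)

total = sum(joint(x) for x in product([0, 1], repeat=3))
```

In this sketch `conditional_x3(0, 0)` and `conditional_x3(0, 1)` coincide, while `conditional_x3(1, 0)` and `conditional_x3(1, 1)` differ, which is an independence pattern that a single parent set for X3 cannot express.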
2. Graphical Equivalence and Model Inclusion Hierarchy
Statistical equivalence of two staged causal graphs is defined by equality of their induced sets of distributions. For a staged tree $\mathcal{T}$, there exists a unique minimal DAG $G_{\mathcal{T}}$ that captures the symmetric (global) conditional independencies present in the set of distributions represented by $\mathcal{T}$, while asymmetric dependencies are encoded via colored (e.g., "red") edges classified as non-total.
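Staged causal graphs admit the following inclusion hierarchy:
$$\mathcal{M}_{\mathrm{DAG}} \subsetneq \mathcal{M}_{\mathrm{CStree}} \subsetneq \mathcal{M}_{\mathrm{LDAG}} \subsetneq \mathcal{M}_{\mathrm{ST}},$$
where $\mathcal{M}_{\mathrm{DAG}}$ is the class of DAG models, $\mathcal{M}_{\mathrm{CStree}}$ of CStree models, $\mathcal{M}_{\mathrm{LDAG}}$ of labeled-DAG (LDAG) models, and $\mathcal{M}_{\mathrm{ST}}$ of staged-tree models. Each inclusion is strict; for example, staged trees can encode CSI patterns (independencies of the form $X_i \perp X_j \mid X_C = x_C$ that hold for some contexts $x_C$ but not others) that are not representable in any DAG, and they generalize even LDAGs by allowing arbitrary stage colorings (Duarte et al., 2021, Leonelli et al., 2021). Equivalence theorems characterize when two staged causal graphs are observationally or causally identical in the sense of their implied (interventional) distributions.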
3. Causal Semantics and Interventions
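A staged causal graph obtains a causal interpretation by associating interventions (following the do-calculus framework) with modifications to the event tree. An intervention $\mathrm{do}(X_k = x_k^*)$ replaces the stochastic mechanism for $X_k$ with a deterministic assignment, formally by restricting every node whose outgoing edges correspond to $X_k$ (the nodes at depth $k-1$) to the single outgoing edge labeled $x_k^*$. For a set of intervention targets $I$ over the set of stages, a soft intervention rewrites only the affected conditional distributions, while a hard intervention replaces all mechanisms for $X_k$ so as to lose dependence on context.
The induced post-intervention distribution is:
$$P^{(I)}(x) = \prod_{k:\, s(x_{1:k-1}) \in I} \tilde{\theta}_{s(x_{1:k-1}),\, x_k} \;\prod_{k:\, s(x_{1:k-1}) \notin I} \theta_{s(x_{1:k-1}),\, x_k},$$
where $s(x_{1:k-1})$ is the stage of the partial history $x_{1:k-1}$, $\theta_s$ is the observational transition probability vector of stage $s$, and $\tilde{\theta}_s$ is its replacement under the intervention. A hard intervention $\mathrm{do}(X_k = x_k^*)$ is the special case $\tilde{\theta}_{s, x_k} = \mathbb{1}[x_k = x_k^*]$ for every stage $s$ at depth $k-1$. This allows for the modeling of context-specific, mechanism-targeted interventions, and supports the generalization of interventional calculus to arbitrary CSI structures (Duarte et al., 2021).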
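A small Python sketch may help make the two kinds of intervention concrete. It assumes a hypothetical three-variable binary tree with invented stage labels and probabilities; `p_do` and `p_soft` are illustrative helper names, not functions from any library. A hard intervention drops the factor of the targeted variable and keeps only histories that agree with the forced value, while a soft intervention swaps in new parameter vectors for the targeted stages only.

```python
from itertools import product

# Hypothetical staged tree over binary (X1, X2, X3); values are illustrative.
stage_of = {
    (): "s0",
    (0,): "s1", (1,): "s2",
    (0, 0): "s3", (0, 1): "s3",
    (1, 0): "s4", (1, 1): "s5",
}
theta = {
    "s0": [0.6, 0.4], "s1": [0.7, 0.3], "s2": [0.2, 0.8],
    "s3": [0.9, 0.1], "s4": [0.5, 0.5], "s5": [0.1, 0.9],
}

def p_obs(x):
    """Observational probability of a full history x."""
    p = 1.0
    for k in range(len(x)):
        p *= theta[stage_of[x[:k]]][x[k]]
    return p

def p_do(x, target, value):
    """Hard intervention: force the variable at 0-based index `target`."""
    if x[target] != value:
        return 0.0                      # pruned branch of the event tree
    p = 1.0
    for k in range(len(x)):
        if k != target:                 # the targeted mechanism is removed
            p *= theta[stage_of[x[:k]]][x[k]]
    return p

def p_soft(x, new_theta):
    """Soft intervention: stages listed in new_theta get new vectors."""
    p = 1.0
    for k in range(len(x)):
        s = stage_of[x[:k]]
        p *= new_theta.get(s, theta[s])[x[k]]
    return p

histories = list(product([0, 1], repeat=3))
# P(X3 = 1 | do(X2 = 1)) versus the observational P(X3 = 1 | X2 = 1)
p_x3_do = sum(p_do(x, 1, 1) for x in histories if x[2] == 1)
p_x3_cond = (sum(p_obs(x) for x in histories if x[1] == 1 and x[2] == 1)
             / sum(p_obs(x) for x in histories if x[1] == 1))
```

With these made-up numbers the interventional and conditional quantities differ, because X1 influences both X2 and X3. Calling `p_soft(x, {"s1": [0.5, 0.5]})` changes the mechanism of X2 only in the context X1 = 0, which is the kind of context-specific target the stage-level formulation permits.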
4. Inference, Learning, and Causal Effect Estimation
Bayesian inference for staged causal graphs proceeds by placing priors on both the stage-assignment partitions and the associated Dirichlet parameters. Parsimony-encouraging product-partition priors or distance-based (penalized) priors regularize the number and assignment of stages. Posterior inference employs MCMC with split-and-merge moves, drawing posterior samples over partitions and conditional probability vectors.
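For a fixed stage partition, the Dirichlet part of the model is conjugate, so the quantities an MCMC sampler needs at each step are available in closed form. The sketch below shows only that building block, under assumed symmetric Dirichlet priors and invented counts; it is not the split-and-merge sampler itself, and the function names are illustrative. It compares the log marginal likelihood of keeping two nodes in separate stages against merging them.

```python
from math import lgamma

def posterior_dirichlet(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus observed counts."""
    return [a + n for a, n in zip(alpha, counts)]

def log_marginal(alpha, counts):
    """Log marginal likelihood of one stage's observed transitions
    (as an ordered sequence) under a Dirichlet(alpha) prior."""
    out = lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
    for a, n in zip(alpha, counts):
        out += lgamma(a + n) - lgamma(a)
    return out

def log_score(stage_counts, alpha):
    """Sum of per-stage log marginal likelihoods for a given partition."""
    return sum(log_marginal(alpha, c) for c in stage_counts.values())

alpha = [1.0, 1.0]  # assumed uniform prior on each stage

# Two nodes with similar empirical transitions (made-up counts).
separate = {"u": [18, 2], "v": [17, 3]}
merged = {"u+v": [35, 5]}

# Two nodes with clearly different transitions.
separate_far = {"u": [18, 2], "w": [2, 18]}
merged_far = {"u+w": [20, 20]}

similar_prefers_merge = log_score(merged, alpha) > log_score(separate, alpha)
different_prefers_split = (log_score(separate_far, alpha)
                           > log_score(merged_far, alpha))
```

A split-and-merge move would combine a score difference of this kind with the prior on partitions to decide whether to accept a proposed change in staging.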
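Average treatment effects (ATE) are identified through tree pruning and summing over compatible histories in the interventional regime, leveraging the fact that, after intervention, confounding is blocked by construction of the staged tree. For instance, with binary treatment $T$, outcome $Y$, and intermediate covariates, the staged tree enables the computation:
$$\mathrm{ATE} = \mathbb{E}[Y \mid \mathrm{do}(T = 1)] - \mathbb{E}[Y \mid \mathrm{do}(T = 0)], \qquad P(Y = y \mid \mathrm{do}(T = t)) = \sum_{x:\, x_T = t,\, x_Y = y} \;\prod_{k:\, X_k \neq T} \theta_{s(x_{1:k-1}),\, x_k},$$
where the sum runs over root-to-leaf histories compatible with $T = t$ and $Y = y$, and $\theta_{s(x_{1:k-1}),\, x_k}$ is the transition probability assigned by the stage of the partial history $x_{1:k-1}$. The posterior mean, credible intervals, and the full uncertainty distribution for the ATE are estimated by summarizing the MCMC draws for the underlying tree and stage parameters, requiring no further nonparametric adjustment (Cremaschi et al., 5 Nov 2025).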
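The pruning-and-summing computation can be sketched in a few lines of Python for a single set of stage parameters. The variable order (Z, T, Y), the stage labels, and all probabilities below are assumptions chosen for illustration, with a pre-treatment covariate Z; the helper works for any position of the treatment in the order. In a Bayesian analysis the same function would be evaluated once per posterior draw and the results summarized.

```python
from itertools import product

# Hypothetical staged tree over binary (Z, T, Y); numbers are illustrative.
stage_of = {
    (): "z",
    (0,): "t|z0", (1,): "t|z1",
    (0, 0): "y_low", (0, 1): "y_mid",
    (1, 0): "y_mid", (1, 1): "y_high",   # two contexts share stage y_mid
}
theta = {
    "z": [0.5, 0.5],
    "t|z0": [0.8, 0.2],
    "t|z1": [0.3, 0.7],
    "y_low": [0.9, 0.1],
    "y_mid": [0.6, 0.4],
    "y_high": [0.2, 0.8],
}

def p_outcome_do(t, y=1, t_idx=1, y_idx=2, n_vars=3):
    """P(Y = y | do(T = t)): sum over histories compatible with T = t and
    Y = y, multiplying every stage parameter except the treatment's."""
    total = 0.0
    for x in product([0, 1], repeat=n_vars):
        if x[t_idx] != t or x[y_idx] != y:
            continue                      # pruned or incompatible history
        p = 1.0
        for k in range(n_vars):
            if k != t_idx:                # treatment mechanism is removed
                p *= theta[stage_of[x[:k]]][x[k]]
        total += p
    return total

# For a binary outcome, E[Y | do(T = t)] equals P(Y = 1 | do(T = t)).
ate = p_outcome_do(1) - p_outcome_do(0)
```

Here `p_outcome_do(1)` averages the outcome probabilities of the treated branches over the distribution of Z rather than over the distribution of Z given T, which is what separates the interventional contrast from a naive conditional one.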
Learning staged tree structure from data can be performed via backward hill-climbing (stage merges under BIC/Bayesian score), k-means clustering on conditional probability table (CPT) vectors, and dynamic programming for variable ordering selection (Leonelli et al., 2021).
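As a rough illustration of backward hill-climbing, the Python sketch below starts with every node at one depth in its own stage and greedily merges the pair of stages that most improves BIC, stopping when no merge helps. It is a simplified sketch under stated assumptions (a single depth, maximum-likelihood estimates, made-up counts, illustrative function names) and is not the implementation used in the stagedtrees package.

```python
from itertools import combinations
from math import log

def stage_loglik(counts):
    """Maximized multinomial log-likelihood of one stage's counts."""
    n = sum(counts)
    return sum(c * log(c / n) for c in counts if c > 0)

def bic(stages, n_total):
    """BIC in 'higher is better' form: loglik - (free params / 2) * log N."""
    loglik = sum(stage_loglik(c) for c in stages.values())
    free = sum(len(c) - 1 for c in stages.values())
    return loglik - 0.5 * free * log(n_total)

def backward_hill_climb(node_counts):
    """Greedy stage merging for the nodes at one depth of the tree."""
    stages = {frozenset([v]): list(c) for v, c in node_counts.items()}
    n_total = sum(sum(c) for c in node_counts.values())
    best = bic(stages, n_total)
    while len(stages) > 1:
        candidates = []
        for a, b in combinations(stages, 2):
            proposal = {k: v for k, v in stages.items() if k not in (a, b)}
            proposal[a | b] = [x + y for x, y in zip(stages[a], stages[b])]
            candidates.append((bic(proposal, n_total), proposal))
        score, proposal = max(candidates, key=lambda item: item[0])
        if score <= best:
            break                         # no merge improves the score
        best, stages = score, proposal
    return stages, best

# Made-up transition counts for the four depth-2 nodes of a binary tree.
node_counts = {
    (0, 0): [90, 10],
    (0, 1): [88, 12],
    (1, 0): [50, 50],
    (1, 1): [10, 90],
}
learned, score = backward_hill_climb(node_counts)
```

On these counts the search merges the two nodes with X1 = 0, whose empirical transitions are nearly identical, and leaves the other two nodes in their own stages.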
5. Model Comparison and Interventional Metrics
To quantify model similarity and equivalence beyond the observational distribution, the context-specific interventional discrepancy (CID) is used. For two staged trees $\mathcal{T}$ and $\mathcal{S}$, $\mathrm{CID}(\mathcal{T}, \mathcal{S})$ measures the disagreement between the interventional distributions the two trees imply. CID equals zero if and only if all interventional distributions of $\mathcal{T}$ and $\mathcal{S}$ agree. This generalizes the structural intervention distance (SID) for DAGs to the context-specific, asymmetric settings that only staged trees can model (Leonelli et al., 2021).
6. Illustrative Examples and Applications
Worked examples illustrate the encoding of CSI in real phenomena. For instance, in a chicken-pox model with four binary variables (income, previous diagnosis, exposure, carrier status), context-specific independencies are encoded as stage colorings at the relevant tree depths. The model also supports nuanced intervention analysis: an intervention such as a targeted subsidy or a school program changes the mechanisms of the variable it targets only in specific contexts (Duarte et al., 2021).
Other applications include learning staged trees from the ISTAT survey, COVID-19 patient data, and climatology, where context-specific dependencies, asymmetric causal structure, and intervention effects are recovered more effectively than with classic DAGs (Leonelli et al., 2021). An open-source R package, stagedtrees, implements algorithms for estimation, visualization, causal effect computation, and model comparison using staged causal graphs.
7. Significance and Relationships to Other Models
Staged causal graphs generalize the DAG framework, admitting a more expressive representation of context-specific causal structure and interventions. They offer a concise yet flexible modeling language that strictly subsumes DAGs, CStrees, and LDAGs, while providing tractable criteria for model equivalence, parameterization, and learning. Their canonical factorization and model equivalence criteria, as well as their suitability for Bayesian nonparametric estimation, establish staged causal graphs as central tools for context-specific causal discovery and inference in high-dimensional categorical domains (Duarte et al., 2021, Cremaschi et al., 5 Nov 2025, Leonelli et al., 2021).