Structured Causal Models (SCMs)
- SCMs are mathematical models that represent causal relationships using directed acyclic graphs, exogenous noise, and structural equations.
- They enable precise computation across observational, interventional, and counterfactual levels following Pearl’s do-calculus framework.
- Applications include causal inference, generative modeling, fairness analysis, and reinforcement learning, with extensions to neural and nonparametric settings.
Structured Causal Models (SCMs) formalize the structural dependencies among observed and latent variables via a combination of directed (causal) graphs and deterministic or stochastic mechanisms. An SCM consists of (i) a set of endogenous variables, (ii) mutually independent exogenous (noise) variables, and (iii) a known or inferred directed acyclic graph (DAG) encoding parent–child relationships, with each variable assigned a structural equation specifying its value as a function of its causal parents and noise. The primary applications of SCMs are the computation of associations, effects of interventions, and the modeling of counterfactuals—corresponding to the three levels of Pearl’s causal hierarchy. SCMs have become central to causal inference, generative modeling, explanation, fairness analysis, and reinforcement learning, and have been extended to accommodate neural, dynamic, selection-biased, and extremely heavy-tailed settings.
1. Mathematical Definition and Hierarchical Levels
For observed variables $X_1, \dots, X_d$, an SCM is represented by the structural assignments
$$X_i = f_i(\mathrm{pa}(X_i), Z_i), \qquad i = 1, \dots, d,$$
where $\mathrm{pa}(X_i)$ denotes the causal parents of $X_i$ in the DAG and $Z_1, \dots, Z_d$ are mutually independent exogenous noise variables, each equipped with a distribution $P_{Z_i}$. The full model specification enables computation at three distinct levels:
- Observational ($\mathcal{L}_1$): Estimation of the joint distribution $P(X_1, \dots, X_d)$.
- Interventional ($\mathcal{L}_2$): For an atomic intervention $do(X_k = x_k)$, the arrows into $X_k$ are removed in the DAG and the corresponding structural equation for $X_k$ is replaced by the constant $x_k$.
- Counterfactual ($\mathcal{L}_3$): Following the three-step procedure—abduction, action, prediction—for an observed sample $x$:
- Abduction: Invert the structural assignments to recover the noise values $z$ consistent with $x$;
- Action: Modify the SCM to include the (hypothetical) intervention;
- Prediction: Propagate $z$ through the modified SCM to obtain the counterfactual (Sick et al., 20 Mar 2025).
These algebraic manipulations follow from Pearl’s formalization of do-calculus (Kaddour et al., 2022), and are applicable to any SCM with sufficient identifiability properties.
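The abduction-action-prediction procedure can be sketched on a toy two-variable linear SCM (the equations and values below are invented for illustration, not drawn from the cited works):

```python
# Toy SCM: X1 = Z1, X2 = 2*X1 + Z2 (illustrative mechanisms).
def f2(x1, z2):
    return 2.0 * x1 + z2

# Observed factual sample.
x1_obs, x2_obs = 1.0, 2.5

# Step 1 -- Abduction: invert the structural assignments to recover the noise.
z1 = x1_obs                   # Z1 = X1
z2 = x2_obs - 2.0 * x1_obs    # Z2 = X2 - 2*X1

# Step 2 -- Action: hypothetical intervention do(X1 = 0) replaces the
# structural equation for X1 by a constant and cuts its incoming arrows.
x1_cf = 0.0

# Step 3 -- Prediction: propagate the recovered noise through the modified SCM.
x2_cf = f2(x1_cf, z2)

print(x2_cf)  # counterfactual value of X2 under do(X1 = 0)
```

Because the noise is held fixed at its abducted value, the counterfactual differs from a fresh interventional sample, which would redraw $Z_2$.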
2. Model Classes: Parametric, Neural, and Transformation-based SCMs
Traditional SCMs typically specify the mechanisms $f_i$ using parametric families such as linear or logistic regression, yielding interpretable coefficients but risking model bias. Neural SCMs replace $f_i$ with neural networks or normalizing flows, allowing approximation of arbitrary functional forms but often compromising interpretability and restricting applicability to continuous data.
TRAM-DAGs (Transformation Model-DAGs) encode each conditional distribution by a strictly increasing transformation function $h(x_i \mid \mathrm{pa}(x_i))$ mapping into a fixed latent CDF $F_Z$:
$$P(X_i \le x_i \mid \mathrm{pa}(x_i)) = F_Z\big(h(x_i \mid \mathrm{pa}(x_i))\big),$$
with $F_Z$ typically chosen as the standard logistic CDF. For continuous variables, $h$ is a Bernstein polynomial; for ordinal/binary variables, a step-function intercept is used; shift terms can be linear ($\beta x$) or parent-specific nonlinear. Invertibility of $h$ supports abduction for counterfactual inference; monotonicity is enforced through linear constraints during training (Sick et al., 20 Mar 2025).
TRAM-DAGs allow a spectrum of model choice between interpretability and expressive power:
- Linear shifts are explicitly interpretable as causal log-odds-ratios.
- Complex shifts are visualizable as nonlinear effect curves.
Neural causal models (e.g., VACA, CAREFL, normalizing-flow SCMs) may support only observational and interventional queries for mixed or discrete variables; continuous TRAM-DAGs enable all three levels (Sick et al., 20 Mar 2025).
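A minimal sketch of such a transformation-model conditional, using a hand-picked monotone linear $h$ and the standard logistic latent CDF (all parameter values are invented for illustration):

```python
import math

# Illustrative transformation model: h(x | pa) = a*x + b + beta*pa,
# with a > 0 guaranteeing strict monotonicity in x.
a, b, beta = 1.5, -0.2, 0.8

def F_Z(z):
    """Standard logistic CDF serving as the fixed latent reference."""
    return 1.0 / (1.0 + math.exp(-z))

def conditional_cdf(x, pa):
    """P(X <= x | pa) = F_Z(h(x | pa))."""
    return F_Z(a * x + b + beta * pa)

def h_inverse(z, pa):
    """Invert h in x for abduction: x = (z - b - beta*pa) / a."""
    return (z - b - beta * pa) / a

# Round trip: map an observation to the latent scale and back exactly.
x, pa = 0.7, 1.0
z = a * x + b + beta * pa
assert abs(h_inverse(z, pa) - x) < 1e-12
```

The same invertibility that makes the round trip exact is what enables the abduction step of counterfactual inference in this model class.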
3. Extensions: Cyclic, Latent, and Nonparametric Models
SCMs with cycles and latent confounders require careful treatment. A solution mapping from parents and noise to each strongly connected component is necessary for existence and uniqueness of the induced joint distributions; unique solvability is equivalent to the existence of such a mapping. Counterfactual and interventional equivalence is retained under strict solvability conditions. Marginalization is defined by substituting the solution for variables to be eliminated, preserving observational/interventional semantics if conditions hold (Bongers et al., 2016).
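The solution-mapping idea can be illustrated on a small linear cyclic SCM, where the strongly connected component is solved jointly from the noise (weights chosen for illustration; the system is uniquely solvable because $I - B$ is invertible):

```python
import numpy as np

# Cyclic SCM sketch: X1 = 0.5*X2 + Z1, X2 = 0.5*X1 + Z2.
# The strongly connected component {X1, X2} admits a unique solution mapping
# (z1, z2) -> (x1, x2), obtained by solving the linear system (I - B) x = z.
B = np.array([[0.0, 0.5],
              [0.5, 0.0]])
z = np.array([1.0, 0.0])

x = np.linalg.solve(np.eye(2) - B, z)
print(x)  # joint solution induced by these noise values
```

Uniqueness of this mapping is what licenses a well-defined induced joint distribution; if $I - B$ were singular, the structural equations would have no unique solution for some noise values.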
Nonparametric SCMs, notably Additive Noise Models (ANMs), permit identification of the DAG under regularity assumptions and independent noise. iSCAN provides practical tools for detecting causal mechanism shifts among related datasets without complete structure recovery, leveraging mixture score Jacobian tests for node-specific shifts and feature-ordering-based conditional independence for parent recovery (Chen et al., 2023).
4. Algorithms, Learning, and Representations
Fitting an SCM classically involves maximizing the factorized likelihood over all variables, $\prod_i p(x_i \mid \mathrm{pa}(x_i))$, with penalties and constraints as dictated by the model architecture. Neural parameterizations are typically optimized using Adam for efficient convergence. Abduction for counterfactual queries may exploit normalizing flow inversion, variational approximations, or explicit conditional models.
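For a linear-Gaussian SCM, maximizing the factorized likelihood decomposes into independent least-squares fits of each node on its parents; a minimal sketch on synthetic data (coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate data from a known linear SCM: X1 = Z1, X2 = 2*X1 + Z2.
n = 5000
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)

# With Gaussian noise, maximizing the factorized likelihood
# prod_i p(x_i | pa(x_i)) reduces to least squares per node given its parents.
A = np.column_stack([x1, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, x2, rcond=None)

print(coef[0])  # estimated mechanism coefficient, close to the true value 2.0
```

The per-node decomposition is exactly the factorization property of the likelihood; more expressive mechanism classes replace the least-squares step with gradient-based optimization.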
Graph Neural Networks (GNNs) have been shown to universally subsume SCMs (Zečević et al., 2021):
- Any SCM can be realized as a single-layer message-passing network.
- The iVGAE model class uses GNN-based variational autoencoders respecting the causal graph and intervention rules, achieving consistency up to interventional queries but not supporting full counterfactual inference.
Amortized inference (Cond-FiP) (Mahajan et al., 2024) learns a single SCM-compatible encoding across multiple datasets, supporting zero-shot simulation of new observational/interventional distributions. This conditional transformer architecture generalizes to unseen graph topologies and enables efficient generative simulation.
Internally standardized SCMs (iSCMs) (Ormaniec et al., 2024) modify each equation to standardize outputs before propagation, eradicating variance/correlation artifacts that otherwise bias structure-learning benchmarks.
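The effect of internal standardization can be sketched on a simple chain, comparing raw generation, where variance compounds along the chain, with iSCM-style generation that standardizes each mechanism's output before propagation (weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

def standardize(x):
    return (x - x.mean()) / x.std()

# Chain X1 -> X2 -> X3 with weight 3. Without standardization, variance
# compounds along the chain, which biases variance-sensitive structure learners.
x1 = rng.normal(size=n)
x2_raw = 3.0 * x1 + rng.normal(size=n)
x3_raw = 3.0 * x2_raw + rng.normal(size=n)

# iSCM-style generation: standardize each output before passing it downstream.
x2 = standardize(3.0 * x1 + rng.normal(size=n))
x3 = standardize(3.0 * x2 + rng.normal(size=n))

print(x3_raw.var(), x3.var())  # compounding variance vs. unit variance
```

Because every node's marginal variance is fixed at one by construction, a learner can no longer exploit variance ordering as a shortcut to the causal ordering.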
Mechanism consolidation (Willig et al., 2023) aggregates sub-networks into meta-variables, maintaining interventional fidelity. Unlike marginalization (which destroys the ability to intervene on eliminated nodes), consolidation preserves all possible actions within collapsed mechanisms, reducing model complexity and supporting generalization across parameterized families.
5. Applications and Benchmarks
SCMs are pivotal in diverse causal machine learning subfields (Kaddour et al., 2022):
- Causal supervised learning: Identifying invariant predictors robust to interventions.
- Causal generative modeling: Simulating interventional and counterfactual distributions.
- Causal explanation: Quantifying causal influence of inputs, feature attribution, and recourse.
- Causal fairness: Defining and implementing counterfactual and interventional fairness.
- Causal reinforcement learning: Modeling policies, interventions, and credit assignment.
Synthetic benchmark generation frameworks, including sequence-driven SCMs constructed from LLMs (Bynum et al., 2024), allow the creation of datasets with explicit, controllable causal structure for algorithm validation. These approaches support systematic evaluation of average, conditional, and individual treatment effects under known confounding.
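The value of known, controllable confounding can be sketched with a minimal synthetic benchmark in which the true average treatment effect is fixed by construction (all coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Synthetic SCM: confounder C, binary treatment T with propensity sigmoid(C),
# outcome Y = 1.5*T + 2*C + noise, so the true ATE is 1.5 by construction.
c = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-c))).astype(float)
y = 1.5 * t + 2.0 * c + rng.normal(size=n)

# Naive difference of means is biased upward by the confounder C ...
naive = y[t == 1].mean() - y[t == 0].mean()

# ... while adjusting for the known confounder recovers the true effect.
X = np.column_stack([t, c, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = coef[0]
print(naive, adjusted)
```

Because the generating SCM is explicit, the gap between the naive and adjusted estimates directly measures confounding bias, which is precisely what such benchmarks let algorithm developers validate.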
SCMs have been extended for dynamical systems via SDCMs (Bongers et al., 2018), for latent selection bias modeling (Chen et al., 2024), and for extremes-oriented causal inference (XSCMs) employing transformed-linear algebra for tail-dependent data (Jiang et al., 12 May 2025).
6. Theoretical Frameworks and Open Problems
Category-theoretic SCMs (D'Acunto et al., 13 Mar 2025) organize models and their observational/interventional probability measures into functor categories and convex spaces, supporting transfer and abstraction of causal knowledge. Causal Constraints Models (CCMs) (Blom et al., 2018) generalize SCMs to encode equilibrium behaviors, conservation laws, and functional invariance—in particular, CCMs can encode constants of motion and laws (e.g., the ideal gas law) under interventions, which are not faithfully representable in standard SCMs.
Key unresolved issues include:
- Identifiability in nonparametric and neural SCMs.
- Unified libraries for SCM manipulation and do-calculus.
- Algorithmic robustness to artifacts in benchmarking datasets.
- Automatic abstraction and variable discovery for large-scale systems.
7. Interpretability, Model Selection, and Limitations
Interpretability remains a central concern:
- Classical parametric SCMs yield coefficients with explicit causal semantics.
- Transformation-based models (TRAM-DAGs) maintain interpretability via linear and Bernstein terms, while permitting nonlinear expansion.
- Neural and normalizing-flow SCMs trade interpretability for universal approximation capability.
Limitations vary by approach:
- Neural models may lose transparency and be restricted to certain data types.
- Counterfactual identifiability may fail if bijective inversion is not possible.
- iSCAN and related mechanism-shift detection algorithms focus on detecting sparse change, not global structure.
A plausible implication is that structured causal modeling will increasingly integrate combinatorial algorithms, neural function classes, and probabilistic abstraction frameworks to address scale, heterogeneity, and interpretability challenges in modern scientific and engineering domains.