
Structured Causal Models (SCMs)

Updated 16 January 2026
  • SCMs are mathematical models that represent causal relationships using directed acyclic graphs, exogenous noise, and structural equations.
  • They enable precise computation across observational, interventional, and counterfactual levels following Pearl’s do-calculus framework.
  • Applications include causal inference, generative modeling, fairness analysis, and reinforcement learning, with extensions to neural and nonparametric settings.

Structured Causal Models (SCMs) formalize the structural dependencies among observed and latent variables via a combination of directed (causal) graphs and deterministic or stochastic mechanisms. An SCM consists of (i) a set of endogenous variables, (ii) mutually independent exogenous (noise) variables, and (iii) a known or inferred directed acyclic graph (DAG) encoding parent–child relationships, with each variable assigned a structural equation specifying its value as a function of its causal parents and noise. The primary applications of SCMs are the computation of associations, effects of interventions, and the modeling of counterfactuals—corresponding to the three levels of Pearl’s causal hierarchy. SCMs have become central to causal inference, generative modeling, explanation, fairness analysis, and reinforcement learning, and have been extended to accommodate neural, dynamic, selection-biased, and extremely heavy-tailed settings.

1. Mathematical Definition and Hierarchical Levels

For $d$ observed variables $X_1,\dots,X_d$, an SCM is represented by

$$X_i := f_i(\mathrm{pa}(X_i),\, U_i), \qquad i = 1,\dots,d,$$

where $\mathrm{pa}(X_i)$ are the causal parents in the DAG and the $U_i$ are mutually independent exogenous noise variables, each equipped with a distribution $F_{U_i}$. The full model specification enables computation at three distinct levels:

  • Observational ($L_1$): Estimation of the joint $P(X_1,\dots,X_d)$.
  • Interventional ($L_2$): For an atomic intervention $do(X_j=\alpha)$, the arrows into $X_j$ are removed from the DAG and the structural equation for $X_j$ is replaced by the constant $\alpha$.
  • Counterfactual ($L_3$): Following the three-step procedure—abduction, action, prediction—for a sample $x$:

    1. Abduction: Invert the structural assignments to recover $u$;
    2. Action: Modify the SCM to include the (hypothetical) intervention;
    3. Prediction: Propagate $u$ through the modified SCM to obtain the counterfactual $x^c$ (Sick et al., 20 Mar 2025).

These algebraic manipulations follow from Pearl’s formalization of do-calculus (Kaddour et al., 2022), and are applicable to any SCM with sufficient identifiability properties.
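The three levels can be sketched on a toy linear-Gaussian SCM. Everything here is an illustrative assumption (the two-node graph $X_1 \to X_2$, the coefficient 2.0, and standard Gaussian noise), not a construction from any cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCM: X1 := U1,  X2 := 2*X1 + U2, with independent standard Gaussian noise.
def sample(n, do_x1=None):
    u1, u2 = rng.normal(size=n), rng.normal(size=n)
    x1 = np.full(n, do_x1, dtype=float) if do_x1 is not None else u1
    x2 = 2.0 * x1 + u2
    return x1, x2

# L1 (observational) and L2 (interventional) samples; do(X1 = 1) replaces
# the structural equation for X1 by a constant.
obs_x1, obs_x2 = sample(10_000)
int_x1, int_x2 = sample(10_000, do_x1=1.0)

# L3 (counterfactual) for one factual sample (x1, x2) = (0.5, 1.8):
x1_f, x2_f = 0.5, 1.8
u1 = x1_f                  # abduction: invert X1 := U1
u2 = x2_f - 2.0 * x1_f     # abduction: invert X2 := 2*X1 + U2
x1_cf = 1.0                # action: do(X1 = 1)
x2_cf = 2.0 * x1_cf + u2   # prediction: propagate the recovered noise
```

Note that the interventional mean of $X_2$ is $2.0$ (population average under $do(X_1=1)$), while the counterfactual answer $x_2^c = 2.8$ retains this individual's noise $u_2 = 0.8$; this is exactly the gap between $L_2$ and $L_3$.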

2. Model Classes: Parametric, Neural, and Transformation-based SCMs

Traditional SCMs typically specify $f_i$ using parametric families such as linear or logistic regression, yielding interpretable coefficients but risking model bias. Neural SCMs expand $f_i$ to neural networks or normalizing flows, allowing approximation of arbitrary functional forms but often compromising interpretability and restricting applicability to continuous data.

TRAM-DAGs (Transformation Model DAGs) encode each conditional $P(X_i \le x \mid \mathrm{pa}(X_i))$ by a strictly increasing function $h_i(x \mid \mathrm{pa}(X_i))$ mapping into a fixed latent CDF $F_U$:

$$F_{X_i \mid \mathrm{pa}(X_i)}(x) = F_U\big(h_i(x \mid \mathrm{pa}(X_i))\big),$$

with $F_U$ typically chosen as the standard logistic CDF. For continuous variables, $h_i(x \mid \mathrm{pa})$ is a Bernstein polynomial; for ordinal/binary variables, a step-function intercept is used; shift terms can be linear ($\beta_{ij}$) or parent-specific nonlinear ($\gamma_{ij}(x_j)$). Invertibility of $h_i$ supports abduction for counterfactual inference; monotonicity is enforced through linear constraints during training (Sick et al., 20 Mar 2025).
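A minimal sketch of this construction, assuming the standard logistic latent CDF, a Bernstein-polynomial baseline on $[0,1]$, and a single linear shift term; all parameter values are hypothetical, and no fitting is shown:

```python
import numpy as np
from math import comb

def bernstein_h(x, theta):
    """Baseline transformation on [0, 1] in the Bernstein basis.
    Nondecreasing coefficients theta guarantee a monotone h."""
    M = len(theta) - 1
    basis = np.array([comb(M, k) * x**k * (1 - x)**(M - k) for k in range(M + 1)])
    return float(theta @ basis)

def tram_cdf(x, parents, beta, theta):
    """F_{X|pa}(x) = F_U(h(x) + beta^T pa), with F_U the standard logistic CDF."""
    z = bernstein_h(x, theta) + beta @ parents
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-3.0, -1.0, 0.5, 3.0])  # nondecreasing => monotone baseline
beta = np.array([0.7])                    # linear shift: a causal log-odds ratio
```

Because the shift enters additively on the logistic scale, `beta[0]` reads directly as a log-odds ratio per unit change of the parent, which is the interpretability property emphasized above.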

TRAM-DAGs allow a spectrum of model choice between interpretability and expressive power:

  • Linear shifts are explicitly interpretable as causal log-odds-ratios.

  • Complex shifts are visualizable as nonlinear effect curves.

Neural causal models (e.g., VACA, CAREFL, normalizing-flow SCMs) may only support $L_1$ and $L_2$ queries for mixed or discrete variables; continuous TRAM-DAGs enable all three levels (Sick et al., 20 Mar 2025).

3. Extensions: Cyclic, Latent, and Nonparametric Models

SCMs with cycles and latent confounders require careful treatment. A solution mapping $g_\mathcal{O}$ from parents and noise to each strongly connected component is necessary for existence and uniqueness of the induced joint distributions; unique solvability is equivalent to the existence of such a mapping. Counterfactual and interventional equivalence is retained under strict solvability conditions. Marginalization is defined by substituting the solution for the variables to be eliminated, preserving observational/interventional semantics when these conditions hold (Bongers et al., 2016).

Nonparametric SCMs, notably Additive Noise Models (ANMs), permit identification of the DAG under regularity assumptions and independent noise. iSCAN provides practical tools for detecting causal mechanism shifts among related datasets without complete structure recovery, leveraging mixture score Jacobian tests for node-specific shifts and feature-ordering-based conditional independence for parent recovery (Chen et al., 2023).

4. Algorithms, Learning, and Representations

Fitting an SCM classically involves maximizing the factorized likelihood over all variables:

$$\mathcal{L}(\theta) = \sum_{n=1}^{N} \sum_{i=1}^{d} \log f_{X_i \mid \mathrm{pa}(X_i)}\big(x_i^{(n)} \mid \mathrm{pa}^{(n)}\big),$$

with penalties and constraints as dictated by the model architecture. Neural parameterizations are typically optimized using Adam for efficient convergence. Abduction for counterfactual queries may exploit normalizing-flow inversion, variational approximations, or explicit conditional models.
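For a linear-Gaussian SCM with a known DAG, maximizing this factorized likelihood decomposes into one least-squares regression per node on its parents. A minimal sketch under that assumption (the two-node graph and the coefficient 1.5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth: X1 := U1,  X2 := 1.5*X1 + U2, with known DAG X1 -> X2.
n = 5_000
x1 = rng.normal(size=n)
x2 = 1.5 * x1 + 0.5 * rng.normal(size=n)

# With Gaussian noise, the factorized log-likelihood splits per node, and the
# MLE of each structural coefficient is the least-squares fit on its parents.
X = x1.reshape(-1, 1)
coef = np.linalg.lstsq(X, x2, rcond=None)[0]
```

The per-node decomposition is what makes classical SCM fitting tractable; neural parameterizations replace each closed-form regression with gradient-based optimization of the same node-wise terms.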

Graph Neural Networks (GNNs) have been shown to universally subsume SCMs (Zečević et al., 2021):

  • Any SCM can be realized as a single-layer message-passing network.
  • The iVGAE model class uses GNN-based variational autoencoders respecting the causal graph and intervention rules, achieving consistency up to interventional queries but not supporting full counterfactual inference.
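The message-passing view can be illustrated with a linear SCM; the graph, edge weights, and linear mechanisms below are assumptions for this sketch, not the construction of Zečević et al. (2021). Iterating one layer $d$ times propagates the noise to the SCM's unique solution:

```python
import numpy as np

# DAG adjacency: A[i, j] = 1 iff X_i is a parent of X_j
# (here X1 -> X2, X1 -> X3, X2 -> X3).
A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])
W = 0.5 * A                       # edge weights play the role of linear mechanisms
u = np.array([1.0, 0.0, -1.0])    # one exogenous noise draw per node

def mp_layer(x):
    """One message-passing step: each node aggregates weighted parent
    messages plus its own noise, mirroring X_j := f_j(pa(X_j), U_j)."""
    return W.T @ x + u

# On a DAG, d passes of the same layer reach the fixed point
# x = (I - W^T)^{-1} u, i.e., the SCM's induced sample.
x = u.copy()
for _ in range(len(u)):
    x = mp_layer(x)
```

Interventions fit the same picture: $do(X_j=\alpha)$ corresponds to zeroing node $j$'s incoming messages and pinning its state, which is the graph-surgery rule the iVGAE class respects.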

Amortized inference (Cond-FiP) (Mahajan et al., 2024) learns a single SCM-compatible encoding across multiple datasets, supporting zero-shot simulation of new observational/interventional distributions. This conditional transformer architecture generalizes to unseen graph topologies and enables efficient generative simulation.

Internally standardized SCMs (iSCMs) (Ormaniec et al., 2024) modify each structural equation to standardize its output before propagation, eliminating variance and correlation artifacts that otherwise bias structure-learning benchmarks.
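A minimal sketch of the standardization idea, on an assumed three-node chain with illustrative weights: without standardization, marginal variance compounds along the chain (giving structure learners an artificial ordering signal), whereas per-node standardization keeps every marginal variance at one.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def standardize(x):
    return (x - x.mean()) / x.std()

# Plain SCM on the chain X1 -> X2 -> X3: variance grows with graph depth.
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
x3 = 2.0 * x2 + rng.normal(size=n)

# iSCM-style: each node is standardized before feeding its children,
# so downstream marginals stay on a comparable scale.
z1 = standardize(x1)
z2 = standardize(2.0 * z1 + rng.normal(size=n))
z3 = standardize(2.0 * z2 + rng.normal(size=n))
```

In the plain chain, $\mathrm{Var}(X_3) = 4\,\mathrm{Var}(X_2) + 1 = 21$, so sorting nodes by variance recovers the causal order without any causal reasoning; the standardized version removes that shortcut.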

Mechanism consolidation (Willig et al., 2023) aggregates sub-networks into meta-variables, maintaining interventional fidelity. Unlike marginalization (which destroys the ability to intervene on eliminated nodes), consolidation preserves all possible actions within collapsed mechanisms, reducing model complexity and supporting generalization across parameterized families.

5. Applications and Benchmarks

SCMs are pivotal in diverse causal machine learning subfields (Kaddour et al., 2022):

  • Causal supervised learning: Identifying invariant predictors robust to interventions.
  • Causal generative modeling: Simulating interventional and counterfactual distributions.
  • Causal explanation: Quantifying causal influence of inputs, feature attribution, and recourse.
  • Causal fairness: Defining and implementing counterfactual and interventional fairness.
  • Causal reinforcement learning: Modeling policies, interventions, and credit assignment.

Synthetic benchmark generation frameworks, including sequence-driven SCMs constructed from LLMs (Bynum et al., 2024), allow the creation of datasets with explicit, controllable causal structure for algorithm validation. These approaches support systematic evaluation of average, conditional, and individual treatment effects under known confounding.

SCMs have been extended for dynamical systems via SDCMs (Bongers et al., 2018), for latent selection bias modeling (Chen et al., 2024), and for extremes-oriented causal inference (XSCMs) employing transformed-linear algebra for tail-dependent data (Jiang et al., 12 May 2025).

6. Theoretical Frameworks and Open Problems

Category-theoretic SCMs (D'Acunto et al., 13 Mar 2025) organize models and their observational/interventional probability measures into functor categories and convex spaces, supporting transfer and abstraction of causal knowledge. Causal Constraints Models (CCMs) (Blom et al., 2018) generalize SCMs to encode equilibrium behaviors, conservation laws, and functional invariance—in particular, CCMs can encode constants of motion and laws (e.g., the ideal gas law) under interventions, which are not faithfully representable in standard SCMs.

Key unresolved issues include:

  • Identifiability in nonparametric and neural SCMs.
  • Unified libraries for SCM manipulation and do-calculus.
  • Algorithmic robustness to artifacts in benchmarking datasets.
  • Automatic abstraction and variable discovery for large-scale systems.

7. Interpretability, Model Selection, and Limitations

Interpretability remains a central concern:

  • Classical parametric SCMs yield coefficients with explicit causal semantics.
  • Transformation-based models (TRAM-DAGs) maintain interpretability via linear and Bernstein terms, while permitting nonlinear expansion.
  • Neural and normalizing-flow SCMs trade interpretability for universal approximation capability.

Limitations vary by approach:

  • Neural models may lose transparency and be restricted to certain data types.
  • Counterfactual identifiability may fail if bijective inversion is not possible.
  • iSCAN and related mechanism-shift detection algorithms focus on detecting sparse change, not global structure.

A plausible implication is that structured causal modeling will increasingly integrate combinatorial algorithms, neural function classes, and probabilistic abstraction frameworks to address scale, heterogeneity, and interpretability challenges in modern scientific and engineering domains.
