
Centralized Gaussian Linear SCMs

Updated 11 January 2026
  • Centralized Gaussian Linear SCMs (CGL-SCMs) are Gaussian causal models where all exogenous variables are standardized to zero mean and unit variance, reducing parameter complexity.
  • They maintain full expressivity and observational equivalence to standard models, allowing for accurate identification and estimation of causal effects using graphical criteria.
  • An EM-based algorithm is employed for parameter learning, achieving high-fidelity causal effect estimation from finite-sample data, with interventional distributions available in closed form.

Centralized Gaussian Linear Structural Causal Models (CGL-SCMs) are a subclass of Gaussian Linear Structural Causal Models in which all exogenous variables (i.e., unobserved confounders and noise terms) are standardized to have zero mean and unit variance. This centralization eliminates the scale and location indeterminacy inherent in standard Gaussian Linear SCMs (GL-SCMs) by reducing the parameterization to a minimal yet fully expressive form. CGL-SCMs retain full expressivity with respect to observational and identifiable interventional distributions, enabling efficient parameter learning and causal effect estimation from finite samples using a specialized expectation–maximization (EM) procedure (Maiti et al., 8 Jan 2026).

1. Formal Specification and Expressivity

A Gaussian Linear SCM (GL-SCM) is defined by a tuple $M' = \langle (U', \varepsilon'), X, P, F_X \rangle$, where $U' \sim \mathcal{N}(\mu_{U'}, \Sigma^2)$ are multivariate normal confounders (with diagonal covariance), $\varepsilon' \sim \mathcal{N}(\mu_{\varepsilon'}, \Psi^2)$ are independent normal noise terms, and each endogenous variable $X_i$ evolves via

$$X_i = \sum_{j\in\mathrm{Pa}^o(X_i)} \alpha_{ji} X_j + \sum_{k\in \mathrm{Pa}^u(X_i)} \alpha'_{ki} U'_k + \mu'_i + \varepsilon'_i.$$

Edges $X_j \to X_i$ and $U'_k \to X_i$ are present whenever $\alpha_{ji} \neq 0$ and $\alpha'_{ki} \neq 0$.

A CGL-SCM is the special case where all exogenous variables are standardized: $U \sim \mathcal{N}(0, I)$, $\varepsilon \sim \mathcal{N}(0, \Psi^2)$, with endogenous variable structure

$$X_i = \sum_{j\in\mathrm{Pa}^o(X_i)} \alpha_{ji} X_j + \sum_{k\in \mathrm{Pa}^u(X_i)} \alpha_{ki} U_k + \mu_i + \varepsilon_i.$$

Centralization removes the means and variances of $U'$, yielding a lower-dimensional parameter space.

Expressivity Theorem: For any GL-SCM $M'$ with observed distribution $P^{M'}(X)$, there exists a CGL-SCM $M$ with the same graph $G$ such that $P^{M}(X) = P^{M'}(X)$. Thus, CGL-SCMs and GL-SCMs are observationally indistinguishable and equally expressive in representing Gaussian-linear observational laws (Maiti et al., 8 Jan 2026).
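
A constructive way to see this (a sketch of the standard reparameterization, not necessarily the construction used in the paper): write each confounder as $U'_k = \mu_{U'_k} + \sigma_{U'_k} U_k$ with $U_k \sim \mathcal{N}(0,1)$ and each noise term as $\varepsilon'_i = \mu_{\varepsilon'_i} + \varepsilon_i$ with $\mathbb{E}[\varepsilon_i] = 0$, then absorb the scales into the confounder weights and the shifts into the intercepts:

$$\alpha_{ki} = \alpha'_{ki}\,\sigma_{U'_k}, \qquad \mu_i = \mu'_i + \mu_{\varepsilon'_i} + \sum_{k\in\mathrm{Pa}^u(X_i)} \alpha'_{ki}\,\mu_{U'_k}.$$

Substituting these into the GL-SCM equation recovers exactly the CGL-SCM equation above with the same graph and the same observational law.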

2. Identifiability of Causal Effects

A query $Q$ (e.g., $P(Y \mid do(X=x))$) is identifiable in a linear SCM with known graph $G$ if $Q$ can be expressed uniquely in terms of the observational distribution $P(X)$. Standard identification procedures such as Pearl's do-calculus and linear criteria (including instrument sets and the graphical criteria of Brito–Pearl and Tian) extend directly to CGL-SCMs since these depend only on the topology $G$ and Gaussianity.

Identification Theorem: For a GL-SCM $M'$ and corresponding CGL-SCM $M$ with $P^{M}(X) = P^{M'}(X)$, identifiable queries $Q$ satisfy $P^{M'}(Q) = P^{M}(Q)$. This permits one to always work in the lower-dimensional, centralized parameterization without loss for identifiable causal effect estimation (Maiti et al., 8 Jan 2026).

An illustrative example: in the simple chain $X \to Y$ (no confounders), the CGL-SCM yields $Y = \alpha_{X\to Y} X + \mu_Y + \varepsilon_Y$, and $P(Y \mid do(X=x)) = \mathcal{N}(\mu_Y + \alpha_{X\to Y}\, x,\ \mathrm{Var}(\varepsilon_Y))$.
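
As a quick numerical illustration (a minimal sketch with hypothetical parameter values, not figures from the paper), the following numpy snippet simulates this chain and reads $P(Y \mid do(X=x))$ off an ordinary least-squares fit, which coincides with the interventional distribution here because there is no confounding:

```python
import numpy as np

# Hypothetical chain X -> Y with no confounders; all parameter values are illustrative.
rng = np.random.default_rng(0)
alpha_xy, mu_y, sigma_y = 1.5, 0.7, 0.8        # edge weight, intercept, noise std of Y

n = 50_000
x = rng.normal(0.0, 1.0, n)                    # X's own intercept/noise set to 0/1 for brevity
y = alpha_xy * x + mu_y + sigma_y * rng.normal(0.0, 1.0, n)

# Without confounding, OLS of Y on X recovers alpha_xy and mu_y, so
# P(Y | do(X=x0)) = N(mu_y + alpha_xy * x0, sigma_y**2) follows from the fit.
A = np.column_stack([x, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, mu_hat = coef
resid_var = np.var(y - A @ coef)

x0 = 1.0
print(f"do-mean:     true {mu_y + alpha_xy * x0:.3f}, estimated {mu_hat + alpha_hat * x0:.3f}")
print(f"do-variance: true {sigma_y ** 2:.3f}, estimated {resid_var:.3f}")
```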

3. EM-Based Parameter Learning Algorithm

To estimate model parameters from data, the CGL-SCM admits a vectorized formulation. Let $X \in \mathbb{R}^p$, $U \in \mathbb{R}^k$, $T$ the $p \times p$ weighted adjacency matrix of $G$ (with $T_{ij} = \alpha_{ij}$), and $d$ the length of the longest directed path. Define

$$B = I + T + T^2 + \cdots + T^d,$$

with $B_{ij}$ the total path weight from $X_i$ to $X_j$ (the sum, over all directed paths, of the product of edge weights along each path), $C$ the $k \times p$ matrix of edge weights $U \to X$, and $\mu \in \mathbb{R}^p$ the intercepts. The stacked equations are

$$X = B^\top \mu + B^\top C^\top U + B^\top \varepsilon.$$

The joint vector $(U, X)$ is jointly Gaussian with explicitly computable mean and covariance.
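
A minimal numpy sketch of this vectorized formulation (illustrative, hypothetical parameter values; unit-variance noise $\varepsilon \sim \mathcal{N}(0, I_p)$ is assumed, consistent with the E-step expressions below):

```python
import numpy as np

def cgl_scm_moments(T, C, mu):
    """Mean and covariance of X for X = B^T mu + B^T C^T U + B^T eps,
    with U ~ N(0, I_k) and (assumed) eps ~ N(0, I_p)."""
    p = T.shape[0]
    # For a DAG, T is nilpotent, so (I - T)^{-1} equals the finite series I + T + ... + T^d.
    B = np.linalg.inv(np.eye(p) - T)
    mean_x = B.T @ mu
    cov_x = (C @ B).T @ (C @ B) + B.T @ B       # (CB)^T (CB) + B^T B
    cov_ux = C @ B                              # Cov(U, X): the k x p off-diagonal block
    return B, mean_x, cov_x, cov_ux

# Hypothetical frontdoor-style example: X1 -> X2 -> X3 with one confounder U -> {X1, X3}.
T = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 1.2],
              [0.0, 0.0, 0.0]])                 # T[i, j] = weight of edge X_i -> X_j
C = np.array([[0.5, 0.0, 0.9]])                 # C[k, i] = weight of edge U_k -> X_i
mu = np.array([0.1, -0.2, 0.3])

B, mean_x, cov_x, cov_ux = cgl_scm_moments(T, C, mu)
print(mean_x)
print(np.diag(cov_x))
```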

EM Algorithm Steps:

  • E-step: For each data sample $x^i$, compute

$$\mu_{U|x^i} = CB \left[ (CB)^\top (CB) + B^\top B \right]^{-1} (x^i - B^\top \mu),$$

$$\Sigma_{U|x^i} = I_k - CB \left[ (CB)^\top (CB) + B^\top B \right]^{-1} (CB)^\top.$$

  • M-step: Maximize the expected complete-data log-likelihood

$$L = -n \log\left|B^\top B\right| - \sum_{i=1}^n \mathbb{E}_{U|x^i}\!\left[ \left(x^i - B^\top \mu - (CB)^\top U\right)^\top (B^\top B)^{-1} \left(x^i - B^\top \mu - (CB)^\top U\right) \right],$$

with closed-form update for $\mu$:

$$\mu \gets \frac{1}{n} \sum_{i=1}^n \left( (B^\top)^{-1} x^i - C^\top \mu_{U|x^i} \right).$$

Updates for $B$ and $C$ are performed by masked gradient ascent, preserving the zero pattern dictated by the graph $G$.

EM guarantees a non-decreasing observed-data likelihood at each iteration. Regularization (e.g., $\ell_2$ penalties) and early stopping are advisable for small $n$ to prevent overfitting (Maiti et al., 8 Jan 2026).
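
The following is a compact, self-contained sketch of one EM iteration under the equations above (not the paper's implementation). It assumes unit-variance noise, as the E-step formulas imply, and uses finite-difference gradients for the masked ascent step purely to keep the example short; `mask_C` is taken to be the $U \to X$ edge pattern of $G$, and `mask_B` the corresponding zero pattern of $B$ (diagonal plus directed-path reachability).

```python
import numpy as np

def e_step(X, B, C, mu):
    """E-step: posterior mean of U per sample and the shared posterior covariance."""
    CB = C @ B                                   # k x p
    S = CB.T @ CB + B.T @ B                      # model covariance of X (unit-variance noise)
    K = CB @ np.linalg.inv(S)                    # k x p
    mu_u = (X - (B.T @ mu)[None, :]) @ K.T       # n x k posterior means
    Sigma_u = np.eye(C.shape[0]) - K @ CB.T      # k x k posterior covariance
    return mu_u, Sigma_u

def expected_loglik(X, B, C, mu, mu_u, Sigma_u):
    """Expected complete-data objective L (up to additive constants), as in the M-step."""
    n = X.shape[0]
    CB = C @ B
    W = np.linalg.inv(B.T @ B)
    R = X - (B.T @ mu)[None, :] - mu_u @ CB      # residuals at the posterior means, n x p
    quad = np.einsum('ij,jk,ik->', R, W, R)      # sum_i r_i^T W r_i
    quad += n * np.trace(CB @ W @ CB.T @ Sigma_u)  # correction for posterior uncertainty in U
    _, logdet = np.linalg.slogdet(B.T @ B)
    return -n * logdet - quad

def em_step(X, B, C, mu, mask_B, mask_C, lr=1e-3, eps=1e-6):
    """One EM iteration: closed-form mu update, then one masked gradient-ascent step on B, C."""
    mu_u, Sigma_u = e_step(X, B, C, mu)
    # mu <- (1/n) sum_i ( (B^T)^{-1} x^i - C^T mu_{U|x^i} )
    mu = np.mean(X @ np.linalg.inv(B) - mu_u @ C, axis=0)

    def obj(Bv, Cv):
        return expected_loglik(X, Bv, Cv, mu, mu_u, Sigma_u)

    base = obj(B, C)
    grad_B, grad_C = np.zeros_like(B), np.zeros_like(C)
    for idx in zip(*np.nonzero(mask_B)):         # finite-difference gradient on free entries of B
        Bp = B.copy(); Bp[idx] += eps
        grad_B[idx] = (obj(Bp, C) - base) / eps
    for idx in zip(*np.nonzero(mask_C)):         # ... and on free entries of C
        Cp = C.copy(); Cp[idx] += eps
        grad_C[idx] = (obj(B, Cp) - base) / eps
    return B + lr * grad_B, C + lr * grad_C, mu
```

In practice the masked gradient ascent would use analytic gradients of $L$; the finite-difference version above is only meant to make the objective and the masking explicit.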

4. Causal Inference and Effect Estimation

After model fitting, causal queries are evaluated by modifying structural equations and computing the resulting Gaussian distribution, as dictated by do-calculus.

Do-Interventions: For an intervention $do(X_A = x_A)$, incoming edges to $X_A$ are removed (i.e., zeroed in $T$), and $X_A$ is set to $x_A$. The remaining $X_B$ are solved as linear functions of $U$ and $\varepsilon$. The post-interventional distribution $P(X_B \mid do(X_A = x_A))$ remains multivariate normal, with parameters derived from the submatrices of the modified $B$ and $C$.
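
A minimal numpy sketch of this graph mutilation (hypothetical parameter values; unit-variance noise is again assumed, and the intervened coordinates are pinned by switching off their noise and setting their intercepts to the target values):

```python
import numpy as np

def do_intervention(T, C, mu, A, x_A):
    """Gaussian distribution of X under do(X_A = x_A) in a (fitted) CGL-SCM.
    Incoming edges of each intervened node are zeroed in T and C, its intercept is
    set to the intervened value, and its noise is switched off; the remaining
    coordinates give P(X_B | do(X_A = x_A)) by marginalization."""
    p = T.shape[0]
    T_do, C_do, mu_do = T.copy(), C.copy(), np.asarray(mu, dtype=float).copy()
    noise_var = np.ones(p)
    for a, xa in zip(A, x_A):
        T_do[:, a] = 0.0          # remove X_j -> X_a edges
        C_do[:, a] = 0.0          # remove U_k -> X_a edges
        mu_do[a] = xa             # pin X_a to its intervened value
        noise_var[a] = 0.0
    B = np.linalg.inv(np.eye(p) - T_do)               # mutilated B
    mean = B.T @ mu_do
    cov = (C_do @ B).T @ (C_do @ B) + B.T @ np.diag(noise_var) @ B
    return mean, cov

# Hypothetical frontdoor-style model X1 -> X2 -> X3, U -> {X1, X3}; query do(X2 = 1).
T = np.array([[0.0, 0.8, 0.0], [0.0, 0.0, 1.2], [0.0, 0.0, 0.0]])
C = np.array([[0.5, 0.0, 0.9]])
mu = np.array([0.1, -0.2, 0.3])
mean, cov = do_intervention(T, C, mu, A=[1], x_A=[1.0])
print("P(X3 | do(X2=1)) = N(%.3f, %.3f)" % (mean[2], cov[2, 2]))
```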

Example (Linear Chain):

| Chain Structure | Total Effect | $P(Y \mid do(X=x))$ |
| --- | --- | --- |
| $X \to M \to Y$ | $B_{X \to Y} = \alpha_{X\to M}\,\alpha_{M \to Y}$ | $\mathcal{N}\big(\mu_Y + B_{X\to Y}\, x,\ \text{noise variance of } Y\big)$ |

This pipeline applies to any graph-identifiable query, including counterfactuals, due to the closed-form propagation properties of Gaussian-linear models (Maiti et al., 8 Jan 2026).

5. Empirical Evaluation and Application

Synthetic validation was conducted using the "frontdoor" and "napkin" benchmark graphs:

  • Frontdoor graph: three observed nodes $X_1 \to X_2 \to X_3$ with an unobserved confounder $U_4 \to \{X_1, X_3\}$
  • Napkin graph: four observed nodes, two latent confounders
  • In both cases, 10,000 samples were drawn from known CGL-SCMs.

After learning with the EM algorithm, the estimated causal effects closely matched ground truth. In the frontdoor scenario:

  • True $P(X_3 \mid do(X_2=1)) = \mathcal{N}(1.100,\ 1.090)$, estimated as $\mathcal{N}(1.102,\ 1.069)$.

For the napkin graph:

  • True $P(X_4 \mid do(X_3=1)) = \mathcal{N}(0.300,\ 1.160)$, estimated as $\mathcal{N}(0.305,\ 1.169)$.

Mean and variance estimates were consistently within a few percent of their true values, demonstrating high-fidelity recovery of causal effects from finite-sample observational data using the CGL-SCM EM algorithm (Maiti et al., 8 Jan 2026).

6. Parameter Reduction and Practical Advantages

CGL-SCMs achieve parameter reduction by standardizing all exogenous variables (confounders and noise terms) to zero mean and unit variance. This eliminates the latent scaling and location degrees of freedom in general GL-SCMs: the means and variances of $U'$ are removed from the model specification. As a result, the number of free parameters, particularly those associated with unobserved confounders, is drastically reduced. Despite this, the class retains full expressivity over both observational and graph-identifiable interventional distributions. This simplification is particularly advantageous for finite-sample learning, where overparameterization often leads to infeasible or unstable estimation in the presence of unobserved confounding (Maiti et al., 8 Jan 2026).

The EM-based learning algorithm accommodates this streamlined parameterization and enables efficient estimation of edge-weights and bias terms, ensuring that causal queries remain representable and computable in closed form after training.
