
Centralized Gaussian Linear SCMs

Updated 11 January 2026
  • Centralized Gaussian Linear SCMs (CGL-SCMs) are Gaussian causal models where all exogenous variables are standardized to zero mean and unit variance, reducing parameter complexity.
  • They maintain full expressivity and observational equivalence to standard models, allowing for accurate identification and estimation of causal effects using graphical criteria.
  • An EM-based algorithm is employed for parameter learning, achieving high-fidelity causal effect estimation from finite-sample data, with interventional distributions available in closed form.

Centralized Gaussian Linear Structural Causal Models (CGL-SCMs) are a subclass of Gaussian Linear Structural Causal Models in which all exogenous variables (i.e., unobserved confounders and noise terms) are standardized to have zero mean and unit variance. This centralization eliminates the scale and location indeterminacy inherent in standard Gaussian Linear SCMs (GL-SCMs) by reducing the parameterization to a minimal yet fully expressive form. CGL-SCMs retain full expressivity with respect to observational and identifiable interventional distributions, enabling efficient parameter learning and causal effect estimation from finite samples using a specialized expectation–maximization (EM) procedure (Maiti et al., 8 Jan 2026).

1. Formal Specification and Expressivity

A Gaussian Linear SCM (GL-SCM) is defined by a tuple $M' = \langle (U', \varepsilon'), X, P, F_X \rangle$, where $U' \sim \mathcal{N}(\mu_{U'}, \Sigma^2)$ are multivariate normal confounders (with diagonal covariance), $\varepsilon' \sim \mathcal{N}(\mu_{\varepsilon'}, \Psi^2)$ are independent normal noise terms, and each endogenous variable $X_i$ evolves via

$$X_i = \sum_{j\in\mathrm{Pa}^o(X_i)} \alpha_{ji} X_j + \sum_{k\in \mathrm{Pa}^u(X_i)} \alpha'_{ki} U'_k + \mu'_i + \varepsilon'_i.$$

Edges $X_j \to X_i$ and $U'_k \to X_i$ are present whenever $\alpha_{ji} \neq 0$ and $\alpha'_{ki} \neq 0$.

A CGL-SCM is the special case where all exogenous variables are standardized: $U \sim \mathcal{N}(0, I)$, $\varepsilon \sim \mathcal{N}(0, \Psi^2)$, with endogenous variable structure

$$X_i = \sum_{j\in\mathrm{Pa}^o(X_i)} \alpha_{ji} X_j + \sum_{k\in \mathrm{Pa}^u(X_i)} \alpha_{ki} U_k + \mu_i + \varepsilon_i.$$

Centralization removes the means and variances of $U'$, yielding a lower-dimensional parameter space.

Expressivity Theorem: For any GL-SCM $M'$ with observed distribution $P^{M'}(X)$, there exists a CGL-SCM $M$ with the same graph $G$ such that $P^{M}(X) = P^{M'}(X)$. Thus, CGL-SCMs and GL-SCMs are observationally indistinguishable and equally expressive in representing Gaussian-linear observational laws (Maiti et al., 8 Jan 2026).
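
A constructive way to see this (a sketch of the standard reparameterization, not necessarily the construction used in the paper): write each confounder as $U'_k = \mu_{U'_k} + \sigma_{U'_k} U_k$ with $U_k \sim \mathcal{N}(0,1)$ and each noise term as $\varepsilon'_i = \mu_{\varepsilon'_i} + \varepsilon_i$ with $\mathbb{E}[\varepsilon_i] = 0$, then absorb the scales into the confounder weights and the shifts into the intercepts:

$$\alpha_{ki} = \alpha'_{ki}\,\sigma_{U'_k}, \qquad \mu_i = \mu'_i + \mu_{\varepsilon'_i} + \sum_{k\in\mathrm{Pa}^u(X_i)} \alpha'_{ki}\,\mu_{U'_k}.$$

Substituting these into the GL-SCM equation recovers exactly the CGL-SCM equation above with the same graph and the same observational law.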

2. Identifiability of Causal Effects

A query $Q$ (e.g., $P(Y \mid do(X=x))$) is identifiable in a linear SCM with known graph $G$ if $Q$ can be expressed uniquely in terms of the observational distribution $P(X)$. Standard identification procedures such as Pearl's do-calculus and linear criteria (including instrument sets and the graphical criteria of Brito–Pearl and Tian) extend directly to CGL-SCMs since these depend only on the topology $G$ and Gaussianity.

Identification Theorem: For a GL-SCM $M'$ and corresponding CGL-SCM $M$ with $P^{M}(X) = P^{M'}(X)$, identifiable queries $Q$ satisfy $P^{M'}(Q) = P^{M}(Q)$. This permits one to always work in the lower-dimensional, centralized parameterization without loss for identifiable causal effect estimation (Maiti et al., 8 Jan 2026).

An illustrative example: in the simple chain $X \to Y$ (no confounders), the CGL-SCM yields $Y = \alpha_{X\to Y} X + \mu_Y + \varepsilon_Y$, and $P(Y \mid do(X=x)) = \mathcal{N}(\mu_Y + \alpha_{X\to Y}\, x,\ \mathrm{Var}(\varepsilon_Y))$.
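
As a quick numerical illustration (a minimal sketch with hypothetical parameter values, not figures from the paper), the following numpy snippet simulates this chain and reads $P(Y \mid do(X=x))$ off an ordinary least-squares fit, which coincides with the interventional distribution here because there is no confounding:

```python
import numpy as np

# Hypothetical chain X -> Y with no confounders; all parameter values are illustrative.
rng = np.random.default_rng(0)
alpha_xy, mu_y, sigma_y = 1.5, 0.7, 0.8        # edge weight, intercept, noise std of Y

n = 50_000
x = rng.normal(0.0, 1.0, n)                    # X's own intercept/noise set to 0/1 for brevity
y = alpha_xy * x + mu_y + sigma_y * rng.normal(0.0, 1.0, n)

# Without confounding, OLS of Y on X recovers alpha_xy and mu_y, so
# P(Y | do(X=x0)) = N(mu_y + alpha_xy * x0, sigma_y**2) follows from the fit.
A = np.column_stack([x, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha_hat, mu_hat = coef
resid_var = np.var(y - A @ coef)

x0 = 1.0
print(f"do-mean:     true {mu_y + alpha_xy * x0:.3f}, estimated {mu_hat + alpha_hat * x0:.3f}")
print(f"do-variance: true {sigma_y ** 2:.3f}, estimated {resid_var:.3f}")
```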

3. EM-Based Parameter Learning Algorithm

To estimate model parameters from data, the CGL-SCM admits a vectorized formulation. Let $X \in \mathbb{R}^p$, $U \in \mathbb{R}^k$, $T$ the $p \times p$ weighted adjacency matrix of $G$ (with $T_{ij} = \alpha_{ij}$), and $d$ the length of the longest directed path. Define

$$B = I + T + T^2 + \cdots + T^d,$$

with $B_{ij}$ the total path weight from $X_i$ to $X_j$ (the sum, over all directed paths, of the product of edge weights along each path), $C$ the $k \times p$ matrix of edge weights $U \to X$, and $\mu \in \mathbb{R}^p$ the intercepts. The stacked equations are

$$X = B^\top \mu + B^\top C^\top U + B^\top \varepsilon.$$

The joint vector $(U, X)$ is jointly Gaussian with explicitly computable mean and covariance.
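
A minimal numpy sketch of this vectorized formulation (illustrative, hypothetical parameter values; unit-variance noise $\varepsilon \sim \mathcal{N}(0, I_p)$ is assumed, consistent with the E-step expressions below):

```python
import numpy as np

def cgl_scm_moments(T, C, mu):
    """Mean and covariance of X for X = B^T mu + B^T C^T U + B^T eps,
    with U ~ N(0, I_k) and (assumed) eps ~ N(0, I_p)."""
    p = T.shape[0]
    # For a DAG, T is nilpotent, so (I - T)^{-1} equals the finite series I + T + ... + T^d.
    B = np.linalg.inv(np.eye(p) - T)
    mean_x = B.T @ mu
    cov_x = (C @ B).T @ (C @ B) + B.T @ B       # (CB)^T (CB) + B^T B
    cov_ux = C @ B                              # Cov(U, X): the k x p off-diagonal block
    return B, mean_x, cov_x, cov_ux

# Hypothetical frontdoor-style example: X1 -> X2 -> X3 with one confounder U -> {X1, X3}.
T = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 1.2],
              [0.0, 0.0, 0.0]])                 # T[i, j] = weight of edge X_i -> X_j
C = np.array([[0.5, 0.0, 0.9]])                 # C[k, i] = weight of edge U_k -> X_i
mu = np.array([0.1, -0.2, 0.3])

B, mean_x, cov_x, cov_ux = cgl_scm_moments(T, C, mu)
print(mean_x)
print(np.diag(cov_x))
```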

EM Algorithm Steps:

  • E-step: For each data sample $x^i$, compute

$$\mu_{U|x^i} = CB \left[ (CB)^\top (CB) + B^\top B \right]^{-1} (x^i - B^\top \mu),$$

$$\Sigma_{U|x^i} = I_k - CB \left[ (CB)^\top (CB) + B^\top B \right]^{-1} (CB)^\top.$$

  • M-step: Maximize the expected complete-data log-likelihood

$$L = -n \log\left|B^\top B\right| - \sum_{i=1}^n \mathbb{E}_{U|x^i}\!\left[ \left(x^i - B^\top \mu - (CB)^\top U\right)^\top (B^\top B)^{-1} \left(x^i - B^\top \mu - (CB)^\top U\right) \right],$$

with closed-form update for $\mu$:

$$\mu \gets \frac{1}{n} \sum_{i=1}^n \left( (B^\top)^{-1} x^i - C^\top \mu_{U|x^i} \right).$$

Updates for $B$ and $C$ are performed by masked gradient ascent, preserving the zero pattern dictated by the graph $G$.

EM guarantees a non-decreasing observed-data likelihood at each iteration. Regularization (e.g., $\ell_2$ penalties) and early stopping are advisable for small $n$ to prevent overfitting (Maiti et al., 8 Jan 2026).
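
The following is a compact, self-contained sketch of one EM iteration under the equations above (not the paper's implementation). It assumes unit-variance noise, as the E-step formulas imply, and uses finite-difference gradients for the masked ascent step purely to keep the example short; `mask_C` is taken to be the $U \to X$ edge pattern of $G$, and `mask_B` the corresponding zero pattern of $B$ (diagonal plus directed-path reachability).

```python
import numpy as np

def e_step(X, B, C, mu):
    """E-step: posterior mean of U per sample and the shared posterior covariance."""
    CB = C @ B                                   # k x p
    S = CB.T @ CB + B.T @ B                      # model covariance of X (unit-variance noise)
    K = CB @ np.linalg.inv(S)                    # k x p
    mu_u = (X - (B.T @ mu)[None, :]) @ K.T       # n x k posterior means
    Sigma_u = np.eye(C.shape[0]) - K @ CB.T      # k x k posterior covariance
    return mu_u, Sigma_u

def expected_loglik(X, B, C, mu, mu_u, Sigma_u):
    """Expected complete-data objective L (up to additive constants), as in the M-step."""
    n = X.shape[0]
    CB = C @ B
    W = np.linalg.inv(B.T @ B)
    R = X - (B.T @ mu)[None, :] - mu_u @ CB      # residuals at the posterior means, n x p
    quad = np.einsum('ij,jk,ik->', R, W, R)      # sum_i r_i^T W r_i
    quad += n * np.trace(CB @ W @ CB.T @ Sigma_u)  # correction for posterior uncertainty in U
    _, logdet = np.linalg.slogdet(B.T @ B)
    return -n * logdet - quad

def em_step(X, B, C, mu, mask_B, mask_C, lr=1e-3, eps=1e-6):
    """One EM iteration: closed-form mu update, then one masked gradient-ascent step on B, C."""
    mu_u, Sigma_u = e_step(X, B, C, mu)
    # mu <- (1/n) sum_i ( (B^T)^{-1} x^i - C^T mu_{U|x^i} )
    mu = np.mean(X @ np.linalg.inv(B) - mu_u @ C, axis=0)

    def obj(Bv, Cv):
        return expected_loglik(X, Bv, Cv, mu, mu_u, Sigma_u)

    base = obj(B, C)
    grad_B, grad_C = np.zeros_like(B), np.zeros_like(C)
    for idx in zip(*np.nonzero(mask_B)):         # finite-difference gradient on free entries of B
        Bp = B.copy(); Bp[idx] += eps
        grad_B[idx] = (obj(Bp, C) - base) / eps
    for idx in zip(*np.nonzero(mask_C)):         # ... and on free entries of C
        Cp = C.copy(); Cp[idx] += eps
        grad_C[idx] = (obj(B, Cp) - base) / eps
    return B + lr * grad_B, C + lr * grad_C, mu
```

In practice the masked gradient ascent would use analytic gradients of $L$; the finite-difference version above is only meant to make the objective and the masking explicit.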

4. Causal Inference and Effect Estimation

After model fitting, causal queries are evaluated by modifying structural equations and computing the resulting Gaussian distribution, as dictated by do-calculus.

Do-Interventions: For an intervention $do(X_A = x_A)$, incoming edges to $X_A$ are removed (i.e., zeroed in $T$), and $X_A$ is set to $x_A$. The remaining $X_B$ are solved as linear functions of $U$ and $\varepsilon$. The post-interventional distribution $P(X_B \mid do(X_A = x_A))$ remains multivariate normal, with parameters derived from the submatrices of the modified $B$ and $C$.
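
A minimal numpy sketch of this graph mutilation (hypothetical parameter values; unit-variance noise is again assumed, and the intervened coordinates are pinned by switching off their noise and setting their intercepts to the target values):

```python
import numpy as np

def do_intervention(T, C, mu, A, x_A):
    """Gaussian distribution of X under do(X_A = x_A) in a (fitted) CGL-SCM.
    Incoming edges of each intervened node are zeroed in T and C, its intercept is
    set to the intervened value, and its noise is switched off; the remaining
    coordinates give P(X_B | do(X_A = x_A)) by marginalization."""
    p = T.shape[0]
    T_do, C_do, mu_do = T.copy(), C.copy(), np.asarray(mu, dtype=float).copy()
    noise_var = np.ones(p)
    for a, xa in zip(A, x_A):
        T_do[:, a] = 0.0          # remove X_j -> X_a edges
        C_do[:, a] = 0.0          # remove U_k -> X_a edges
        mu_do[a] = xa             # pin X_a to its intervened value
        noise_var[a] = 0.0
    B = np.linalg.inv(np.eye(p) - T_do)               # mutilated B
    mean = B.T @ mu_do
    cov = (C_do @ B).T @ (C_do @ B) + B.T @ np.diag(noise_var) @ B
    return mean, cov

# Hypothetical frontdoor-style model X1 -> X2 -> X3, U -> {X1, X3}; query do(X2 = 1).
T = np.array([[0.0, 0.8, 0.0], [0.0, 0.0, 1.2], [0.0, 0.0, 0.0]])
C = np.array([[0.5, 0.0, 0.9]])
mu = np.array([0.1, -0.2, 0.3])
mean, cov = do_intervention(T, C, mu, A=[1], x_A=[1.0])
print("P(X3 | do(X2=1)) = N(%.3f, %.3f)" % (mean[2], cov[2, 2]))
```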

Example (Linear Chain):

| Chain Structure | Total Effect | $P(Y \mid do(X=x))$ |
| --- | --- | --- |
| $X \to M \to Y$ | $B_{X \to Y} = \alpha_{X\to M}\,\alpha_{M \to Y}$ | $\mathcal{N}\big(\mu_Y + B_{X\to Y}\, x,\ \text{noise variance of } Y\big)$ |

This pipeline applies to any graph-identifiable query, including counterfactuals, due to the closed-form propagation properties of Gaussian-linear models (Maiti et al., 8 Jan 2026).

5. Empirical Evaluation and Application

Synthetic validation was conducted using the "frontdoor" and "napkin" benchmark graphs:

  • Frontdoor graph: three observed nodes $X_1 \to X_2 \to X_3$ with an unobserved confounder $U_4 \to \{X_1, X_3\}$
  • Napkin graph: four observed nodes, two latent confounders
  • In both cases, 10,000 samples were drawn from known CGL-SCMs.

After learning with the EM algorithm, the estimated causal effects closely matched ground truth. In the frontdoor scenario:

  • True $P(X_3 \mid do(X_2=1)) = \mathcal{N}(1.100,\ 1.090)$, estimated as $\mathcal{N}(1.102,\ 1.069)$.

For the napkin graph:

  • True $P(X_4 \mid do(X_3=1)) = \mathcal{N}(0.300,\ 1.160)$, estimated as $\mathcal{N}(0.305,\ 1.169)$.

Mean and variance estimates were consistently within a few percent of their true values, demonstrating high-fidelity recovery of causal effects from finite-sample observational data using the CGL-SCM EM algorithm (Maiti et al., 8 Jan 2026).

6. Parameter Reduction and Practical Advantages

CGL-SCMs achieve parameter reduction by standardizing all exogenous variables (confounders and noise terms) to zero mean and unit variance. This eliminates the latent scaling and location degrees of freedom in general GL-SCMs: the means and variances of $U'$ are removed from the model specification. As a result, the number of free parameters, particularly those associated with unobserved confounders, is drastically reduced. Despite this, the class retains full expressivity over both observational and graph-identifiable interventional distributions. This simplification is particularly advantageous for finite-sample learning, where overparameterization often leads to infeasible or unstable estimation in the presence of unobserved confounding (Maiti et al., 8 Jan 2026).

The EM-based learning algorithm accommodates this streamlined parameterization and enables efficient estimation of edge-weights and bias terms, ensuring that causal queries remain representable and computable in closed form after training.
