
Structural Equation Models Overview

Updated 27 November 2025
  • Structural equation models are multivariate methods that quantify complex relationships among observed and latent variables using measurement and structural components.
  • They employ directed graphs and algebraic constraints to ensure identifiability, enabling rigorous estimation of causal effects and dependencies.
  • Recent advances incorporate regularization techniques and nonlinear extensions, enhancing estimation accuracy and scalability in high-dimensional settings.

Structural equation models (SEMs) are a class of multivariate statistical models that represent complex systems of relationships among variables, often integrating both measurement and structural components. SEMs permit rigorous modeling of dependencies, causality, measurement error, latent constructs, and—depending on the formulation—nonlinear and high-dimensional effects. Theoretical developments span identifiability, parameter estimation, algebraic representation, regularization, and extensions for domain-specific applications in social, biological, engineering, and causal inference contexts.

1. Formalism and Model Classes

A structural equation model specifies a set of random variables $X_1, \dots, X_p$, each defined as a function of other variables and a stochastic noise term. The most general form on observed variables for $p$ process characteristics is

$$X_{\pi^0(\ell)} = c_{\pi^0(\ell)} + \sum_{k:\, \pi^0(k) < \pi^0(\ell)} f_{k,\pi^0(\ell)}(X_k) + \epsilon_{\pi^0(\ell)}, \qquad \ell = 1, \dots, p, \qquad \epsilon_{\pi^0(\ell)} \sim N\big(0, \sigma^2_{\pi^0(\ell)}\big),$$

with directed edges $X_k \to X_\ell$ whenever $f_{k,\ell} \not\equiv 0$ and with no explicit latent variables (Kertel et al., 2022). In classical settings, SEMs partition variables into observed and latent constructs, with the measurement model

$$\mathbf{y} = \Lambda \eta + \varepsilon, \qquad \varepsilon \sim N(0, \Theta),$$

and the structural model

$$\eta = B \eta + \zeta, \qquad \zeta \sim N(0, \Psi),$$

where $\mathbf{y}$ are observed indicators, $\eta$ are latent factors, $\Lambda$ are loadings, $B$ encodes structural relations among the latents, $\Psi$ is the covariance of the latent disturbances $\zeta$, and $\Theta$ the measurement-error covariance (Zheng, 30 Mar 2025).

Directed acyclic graphs (DAGs) or mixed graphs (with bidirected edges for correlated errors) encode the structure. The covariance matrix of the observed variables is parameterized as

$$\Sigma = \Lambda (I - B)^{-1} \Psi (I - B)^{-T} \Lambda^T + \Theta,$$

and estimation proceeds by matching this to the sample covariance via likelihood or alternative loss functions.
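For concreteness, the model-implied covariance can be assembled directly from the parameter matrices. The numpy sketch below uses a hypothetical two-latent, four-indicator model; all numbers are illustrative:

```python
import numpy as np

# Hypothetical model: latent eta1 -> eta2, two indicators per latent.
Lambda = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.7]])               # loadings
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])                    # structural path eta1 -> eta2
Psi = np.diag([1.0, 0.3])                     # latent disturbance variances
Theta = np.diag([0.4, 0.4, 0.5, 0.5])         # measurement error variances

# Sigma = Lambda (I - B)^{-1} Psi (I - B)^{-T} Lambda^T + Theta
I2 = np.eye(2)
M = np.linalg.inv(I2 - B)
Sigma = Lambda @ M @ Psi @ M.T @ Lambda.T + Theta
```

Estimation then amounts to choosing the parameter matrices so that this implied covariance matches the sample covariance under the chosen loss.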

Nonlinear and non-Gaussian SEMs generalize this setup, placing nonparametric (e.g., spline or kernel) functions on each directed edge, introducing flexible dependency forms while maintaining identifiability under certain conditions (Kertel et al., 2022, Shen et al., 2016, Silva et al., 2010).

2. Identifiability, Graph Theory, and Algebraic Aspects

Identifiability is foundational: the parametrization from SEM coefficients to the covariance (or precision) matrix must be injective (globally or generically); otherwise parameter estimates are not unique. For linear Gaussian SEMs associated to a mixed acyclic graph $G = (V, D, B)$, identifiability is characterized by Drton–Foygel–Sullivant’s theorem: $\phi_G$ is injective if and only if no subset $A$ of nodes induces both a directed arborescence (tree) and a connected bidirected graph (Drton et al., 2010). Algebraic matroid techniques provide combinatorial identifiability conditions, including for cyclic and homoscedastic models, via Jacobian rank tests and parentally closed sets (Drton et al., 2023).

The algebraic map from SEM parameters to the covariance matrix is rational, with the structure reflected combinatorially in the trek rule (expressing $\Sigma_{ij}$ as sums over treks in the graph) (Drton, 2016). The one-connection rule generalizes inversion formulas for $(I - \Lambda)^{-1}$ to cyclic graphs, realizing explicit, universal rational expressions for the entries of $\Sigma$, and guaranteeing generic finite-to-one parameterization in simple mixed graphs (Adhikari et al., 2022).
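The trek rule can be checked numerically on a toy chain $X_1 \to X_2 \to X_3$: with unit error variances, the only trek between $X_1$ and $X_3$ is the directed path itself, so $\Sigma_{13} = ab$. A minimal sketch with illustrative coefficients:

```python
import numpy as np

# Chain X1 -> X2 -> X3 with edge coefficients a and b.
a, b = 0.6, 0.5
Lam = np.zeros((3, 3))
Lam[0, 1] = a                     # X1 -> X2
Lam[1, 2] = b                     # X2 -> X3
Omega = np.eye(3)                 # independent unit-variance errors

# Sigma = (I - Lam)^{-T} Omega (I - Lam)^{-1}
M = np.linalg.inv(np.eye(3) - Lam)
Sigma = M.T @ Omega @ M

# Trek rule: Sigma_13 is the single trek monomial a*b;
# Sigma_23 sums the treks 2->3 and 2<-1->2->3, i.e. b + a^2 b.
```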

Table: Algebraic identifiability criteria

| Reference | Model type | Graph condition | Identifiability |
| --- | --- | --- | --- |
| (Drton et al., 2010) | Linear acyclic mixed SEM | No subset induces both an arborescence and a connected bidirected graph | Global |
| (Drton et al., 2023) | Linear (homoscedastic, cyclic) | Distinct out-degrees or parentally closed sets | Generic |
| (Adhikari et al., 2022) | General linear (mixed, cyclic) | Simple graph (at most one edge per node pair) | Generic, rational |

Conditional-independence and higher-order algebraic constraints (vanishing minors, Verma constraints) further distinguish non-isomorphic models via the structure of the Gaussian vanishing ideal (Drton, 2016, Adhikari et al., 2022).
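A classical instance of such a vanishing-minor constraint is the tetrad implied by a one-factor model: for any loadings, $\Sigma_{12}\Sigma_{34} - \Sigma_{13}\Sigma_{24} = 0$ holds exactly on the model-implied covariance. A quick numerical check with hypothetical loadings:

```python
import numpy as np

# One factor, four indicators: y_i = lam_i * eta + eps_i, Var(eta) = 1.
lam = np.array([0.9, 0.8, 0.7, 0.6])
theta = np.array([0.3, 0.4, 0.5, 0.6])       # error variances
Sigma = np.outer(lam, lam) + np.diag(theta)  # off-diagonals are lam_i * lam_j

# Vanishing tetrad: a degree-2 polynomial in the entries of Sigma.
tetrad = Sigma[0, 1] * Sigma[2, 3] - Sigma[0, 2] * Sigma[1, 3]
```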

3. Estimation, Regularization, and Computational Methods

Parameter estimation in SEMs is classically via maximum likelihood, minimizing

$$F_{ML} = \log|\Sigma(\theta)| + \operatorname{tr}\!\big(S\,\Sigma(\theta)^{-1}\big) - \log|S| - p,$$

where $S$ is the sample covariance. Alternative approaches include generalized least squares, robust estimators (e.g., the Satorra–Bentler scaled $\chi^2$), and full-information maximum likelihood for missing data.
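A minimal sketch of this discrepancy and its minimization, assuming a toy one-factor model and scipy's general-purpose optimizer rather than any dedicated SEM package; since the "sample" covariance here is the population one, the minimum of $F_{ML}$ is zero:

```python
import numpy as np
from scipy.optimize import minimize

def f_ml(Sigma, S):
    """F_ML = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p."""
    p = S.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

# Toy one-factor model, p = 3: Sigma(theta) = lam lam^T + diag(psi).
def sigma_of(th):
    lam, psi = th[:3], th[3:]
    return np.outer(lam, lam) + np.diag(psi)

true = np.array([0.9, 0.8, 0.7, 0.4, 0.4, 0.4])
S = sigma_of(true)                      # population covariance as "data"

res = minimize(lambda th: f_ml(sigma_of(th), S),
               x0=np.ones(6), method="L-BFGS-B",
               bounds=[(-5, 5)] * 3 + [(1e-3, 5)] * 3)
```

Note that `f_ml(S, S)` is identically zero, so the fitted discrepancy `res.fun` should be numerically zero for this just-identified model.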

Recent advances emphasize regularization:

  • Penalized likelihood with $\ell_1$ (lasso), ridge, elastic net, adaptive lasso, SCAD, MCP, or spike-and-slab penalties supports variable selection, shrinks weak effects, and improves performance in high-$p/n$ settings (Jacobucci, 2017, Pruttiakaravanich et al., 2018, Kesteren et al., 2019).
  • Proximal gradient, ADMM, and quasi-Newton algorithms efficiently solve penalized and/or convex SEM estimation problems. Convex relaxations replace the nonlinear equality constraints with linear matrix inequalities, yielding tractable semidefinite programs with low-rank guarantees (Pruttiakaravanich et al., 2018).
  • Nonparametric methods including kernel-based SEMs estimate nonlinear causal networks via convex group-sparse regularization, with ADMM or proximal-splitting solvers (Shen et al., 2016).
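As a sketch of the proximal-gradient idea behind these solvers, the following implements plain ISTA with a soft-thresholding proximal step for an $\ell_1$-penalized regression of one node on its candidate parents (simulated data; penalty and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam, iters=500):
    """ISTA for 0.5/n ||y - X b||^2 + lam ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of gradient
    b = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Node with two true parents among five candidates.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)
b = ista_lasso(X, y, lam=0.1)           # nonzero entries flag selected parents
```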

For high-dimensional data, modern computation-graph representations (e.g., via TensorFlow) enable automatic differentiation and Adam or momentum-based optimization for arbitrary loss constructions—least absolute deviation, penalized ML, or combinations (Kesteren et al., 2019).

4. Extensions: Nonlinear, Latent, Mixture, and Heteroscedastic SEMs

SEM architectures have expanded to include:

  • Nonlinear models: Additive nonlinearities estimated by splines (Kertel et al., 2022), kernel methods (Shen et al., 2016), or full Bayesian Gaussian process structures between latents (Silva et al., 2010, Silva et al., 2014). These models relax linearity while maintaining or exploiting identifiability, with hierarchical MCMC or variational inference for estimation.
  • Latent and composite constructs: SEMs with both reflective (latent variable) and formative (composite) blocks unify classic factor-analytic and composite models, with joint ML or GLS estimation and full handling of fit indices and missing data (Schamberger et al., 8 Aug 2025).
  • Mixture and non-Gaussian models: Mean-field variational Bayes for mixture-of-Gaussians SEMs accommodates skewed, heavy-tailed, or multimodal observed variables with closed-form coordinate updates and variational information criteria for model selection (Dang et al., 11 Jul 2024).
  • Distributional SEMs: Bayesian frameworks where both means and variances of latent variables are themselves modeled as functions of other latent variables (“distributional SEM”), enabling explicit modeling of latent heteroscedasticity and supporting simulation-calibrated inference (Fazio et al., 22 Apr 2024).
  • Empirical likelihood: Distribution-free inference for the parameter-rich, possibly non-Gaussian SEM with dependent errors, leveraging convex profile empirical likelihood and moment constraints on means and covariances for higher statistical efficiency and better coverage than standard MLE under non-normality (Wang et al., 2017).
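As an illustration of the additive-nonlinearity idea in the first bullet (not the estimator of any cited paper), a single nonlinear edge function can be fitted with a hand-rolled truncated-power spline basis and least squares:

```python
import numpy as np

# Edge X1 -> X2 with X2 = f(X1) + eps; estimate f nonparametrically.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=400)
y = np.sin(x) + 0.1 * rng.normal(size=400)     # true edge function f = sin

# Cubic truncated-power spline basis: 1, x, x^2, x^3, (x - k)_+^3 per knot.
knots = np.linspace(-1.5, 1.5, 7)
basis = np.column_stack([x ** d for d in range(4)] +
                        [np.maximum(x - k, 0.0) ** 3 for k in knots])
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
f_hat = basis @ coef                           # fitted edge function at the data
```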

5. Causal and Application-Specific Modeling

Causal graph learning with SEMs is widely applied in domains where a natural process order or constraints are encoded. For instance, TCAM models in manufacturing leverage temporal tiering and forbidden-edge constraints for high-throughput causal discovery:

  • Prior knowledge is encoded via ordered production tiers (allowing an edge $k \to \ell$ only if $t(k) \leq t(\ell)$) and explicit forbidden-edge matrices.
  • Additive nonlinearity is modeled via regression splines fitted in generalized additive models.
  • A two-stage algorithm combines penalized neighborhood selection (lasso regression) and greedy node ordering/pruning for robust, computationally competitive discovery of causal effects in high-dimensional industrial data (Kertel et al., 2022).
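A toy version of these constraints can be sketched as a boolean mask over candidate edges, with plain least squares plus thresholding standing in for the penalized selection step (tiers, forbidden edges, and coefficients are all illustrative):

```python
import numpy as np

tiers = np.array([0, 0, 1, 2])              # production-step tier of each node
forbidden = np.zeros((4, 4), dtype=bool)
forbidden[1, 3] = True                      # domain knowledge: no edge X2 -> X4

# Edge k -> l is allowed only if t(k) <= t(l) and it is not forbidden.
allowed = (tiers[:, None] <= tiers[None, :]) & ~forbidden
np.fill_diagonal(allowed, False)

# Simulate a chain X1 -> X3 -> X4 that respects the tiers.
rng = np.random.default_rng(3)
n = 500
X = np.zeros((n, 4))
X[:, 0] = rng.normal(size=n)
X[:, 1] = rng.normal(size=n)
X[:, 2] = 0.8 * X[:, 0] + 0.3 * rng.normal(size=n)
X[:, 3] = 0.7 * X[:, 2] + 0.3 * rng.normal(size=n)

# Parent search for X4 restricted to allowed candidates.
cand = np.where(allowed[:, 3])[0]           # here: nodes 0 and 2
coef, *_ = np.linalg.lstsq(X[:, cand], X[:, 3], rcond=None)
parents = cand[np.abs(coef) > 0.2]          # crude stand-in for lasso selection
```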

In social and behavioral sciences, SEMs integrate measurement error, mediation, and multi-group or longitudinal designs for complex inference, and provide explicit causal counterfactual modeling when supplemented with potential outcomes or cross-lagged structures (Zheng, 30 Mar 2025). Consistency of causal effects under model reductions and marginalizations is formally captured by exact transformations between SEMs, supporting interpretations of cyclic models as equilibrium limits of underlying acyclic processes (Rubenstein et al., 2017).

6. Model Evaluation, Fit, and Best Practices

Model fit is typically evaluated through a battery of criteria:

  • The exact-fit $\chi^2$ test, degrees of freedom, and associated $p$-value;
  • Approximate fit indices, especially RMSEA, CFI, TLI, and SRMR;
  • Information criteria and cross-validation for penalized and mixture models;
  • Empirical coverage via bootstrap or jackknife corrections, especially in variational approximations (Dang et al., 2021).
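The chi-square-based indices in the list above follow common closed-form expressions; a short sketch with purely illustrative inputs (not values from any cited paper):

```python
def rmsea(chi2, df, n):
    """Root mean square error of approximation."""
    return (max(chi2 - df, 0.0) / (df * (n - 1))) ** 0.5

def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index against the independence (baseline) model."""
    d_m = max(chi2 - df, 0.0)
    d_b = max(chi2_base - df_base, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0

print(round(rmsea(100.0, 50, 201), 4))           # 0.0707
print(round(cfi(100.0, 50, 1000.0, 60), 4))      # 0.9468
```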

Best practices emphasize pre-specification of structure, careful identification checks, and using modifications (e.g., Lagrange or Wald tests) only when theoretically justified. For estimator choice, match robust alternatives to data properties and report comprehensive fit diagnostics (Zheng, 30 Mar 2025). In penalized settings, cross-validated or information-criterion-tuned regularization parameters are required to ensure valid inference (Jacobucci, 2017, Pruttiakaravanich et al., 2018).

7. Theoretical Advances and Future Directions

SEMs continue to advance along several theoretical and practical dimensions. Current research focuses on broader identifiability criteria, improved computational scalability, robust inference under complex dependence structures, and domain-adapted SEM formulations for real-world decision-making and causal discovery.
