
PLS-SEM: Robust Latent Variable Modeling

Updated 10 October 2025
  • PLS-SEM is a statistical method that integrates PLS’s dimension-reduction with SEM to model relationships among latent constructs from high-dimensional, collinear data.
  • It employs iterative techniques, bootstrapping, and unified measurement and structural models to ensure consistency, convergence, and accurate parameter estimation.
  • Extensions for ordinal data, dependent observations, and joint clustering enhance PLS-SEM’s adaptability in fields like business, social sciences, and bioinformatics.

Partial Least Squares–Structural Equation Modeling (PLS-SEM) is a class of statistical estimation techniques that combines the dimension-reduction approach of Partial Least Squares (PLS) with path-analytic modeling via structural equation models (SEM). Developed as a robust framework for modeling relationships between latent constructs measured by high-dimensional, possibly collinear, or non-normally distributed indicators, PLS-SEM is widely adopted in diverse fields such as chemometrics, social sciences, business administration, and bioinformatics. Contemporary research has produced precise formulations for PLS in regression, functional data analysis, ordinal variable contexts, and dependent data, and extended SEM to unify latent variable and composite construct modeling. This article details the technical foundations, methodological advances, theoretical properties, practical implementation, and current developments shaping the rigorous application of PLS-SEM.

1. Foundations of PLS Components and Their Role in SEM

PLS regression constructs a sequence of orthogonal latent components from the predictors, each maximizing covariance with the response, subject to deflation and orthogonality constraints. In the functional context, the choice of PLS basis functions is “adapted” to the regression problem, as opposed to principal components, which solely maximize variance in the predictors (Delaigle et al., 2012). The classical functional linear regression model is

$$Y = a + \int_I b(t)\, X(t)\, dt + \varepsilon$$

where $b(t)$ is estimated through expansion in basis functions $\{\psi_j\}$ chosen iteratively by maximizing covariance between residuals and projected predictors, while ensuring $K$-orthogonality (where $K$ is the covariance operator of $X$).

This iterative procedure yields the functional PLS basis via

$$\psi_p = c_0 \left( K\left[\, b - \sum_{j=1}^{p-1} (b, \psi_j)\, \psi_j \right] + \sum_{k=1}^{p-1} c_k \psi_k \right),$$

which is equivalent to spanning the space generated by $\{ K(b), K^2(b), \ldots, K^p(b) \}$, enabling explicit representation of both regression coefficients and predictions (e.g., $g_p(x) = E(Y) + b_p(x - E(X))$) (Delaigle et al., 2012).
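The same covariance-maximization-with-deflation logic underlies ordinary (multivariate) PLS regression. Below is a minimal sketch of a NIPALS-style PLS1 component extraction, assuming centered data and a single response; the function and variable names are illustrative rather than taken from any cited paper:

```python
import numpy as np

def pls1_components(X, y, n_components):
    """Minimal NIPALS-style PLS1: extract components that maximize
    covariance with the response, deflating X after each step."""
    X = X - X.mean(axis=0)              # center predictors
    y = y - y.mean()                    # center response
    Xk = X.copy()
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ y                    # weight vector proportional to Cov(X_k, y)
        w /= np.linalg.norm(w)
        t = Xk @ w                      # latent component (score)
        p = Xk.T @ t / (t @ t)          # X-loading
        q.append(y @ t / (t @ t))       # y-loading
        Xk = Xk - np.outer(t, p)        # deflation keeps successive scores orthogonal
        W.append(w); T.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T), np.column_stack(P), np.array(q)

# toy usage with collinear predictors (all numbers are invented)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)     # near-collinearity
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0, -0.3]) + rng.normal(size=200)
W, T, P, q = pls1_components(X, y, n_components=3)
print(np.round(T.T @ T, 6))             # near-diagonal: extracted scores are orthogonal
```

In practice an established implementation (e.g., scikit-learn's PLSRegression) would be preferred; the sketch only makes the covariance-maximizing and deflation steps explicit.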

In SEM, these latent components are used to approximate underlying construct scores (composites), thereby facilitating modeling of complex relationships among latent and observed variables in the presence of high-dimensional or collinear data (Jenatabadi, 2015, Blazère et al., 2014).

2. Model Specification: Measurement and Structural Models

PLS-SEM typically involves two submodels:

  • Measurement Model: Defines the relations between observed indicators and latent constructs, using either reflective (latents cause indicators) or formative/composite (indicators form the construct) assumptions (Schamberger et al., 8 Aug 2025).

    • Latent variable formulation: $y_i = \lambda_i \eta + \varepsilon_i$

    • Composite specification: $\eta^c = \bm{W}'\bm{y}^c$

The recent unified SEM framework allows both types in a block-diagonal system, yielding a combined model-implied variance-covariance matrix:

$$\bm{\Sigma}(\bm{\theta}) = \bm{\Lambda}\, \operatorname{V}(\bm{\eta})\, \bm{\Lambda}' + \bm{\Theta}$$

where loadings Λ\bm{\Lambda} incorporate both latent and composite constructs (Schamberger et al., 8 Aug 2025).
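As a concrete illustration, the following sketch assembles such a model-implied covariance matrix from hypothetical standardized loadings, construct covariances, and unique variances (all numbers are invented for illustration and do not come from the cited work):

```python
import numpy as np

# hypothetical two-construct measurement model, three indicators each
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.9, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.8],
                   [0.0, 0.7]])
V_eta = np.array([[1.0, 0.4],            # construct variance-covariance matrix V(eta)
                  [0.4, 1.0]])
Theta = np.diag(1.0 - np.sum(Lambda**2, axis=1))   # unique variances (standardized indicators)

Sigma_theta = Lambda @ V_eta @ Lambda.T + Theta     # Sigma(theta) = Lambda V(eta) Lambda' + Theta
print(np.round(Sigma_theta, 3))          # diagonal equals 1 by construction here
```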

  • Structural Model: Specifies hypothesized causal relationships between latent constructs, typically represented as:

$$\eta = B\eta + \Gamma\xi + \zeta$$

where $B$ and $\Gamma$ are coefficient matrices among endogenous and exogenous variables, respectively (Jenatabadi, 2015). In practice, the path coefficients are estimated iteratively, often via bootstrapping to obtain standard errors and significance assessments.
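Because no distributional form is assumed for the indicators, standard errors are usually obtained by resampling. The sketch below bootstraps a single path coefficient between two composite scores; the data, the fixed outer weights, and the helper `path_coefficient` are placeholders (a full PLS-SEM round would re-estimate the outer weights within every bootstrap draw):

```python
import numpy as np

def path_coefficient(data, w_exog, w_endog):
    """Toy one-path model: form composite scores from fixed placeholder weights,
    then regress the endogenous composite on the exogenous one."""
    xi = data[:, :3] @ w_exog            # exogenous composite score
    eta = data[:, 3:] @ w_endog          # endogenous composite score
    return np.polyfit(xi, eta, 1)[0]     # slope = path coefficient

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 6))
data[:, 3:] += 0.5 * data[:, :3]         # induce a positive structural effect
w_exog = np.full(3, 1 / np.sqrt(3))      # placeholder outer weights
w_endog = np.full(3, 1 / np.sqrt(3))

# nonparametric bootstrap of the path coefficient
boot = np.array([
    path_coefficient(data[rng.integers(0, len(data), len(data))], w_exog, w_endog)
    for _ in range(2000)
])
est = path_coefficient(data, w_exog, w_endog)
print(f"path = {est:.3f}, bootstrap SE = {boot.std(ddof=1):.3f}, "
      f"95% CI = [{np.percentile(boot, 2.5):.3f}, {np.percentile(boot, 97.5):.3f}]")
```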

3. Advanced Statistical Properties: Consistency, Convergence, and Residual Formulation

The theoretical analysis of PLS has progressed to explicit formulations of residuals and convergence rates. For linear and functional regression, the PLS residuals can be written in terms of orthogonal polynomials with weights determined by the spectrum of the design matrix and noise projections (Blazère et al., 2014, Val et al., 2023):

$$\|\hat{\beta}_k - \beta_\text{OLS}\|^2 = \min_{Q_L \in \Omega_L} \sum_d Q_L(\lambda_d)^2\, \lambda_d\, \xi_d^2,$$

where the polynomials $Q_L$ are of degree $L$ with $Q_L(0) = -1$, and the $\lambda_d$ are the eigenvalues of the regressor covariance matrix. The Mahalanobis distance between the PLS and OLS estimators quantifies the approximation error, explicitly linking PLS performance to the predictor structure.
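This spectral picture can be checked numerically through the standard Krylov-subspace characterization of PLS1: after $k$ components the PLS coefficient vector is the least-squares solution restricted to $\mathrm{span}\{X^\top y, (X^\top X)X^\top y, \ldots, (X^\top X)^{k-1}X^\top y\}$, so its distance to the OLS solution shrinks as $k$ grows. The sketch below (with an invented design and a helper name `pls_via_krylov` of our own) illustrates the general property rather than reproducing the cited derivations:

```python
import numpy as np

def pls_via_krylov(X, y, k):
    """k-component PLS1 coefficients as the least-squares solution restricted
    to an orthonormal basis of the Krylov subspace K_k(X'X, X'y)."""
    S, s = X.T @ X, X.T @ y
    V = [s / np.linalg.norm(s)]
    for _ in range(k - 1):
        v = S @ V[-1]
        for u in V:                      # Gram-Schmidt against previous basis vectors
            v = v - (u @ v) * u
        V.append(v / np.linalg.norm(v))
    V = np.column_stack(V)
    alpha, *_ = np.linalg.lstsq(X @ V, y, rcond=None)
    return V @ alpha                     # coefficients expressed in the original space

rng = np.random.default_rng(2)
n, p = 500, 8
X = rng.normal(size=(n, p)) @ np.diag([2.0, 1.5, 1.2, 1.0, 1.0, 0.8, 0.7, 0.5])
y = X @ rng.normal(size=p) + rng.normal(size=n)
X, y = X - X.mean(axis=0), y - y.mean()

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
for k in range(1, p + 1):
    err = np.linalg.norm(pls_via_krylov(X, y, k) - beta_ols)
    print(f"k = {k}: ||beta_PLS - beta_OLS|| = {err:.4f}")   # shrinks toward 0 as k -> p
```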

Consistency and convergence rates for functional PLS estimators are established:

$$\|\widehat{g}_p(X_0) - g(X_0)\| = O_p(n^{-1/2}) + o(1),$$

with uniformity over a controlled number of components. These results demonstrate that under mild regularity, PLS estimators are consistent and converge quickly as the number of components grows, provided the data spectrum is well-behaved (Delaigle et al., 2012, Blazère et al., 2014, Val et al., 2023).
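A rough finite-dimensional analogue of this rate can be seen by simulation: if the signal is captured by a fixed number of components, the error of the PLS prediction at a fixed point should roughly halve when the sample size quadruples. The sketch below uses scikit-learn's `PLSRegression` on invented data; it is a schematic consistency check, not a reproduction of the functional-data theory:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
p = 6
beta = np.array([1.0, -0.5, 0.8, 0.0, 0.0, 0.0])   # hypothetical true coefficients
x0 = rng.normal(size=p)                            # fixed evaluation point
true_value = x0 @ beta

def mean_abs_error(n, reps=200):
    """Average |g_hat(x0) - g(x0)| over simulated datasets of size n."""
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta + rng.normal(size=n)
        model = PLSRegression(n_components=3).fit(X, y)
        pred = float(np.ravel(model.predict(x0.reshape(1, -1)))[0])
        errs.append(abs(pred - true_value))
    return np.mean(errs)

for n in (100, 400, 1600):
    print(f"n = {n:5d}: mean |g_hat(x0) - g(x0)| = {mean_abs_error(n):.4f}")
# the error should shrink roughly like n^{-1/2}, i.e. halve as n quadruples
```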

4. Algorithmic Adaptations: Ordinal Data and Dependence Structures

PLS-SEM has been extended to accommodate ordinal variables (OPLS) and dependent data structures:

  • Ordinal PLS (OPLS) adjusts for cases where manifest data are ordinal (e.g., Likert scales) by modeling each ordinal variable as arising from an underlying latent-continuous variable with thresholds. The core modification is the replacement of Pearson correlations by polychoric correlations in the estimation routine, yielding more accurate (less biased) parameter estimates when the number of categories is small (Cantaluppi, 2012). The outer approximation and iterative weight updates are performed using the polychoric covariance matrix, ameliorating negative bias in path coefficients common in traditional PLS with ordinal data.
  • Dependent Data PLS corrects for scenarios where observations are temporally or spatially correlated. The algorithm incorporates the dependence structure via a scaling matrix $V$ (temporal covariance), using estimators

$$b(V) = n^{-1} X^\top V^{-2} y, \quad A(V) = n^{-1} X^\top V^{-2} X,$$

and runs the PLS algorithm on whitened data $V^{-1}X$, $V^{-1}y$ (a minimal sketch of this whitening step appears below). This ensures consistency of estimated latent factors even in nonstationary regimes, an essential adjustment whenever PLS is applied to time series, longitudinal, or spatial data (Singer et al., 2015).
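A minimal sketch of the whitening step, assuming a known (or previously estimated) AR(1)-type temporal covariance: build $V$, transform the data to $V^{-1}X$ and $V^{-1}y$, and run an ordinary PLS routine on the transformed data. The covariance structure, dimensions, and coefficients below are invented purely for illustration:

```python
import numpy as np
from scipy.linalg import toeplitz, cholesky, solve_triangular
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
n, p, rho = 400, 5, 0.7

# illustrative AR(1)-type temporal covariance and a matrix square root V
Sigma_t = toeplitz(rho ** np.arange(n))
V = cholesky(Sigma_t, lower=True)            # Sigma_t = V V'

# generate temporally correlated predictors and response
beta = np.array([1.0, 0.5, 0.0, -0.8, 0.0])
X = V @ rng.normal(size=(n, p))
y = X @ beta + V @ rng.normal(size=n)

# whiten the data (V^{-1} X, V^{-1} y) and run standard PLS on the whitened version
X_w = solve_triangular(V, X, lower=True)
y_w = solve_triangular(V, y, lower=True)
pls = PLSRegression(n_components=3).fit(X_w, y_w)
print("coefficients estimated on whitened data:", np.round(np.ravel(pls.coef_), 3))
```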

5. Clustering and Segmentation: Joint Modeling and Causal Segmentation

Traditional PLS-SEM assumes population homogeneity. Recent advances allow simultaneous estimation of PLS-SEM and segmentation, yielding clusters homogeneous in terms of causal relationships:

  • PLS-SEM-KM Algorithm: This method jointly estimates the measurement model, the inner structural model, and clusters the data (non-hierarchically) using a reduced K-means on the latent variable scores. Segmentation is performed directly in the latent space shaped by causal relations, rather than clustering after latent score estimation. This approach achieves high adjusted Rand indices (ARI $\approx 0.98$ in simulation studies), outperforms tandem analysis and finite mixture PLS, and enhances group-wise interpretability in applications such as customer satisfaction modeling (Fordellone et al., 2018).
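To make the idea of causal segmentation concrete, the sketch below simulates two hidden segments whose structural coefficient linking $\xi$ to $\eta$ differs in sign, recovers them with a toy clusterwise-regression alternation on the latent scores, and scores the recovery with the adjusted Rand index. This is a schematic in the spirit of segmentation on causal relationships, not an implementation of the PLS-SEM-KM algorithm itself:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
n_per = 250
xi = rng.normal(size=2 * n_per)                        # exogenous latent scores
path = np.repeat([0.9, -0.9], n_per)                   # segment-specific path coefficients
eta = path * xi + 0.3 * rng.normal(size=2 * n_per)     # endogenous latent scores
true_segment = np.repeat([0, 1], n_per)

# toy clusterwise-regression alternation: assign each case to the segment whose
# structural relation xi -> eta explains it best, then refit the segment paths
labels = (xi * eta < 0).astype(int)                    # crude initial split by regime
for _ in range(20):
    slopes = np.array([
        xi[labels == g] @ eta[labels == g] / (xi[labels == g] @ xi[labels == g])
        for g in (0, 1)
    ])
    resid = np.abs(eta[:, None] - np.outer(xi, slopes))  # residual under each segment
    labels = resid.argmin(axis=1)

print("estimated segment paths:", np.round(slopes, 2))
print("ARI vs. true segments:", round(adjusted_rand_score(true_segment, labels), 3))
```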

6. Practical Applications and Empirical Impact

PLS-SEM is extensively employed in business administration, economics, management, and technology acceptance modeling. Bibliometric meta-analyses reveal exponential growth in the use of PLS-SEM in business research, with dominant adoption in regions such as Malaysia, Indonesia, and China (Lirio-Loli et al., 2022). Empirical models use PLS-SEM for predictive modeling, handling small sample sizes, ordinal data, collinearity, and non-normal distributions. Applications include digital transformation success measurement (via Balanced Scorecard constructs, e.g., business alignment, operational efficiency, service delivery, strategic outcomes) (O'Higgins, 2023), technology adoption studies (Web 3.0, AI chatbots) (Hizam et al., 2022, Hasan et al., 2023, Mvondo et al., 30 Aug 2024), and high-dimensional biological systems (protein dynamics) (Singer et al., 2015).

Global fit indices, reliability metrics, and path coefficients estimated via PLS-SEM provide actionable diagnostics for practitioners. The method’s flexibility in model specification, missing data handling, and group comparison (enabled by unified composite-latent SEM frameworks) increases its analytical power in multidisciplinary research (Schamberger et al., 8 Aug 2025).
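Two of the most commonly reported reliability diagnostics, composite reliability (CR) and average variance extracted (AVE), follow directly from standardized outer loadings via the usual textbook formulas; the sketch below applies them to made-up loadings for a single construct:

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / [(sum lambda)^2 + sum(1 - lambda^2)],
    assuming standardized loadings and uncorrelated measurement errors."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1.0 - lam ** 2))

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

loadings = [0.82, 0.76, 0.88, 0.71]        # hypothetical standardized outer loadings
print(f"CR  = {composite_reliability(loadings):.3f}")       # conventional threshold > 0.70
print(f"AVE = {average_variance_extracted(loadings):.3f}")  # conventional threshold > 0.50
```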

7. Future Directions and Methodological Extensions

Recent work has developed SEM frameworks that fully integrate composites with latent variables, preserving access to robust SEM estimators (ML, GLS), model fit indices, missing data procedures, and multi-group analysis, thus bridging prior limitations in PLS-SEM and composite modeling (Schamberger et al., 8 Aug 2025). For partially linear SEMs (PLSEMs), advances now yield comprehensive characterization of causal identifiability, graphical representations (maximally oriented PDAGs), score-based distribution equivalence estimation, and robust handling of non-Gaussian noise (Rothenhäusler et al., 2016).

Further work emphasizes explicit spectral analysis of PLS estimators, rigorous bounds on estimator approximation error, and hybrid integration with clustering, configurational analysis (fsQCA), and sentiment analytics for multidimensional behavioral modeling (Val et al., 2023, Mvondo et al., 30 Aug 2024). The focus on parsimony, prediction accuracy, and stability under high-dimensionality, collinearity, and non-standard data regimes continues to drive the expansion and refinement of PLS-SEM in both theoretical and applied research.


In summary, PLS-SEM is a mathematically precise, theoretically robust, and widely deployed estimation method for structural equation modeling under complex data conditions. Its ongoing evolution encompasses explicit iterative formulations, rigorous convergence analysis, algorithmic extensions for ordinal and dependent data, integrated clustering, and unification of composite and latent variables, ensuring its relevance and rigorous applicability in contemporary scientific inquiry.
