Partial Least Squares Structural Equation Modeling

Updated 7 December 2025

PLS-SEM is a variance-based modeling technique that estimates relationships among latent constructs using both reflective and formative approaches.
It employs a two-stage, iterative algorithm to optimize measurement and structural models, maximizing explained variance and predictive relevance.
PLS-SEM is ideal for complex models, small-to-moderate sample sizes, and non-normal data, with applications in digital transformation and technology adoption.

Partial Least Squares Structural Equation Modeling (PLS-SEM) is a variance-based structural equation modeling methodology that enables simultaneous estimation of relationships among latent constructs and their manifest indicators, while maximizing the explained variance of endogenous variables. Distinct from covariance-based SEM, PLS-SEM is particularly suited for complex models, small-to-moderate sample sizes, non-normal indicator distributions, and theory development or prediction-focused research contexts.

1. Theoretical Foundations and Model Structure

PLS-SEM decomposes models into two subcomponents:

Outer (Measurement) Model: Specifies how manifest variables (indicators) relate to latent variables (LVs). Reflective models assume the LV causes the observed measures, specified as $X_{pk} = \alpha_{pk} + \lambda_{pk}\xi_k + \varepsilon_{pk}$ . Formative models define the LV as a linear combination of its measurements, as $\xi_k = \delta_{0k} + \sum_{p=1}^{P_k} \delta_{pk}X_{pk} + \zeta_k$ (Giuseppe et al., 2022).
Inner (Structural) Model: Specifies hypothesized causal relationships among LVs, typically in the form $\xi_j = \beta_{j0} + \sum_{k\in \mathrm{Pred}(j)}\beta_{jk}\xi_k + \eta_j$ , where $\beta_{jk}$ are path coefficients (Giuseppe et al., 2022, Hizam et al., 2022).

PLS-SEM maximizes the explained variance ( $R^2$ ) of endogenous LVs, supports both reflective and formative blocks, and operates under minimal distributional assumptions—no requirement for multivariate normality or large samples (O'Higgins, 2023).

2. Estimation Algorithms and Computational Workflow

The canonical PLS-SEM estimation follows Wold’s path-modeling iteration:

Initialization: Standardize indicators, and assign unit or correlation-based weights to indicators within each LV block (Hizam et al., 2022, O'Higgins, 2023).
Outer Approximation (Measurement Model): Compute weighted LV scores as $\hat{y}_k = \sum_j w_{kj}x_j$ . Update weights iteratively, typically as $w^{(t+1)}_{kj} \propto \operatorname{cov}(x_j, \hat{y}_k)/\operatorname{var}(x_j)$ (Hizam et al., 2022).
Inner Approximation (Structural Model): Generate “inner” latent variable estimates as linear combinations of neighboring LV scores, per the model’s path structure (e.g., $u_j = \sum_k b_{jk}t_k$ ) (O'Higgins, 2023). Estimate path coefficients via ordinary least squares regression on the LV scores.
Weight Updating and Convergence: Iterate the above steps until the change in weights is less than a pre-specified tolerance.

This two-stage algorithm estimates model parameters and LV scores simultaneously, which supports subsequent resampling-based inference and prediction (Hasan et al., 2023).
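The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used by SmartPLS or seminr: it assumes reflective (Mode A) blocks, the centroid inner weighting scheme, every LV connected to at least one other LV, and simulated data. The function name `pls_path_model` and its arguments are hypothetical.

```python
import numpy as np

def standardize(a):
    """Center columns and scale them to unit (population) variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_path_model(X, blocks, paths, tol=1e-7, max_iter=300):
    """Illustrative PLS path algorithm (Mode A, centroid scheme).

    X      : (n, p) array of indicators
    blocks : list of column-index lists, one per latent variable (LV)
    paths  : (K, K) 0/1 array, paths[j, k] = 1 if LV k predicts LV j
    """
    X = standardize(np.asarray(X, dtype=float))
    paths = np.asarray(paths)
    n, K = X.shape[0], len(blocks)
    adjacency = ((paths + paths.T) > 0).astype(float)  # neighbours in either direction
    weights = [np.ones(len(b)) for b in blocks]        # unit starting weights

    def lv_scores(w_list):
        # Outer approximation: standardized weighted sums of each block's indicators
        return standardize(np.column_stack([X[:, b] @ w for b, w in zip(blocks, w_list)]))

    for _ in range(max_iter):
        scores = lv_scores(weights)
        # Inner approximation: sign-weighted sum of neighbouring LV scores
        inner_w = np.sign(np.corrcoef(scores, rowvar=False)) * adjacency
        proxies = standardize(scores @ inner_w)
        # Mode A update: covariance of each indicator with its block's inner proxy
        new_weights = [X[:, b].T @ proxies[:, k] / n for k, b in enumerate(blocks)]
        change = max(np.max(np.abs(nw - w)) for nw, w in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break

    scores = lv_scores(weights)
    loadings = [X[:, b].T @ scores[:, k] / n for k, b in enumerate(blocks)]
    betas, r2 = np.zeros((K, K)), np.zeros(K)
    for j in range(K):
        pred = np.flatnonzero(paths[j])
        if pred.size:
            # Path coefficients by OLS on standardized LV scores (no intercept needed)
            coef, *_ = np.linalg.lstsq(scores[:, pred], scores[:, j], rcond=None)
            betas[j, pred] = coef
            r2[j] = 1.0 - np.mean((scores[:, j] - scores[:, pred] @ coef) ** 2)
    return scores, weights, loadings, betas, r2

# Simulated example: one exogenous LV predicting one endogenous LV, three indicators each
rng = np.random.default_rng(0)
n = 500
xi1 = rng.normal(size=n)
xi2 = 0.6 * xi1 + 0.8 * rng.normal(size=n)
X = np.column_stack(
    [0.8 * xi1 + 0.6 * rng.normal(size=n) for _ in range(3)]
    + [0.8 * xi2 + 0.6 * rng.normal(size=n) for _ in range(3)]
)
blocks = [[0, 1, 2], [3, 4, 5]]
paths = np.array([[0, 0], [1, 0]])
scores, weights, loadings, betas, r2 = pls_path_model(X, blocks, paths)
print("path coefficient:", round(float(betas[1, 0]), 3), " R^2:", round(float(r2[1]), 3))
```

Because the LV scores are composites of error-laden indicators, the estimated path coefficient in this simulation falls somewhat below the generating value of 0.6. Other inner weighting schemes (factorial, path) and Mode B for formative blocks would change only the inner approximation and weight-update lines.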

3. Model Evaluation: Measurement and Structural Components

Rigorous model assessment encompasses both measurement and structural aspects:

Indicator Reliability: Outer loadings $\lambda_{ij} \geq 0.70$ are preferred, but values between $0.50$ and $0.70$ may be retained if overall construct validity is established (Hizam et al., 2022). All variance inflation factors (VIFs) should be $<5$ to indicate that multicollinearity is not problematic.
Convergent Validity: Average Variance Extracted (AVE), computed as $AVE_k = (\sum_{i=1}^{m} \lambda_{ki}^2)/m$ for a construct with $m$ indicators, must exceed $0.50$ (O'Higgins, 2023).
Construct Reliability: Cronbach's $\alpha$ $(\alpha > 0.70)$ and composite reliability measures (e.g., $CR_k$ ) evaluate internal consistency (Hasan et al., 2023).
Discriminant Validity: The Fornell–Larcker criterion ( $\sqrt{AVE_j} > \max_{k \neq j} |\mathrm{Corr}(\xi_j, \xi_k)|$ ), cross-loadings, and the Heterotrait–Monotrait Ratio (HTMT; $<$ 0.85 or 0.90) confirm construct distinctiveness (O'Higgins, 2023, Hizam et al., 2022).
Structural Model Assessment: Path coefficients ( $\beta$ ), associated $t$ -values (based on $5000$ bootstrap samples), and $p$ -values are reported for hypothesis significance (Hizam et al., 2022, Hasan et al., 2023). Explanatory power is indicated by $R^2$ statistics and effect sizes $f^2$ (O'Higgins, 2023).
Predictive Relevance: Out-of-sample forecast metrics such as $Q^2$ (via blindfolding or PLSPredict), root mean square error (RMSE), and mean absolute error (MAE) against a linear model benchmark are reported (Hizam et al., 2022).
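Several of the measurement-model criteria above reduce to short formulas on standardized loadings or item scores. The following sketch shows one way to compute them; the function names and the example loadings are hypothetical, and the composite reliability formula used is the common $\rho_c$ form, which is an assumption since the text does not spell out $CR_k$.

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

def composite_reliability(loadings):
    """Composite reliability (rho_c) from standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return float(num / (num + np.sum(1.0 - lam ** 2)))

def cronbach_alpha(items):
    """Cronbach's alpha from an (n, m) array of item scores."""
    items = np.asarray(items, dtype=float)
    m = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return float(m / (m - 1) * (1.0 - item_var / total_var))

def fornell_larcker_ok(ave_values, lv_corr):
    """Per construct: does sqrt(AVE) exceed its largest absolute correlation
    with any other construct?"""
    root = np.sqrt(np.asarray(ave_values, dtype=float))
    off = np.abs(np.asarray(lv_corr, dtype=float))
    np.fill_diagonal(off, 0.0)
    return root > off.max(axis=1)

# Hypothetical loadings for one three-indicator construct
example_loadings = [0.82, 0.76, 0.71]
print(round(ave(example_loadings), 3))                    # 0.585
print(round(composite_reliability(example_loadings), 3))  # 0.808

# Two constructs correlated at 0.70: the first passes, the second does not
print(fornell_larcker_ok([0.5847, 0.45], [[1.0, 0.7], [0.7, 1.0]]))
```

In this example the construct clears both the 0.50 AVE threshold and the 0.70 reliability threshold, while the second construct in the Fornell–Larcker check fails because $\sqrt{0.45} \approx 0.67$ is below the inter-construct correlation of 0.70.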

4. Special Topics: Segmentation, Ordinal Data, and Cyclical Causality

Simultaneous Clustering and PLS-SEM

PLS-SEM-KM integrates K-means clustering with the PLS-SEM algorithm, optimizing both cluster assignments and SEM parameters jointly. Unlike sequential workflows (PLS→clustering), the simultaneous approach produces clusters that are homogeneous with respect to structural relationships. This enhances segment validity, as supported empirically by Adjusted Rand Index (ARI) improvements and by simulation benchmarks against FIMIX-PLS (Fordellone et al., 2018).

Ordinal Partial Least Squares (OPLS)

Traditional PLS-SEM is suboptimal for ordinal data with few categories (e.g., $I = 4$ or $5$ ). OPLS addresses this by substituting the polychoric correlation matrix for the Pearson covariance matrix in all algorithmic computations. This adjustment substantially reduces negative bias in path coefficient estimation when the number of categories is small. For $I \geq 7$ , OPLS estimates approach those of standard PLS-SEM (Cantaluppi, 2012).
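The negative bias that motivates OPLS can be seen in a small simulation. The sketch below does not implement OPLS or polychoric estimation; it only shows, under assumed normal latent variables and arbitrary fixed thresholds, how the Pearson correlation shrinks when continuous scores are coarsened into a few ordered categories.

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 20_000, 0.6

# Two continuous latent responses with correlation rho
z1 = rng.normal(size=n)
z2 = rho * z1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)

def discretize(z, thresholds):
    """Map continuous scores onto ordered categories 1..I using fixed thresholds."""
    return np.digitize(z, thresholds) + 1

def pearson_after_discretizing(thresholds):
    """Pearson correlation computed on the coarsened (ordinal) versions of z1 and z2."""
    x1, x2 = discretize(z1, thresholds), discretize(z2, thresholds)
    return float(np.corrcoef(x1, x2)[0, 1])

r_latent = float(np.corrcoef(z1, z2)[0, 1])
r_4 = pearson_after_discretizing([-1.0, 0.0, 1.0])                   # I = 4 categories
r_7 = pearson_after_discretizing([-1.5, -0.9, -0.3, 0.3, 0.9, 1.5])  # I = 7 categories
print(f"latent: {r_latent:.3f}  4-point: {r_4:.3f}  7-point: {r_7:.3f}")
```

The 4-point correlation is visibly attenuated relative to the latent one, while the 7-point version sits much closer to it, which is in line with the statement that the correction matters mainly for scales with few categories.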

Modeling Cyclic (Reciprocal) Effects

Standard PLS-SEM requires a recursive structural model and therefore prohibits cyclic paths. A two-step approach with PLS-SEM enables modeling reciprocal causality in cross-sectional data: first estimate the acyclic model, then re-specify models with feedback paths using LV scores from Step 1 as proxies for 'lagged' values. Bootstrap-based parametric tests compare the strength of forward and cyclic effects (Giuseppe et al., 2022). Using this technique, the cited study showed that internet usage intensity both results from and reinforces digital skills and physical access.

5. Power Analysis and Sample Size Planning

Statistical power is critical for PLS-SEM study design. A commonly used approach is the inverse square root method:

Required Sample Size:

$N = (p_\alpha / \beta_{\min})^2,$

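where $p_\alpha$ is a constant determined by the chosen significance level and target power (e.g., $p_\alpha=2.486$ at $\alpha=0.05$ for 80% power), and $\beta_{\min}$ is the smallest path coefficient, in absolute value, that the study should be able to detect. The result is rounded up to the next integer: for $\beta_{\min}=0.5$ , $(2.486/0.5)^2 \approx 24.7$ , so $N=25$ (Ansani et al., 18 Nov 2025).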

Minimum Detectable Effect Size (MDES):

$\beta_{\min}=p_\alpha / \sqrt{N}$ . For $N=68$ , $\beta_{\min} \approx 0.30$ .

Use of the "PLS-SEM-power" R package and Shiny application operationalizes this process for both a priori sample size determination and post hoc sensitivity analysis (Ansani et al., 18 Nov 2025). The method assumes one path at a time and requires $N>10$ for valid application.
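The two formulas in this section are simple enough to compute directly. The sketch below is a plain-Python illustration with hypothetical function names; it is not the interface of the R package mentioned above, and it takes the constant 2.486 (for $\alpha=0.05$ and 80% power) from the text as its default.

```python
import math

def required_sample_size(beta_min, p_alpha=2.486):
    """Smallest integer N with N >= (p_alpha / |beta_min|)^2."""
    if beta_min == 0:
        raise ValueError("beta_min must be non-zero")
    return math.ceil((p_alpha / abs(beta_min)) ** 2)

def minimum_detectable_effect(n, p_alpha=2.486):
    """Smallest absolute path coefficient detectable with sample size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return p_alpha / math.sqrt(n)

print(required_sample_size(0.5))                 # 25
print(required_sample_size(0.2))                 # 155
print(round(minimum_detectable_effect(68), 2))   # 0.3
```

The first and last calls reproduce the worked values in the text ( $N=25$ for $\beta_{\min}=0.5$ , and $\beta_{\min}\approx 0.30$ for $N=68$ ). Halving the target effect roughly quadruples the required sample, which follows from the squared ratio.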

6. Empirical Applications and Practical Workflow

Empirical PLS-SEM studies typically adhere to the following protocol (Hizam et al., 2022, O'Higgins, 2023, Hasan et al., 2023):

Develop and operationalize a model with constructs and indicators (using Likert or similar scales).
Screen data and specify reflective/formative measurement blocks.
Apply the PLS algorithm (SmartPLS, R’s seminr, or equivalent).
Evaluate measurement validity (AVE, $\alpha$ , CR, HTMT) and structural relationships ( $\beta$ , $R^2$ , $f^2$ ).
Test hypotheses through bootstrapping.
Conduct out-of-sample predictive validation (PLS-Predict, $Q^2$ , RMSE, MAE).
Interpret significant and non-significant effects, with attention to the largest effect sizes and practical relevance.
Report statistical power analysis, indicating both a priori planning and actual sensitivity achieved (Ansani et al., 18 Nov 2025).
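The bootstrapping step in the protocol above can be sketched as follows. To keep the example short, a placeholder estimator (equal-weight composites followed by a standardized slope) stands in for a full PLS-SEM run; in practice the complete PLS algorithm would be re-estimated on every resample. The function names and the simulated data are hypothetical.

```python
import numpy as np

def path_coefficient(X, blocks):
    """Placeholder estimator: equal-weight composites, then a standardized slope.
    A real analysis would run the full PLS-SEM algorithm here."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    exo = Z[:, blocks[0]].sum(axis=1)
    endo = Z[:, blocks[1]].sum(axis=1)
    # With one predictor, the standardized slope equals the correlation
    return float(np.corrcoef(exo, endo)[0, 1])

def bootstrap_path(X, blocks, n_boot=5000, seed=0):
    """Nonparametric bootstrap: resample cases with replacement and re-estimate."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimate = path_coefficient(X, blocks)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # row indices drawn with replacement
        draws[b] = path_coefficient(X[idx], blocks)
    se = float(draws.std(ddof=1))          # bootstrap standard error
    ci = np.percentile(draws, [2.5, 97.5]) # percentile confidence interval
    return estimate, se, estimate / se, ci

# Simulated data: two constructs with three indicators each
rng = np.random.default_rng(1)
n = 300
xi1 = rng.normal(size=n)
xi2 = 0.6 * xi1 + 0.8 * rng.normal(size=n)
X = np.column_stack(
    [0.8 * xi1 + 0.6 * rng.normal(size=n) for _ in range(3)]
    + [0.8 * xi2 + 0.6 * rng.normal(size=n) for _ in range(3)]
)
estimate, se, t_value, ci = bootstrap_path(X, [[0, 1, 2], [3, 4, 5]])
print(f"beta = {estimate:.3f}, SE = {se:.3f}, t = {t_value:.2f}, 95% CI = {np.round(ci, 3)}")
```

A path is conventionally treated as significant at the 5% level when the bootstrap $t$ -value exceeds about 1.96 or, equivalently for the percentile interval, when the interval excludes zero.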

Empirical studies illustrate these steps in diverse domains, including technology adoption (Hizam et al., 2022), digital transformation (O'Higgins, 2023), and educational technology (Hasan et al., 2023). For ordinal data, OPLS should be selected for improved bias properties (Cantaluppi, 2012). For population heterogeneity or market segmentation, joint PLS-SEM-KM estimation is preferable (Fordellone et al., 2018). For causal feedback, the two-step PLS-SEM approach is needed (Giuseppe et al., 2022).

7. Methodological Innovations and Adaptations

Recent methodological advances include:

Introduction of simultaneous clustering within PLS-SEM (PLS-SEM-KM) to address unobserved heterogeneity (Fordellone et al., 2018).
Ordinal PLS adaptation for manifest measures with few ordered categories, utilizing polychoric correlations and latent threshold modeling (Cantaluppi, 2012).
Two-stage modeling of cyclic effects with cross-sectional data, enabling quantification of feedback mechanisms in socio-technical systems (Giuseppe et al., 2022).
Shiny/R tools for power analysis, automating computation of required $N$ and MDES via the inverse square root method (Ansani et al., 18 Nov 2025).
Application of advanced validation diagnostics—including $Q^2$ and out-of-sample RMSE/MAE—for robust predictive assessment (O'Higgins, 2023, Hizam et al., 2022).

These innovations are anchored in systematic empirical testing and are integrated in current best-practice PLS-SEM workflows.

For technical implementation and further detail, consult workflow exemplars and software scripts as reported in domain applications (Hizam et al., 2022, O'Higgins, 2023, Hasan et al., 2023).