Partial Least Squares Structural Equation Modeling
- PLS-SEM is a variance-based modeling technique that estimates relationships among latent constructs using both reflective and formative approaches.
- It employs a two-stage, iterative algorithm to optimize measurement and structural models, maximizing explained variance and predictive relevance.
- PLS-SEM is ideal for complex models, small-to-moderate sample sizes, and non-normal data, with applications in digital transformation and technology adoption.
Partial Least Squares Structural Equation Modeling (PLS-SEM) is a variance-based structural equation modeling methodology that enables simultaneous estimation of relationships among latent constructs and their manifest indicators, while maximizing the explained variance of endogenous variables. Distinct from covariance-based SEM, PLS-SEM is particularly suited for complex models, small-to-moderate sample sizes, non-normal indicator distributions, and theory development or prediction-focused research contexts.
1. Theoretical Foundations and Model Structure
PLS-SEM decomposes models into two subcomponents:
- Outer (Measurement) Model: Specifies how manifest variables (indicators) relate to latent variables (LVs). Reflective models assume the LV causes the observed measures, specified as $x_{jk} = \lambda_{jk}\,\xi_j + \varepsilon_{jk}$. Formative models define the LV as a linear combination of its measurements, as $\xi_j = \sum_k w_{jk}\,x_{jk} + \delta_j$ (Giuseppe et al., 2022).
- Inner (Structural) Model: Specifies hypothesized causal relationships among LVs, typically in the form $\xi_j = \sum_i \beta_{ji}\,\xi_i + \zeta_j$, where $\beta_{ji}$ are path coefficients (Giuseppe et al., 2022, Hizam et al., 2022).
PLS-SEM maximizes the explained variance ($R^2$) of endogenous LVs, supports both reflective and formative blocks, and operates under minimal distributional assumptions: it requires neither multivariate normality nor large samples (O'Higgins, 2023).
2. Estimation Algorithms and Computational Workflow
The canonical PLS-SEM estimation follows Wold’s path-modeling iteration:
- Initialization: Standardize indicators, and assign unit or correlation-based weights to indicators within each LV block (Hizam et al., 2022, O'Higgins, 2023).
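- Outer Approximation (Measurement Model): Compute weighted LV scores as $\hat{\xi}_j = \sum_k w_{jk}\,x_{jk}$. Update weights iteratively, typically as $w_{jk} = \operatorname{cov}(x_{jk}, z_j)$ for reflective blocks, where $z_j$ is the inner estimate of LV $j$ (Hizam et al., 2022).
- Inner Approximation (Structural Model): Generate “inner” latent variable estimates as linear combinations of neighboring LV scores, per the model’s path structure (e.g., $z_j = \sum_{i \sim j} e_{ji}\,\hat{\xi}_i$, summing over the LVs $i$ connected to $j$ with inner weights $e_{ji}$) (O'Higgins, 2023). Estimate path coefficients via ordinary least squares regression on the LV scores.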
- Weight Updating and Convergence: Iterate the above steps until the change in weights is less than a pre-specified tolerance.
This two-stage algorithm estimates model parameters and LV scores simultaneously, which in turn supports resampling-based inference and prediction (Hasan et al., 2023).
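The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes reflective (Mode A) blocks, the centroid inner weighting scheme, and a path model in which every LV has at least one neighbor. The function name `pls_path`, its arguments, and the simulated data are hypothetical and chosen only for this example.

```python
import numpy as np

def standardize(a):
    """Column-wise z-scores (population standard deviation)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_path(blocks, paths, tol=1e-7, max_iter=300):
    """Mode A outer estimation with centroid inner weighting (sketch).

    blocks: list of (n, k_j) indicator arrays, one per latent variable.
    paths:  list of (source, target) index pairs defining the inner model.
    """
    X = [standardize(b) for b in blocks]
    n = X[0].shape[0]
    w = [np.ones(x.shape[1]) for x in X]          # unit starting weights
    adj = np.zeros((len(X), len(X)))              # symmetric neighbor matrix
    for s, t in paths:
        adj[s, t] = adj[t, s] = 1.0
    for _ in range(max_iter):
        # Outer approximation: standardized weighted sums of indicators
        Y = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
        # Inner approximation: sign-of-correlation (centroid) inner weights
        E = np.sign(np.corrcoef(Y, rowvar=False)) * adj
        Z = standardize(Y @ E)
        # Mode A update: covariance of each indicator with its inner proxy
        w_new = [x.T @ Z[:, j] / n for j, x in enumerate(X)]
        delta = max(np.abs(a - b).max() for a, b in zip(w, w_new))
        w = w_new
        if delta < tol:                           # convergence check
            break
    Y = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
    # Structural model: OLS of each endogenous LV on its predecessors
    betas = {}
    for t in sorted({t for _, t in paths}):
        preds = [s for s, tt in paths if tt == t]
        coef, *_ = np.linalg.lstsq(Y[:, preds], Y[:, t], rcond=None)
        betas.update({(s, t): float(c) for s, c in zip(preds, coef)})
    return w, Y, betas

# Simulated two-construct example (assumed population path of 0.6)
rng = np.random.default_rng(0)
n = 500
xi1 = rng.standard_normal(n)
xi2 = 0.6 * xi1 + 0.8 * rng.standard_normal(n)
X1 = 0.8 * xi1[:, None] + 0.6 * rng.standard_normal((n, 3))
X2 = 0.8 * xi2[:, None] + 0.6 * rng.standard_normal((n, 3))
weights, scores, betas = pls_path([X1, X2], paths=[(0, 1)])
print(round(betas[(0, 1)], 3))
```

Because the LV scores are composites of error-laden indicators, the estimated path in this simulation is expected to fall somewhat below the population value of 0.6.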
3. Model Evaluation: Measurement and Structural Components
Rigorous model assessment encompasses both measurement and structural aspects:
- Indicator Reliability: Outer loadings $\geq 0.70$ are preferred, but values between $0.50$ and $0.70$ may be retained if overall construct validity is established (Hizam et al., 2022). All variance inflation factors (VIFs) should fall below conventional cutoffs (commonly $5$, or $3.3$ under stricter criteria) to rule out multicollinearity.
- Convergent Validity: Average Variance Extracted (AVE), computed as $\mathrm{AVE} = \frac{1}{K}\sum_{k=1}^{K}\lambda_k^2$ for a block of $K$ standardized indicators, must exceed $0.50$ (O'Higgins, 2023).
- Construct Reliability: Cronbach's $\alpha$ and composite reliability measures (e.g., $\rho_c$) evaluate internal consistency (Hasan et al., 2023).
- Discriminant Validity: The Fornell–Larcker criterion ($\sqrt{\mathrm{AVE}_j} > |r_{jk}|$ for all $k \neq j$), cross-loadings, and the Heterotrait–Monotrait Ratio (HTMT $< 0.85$ or $0.90$) confirm construct distinctiveness (O'Higgins, 2023, Hizam et al., 2022).
- Structural Model Assessment: Path coefficients ($\beta$), associated $t$-values (based on $5000$ bootstrap samples), and $p$-values are reported for hypothesis significance (Hizam et al., 2022, Hasan et al., 2023). Explanatory power is indicated by $R^2$ statistics and $f^2$ effect sizes (O'Higgins, 2023).
- Predictive Relevance: Out-of-sample forecast metrics such as $Q^2$ (via blindfolding or PLSpredict), root mean square error (RMSE), and mean absolute error (MAE) against a linear model benchmark are reported (Hizam et al., 2022).
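Several of the criteria listed above reduce to short formulas on standardized outer loadings and LV correlations. The sketch below uses the standard textbook definitions of AVE, composite reliability, the Fornell–Larcker check, and Cohen's $f^2$; the function names and the example loadings are illustrative assumptions, not values from the cited studies.

```python
import math

def ave(loadings):
    """Average variance extracted for standardized outer loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability (rho_c) for standardized loadings."""
    num = sum(loadings) ** 2
    return num / (num + sum(1 - l * l for l in loadings))

def fornell_larcker_ok(ave_by_lv, lv_corr):
    """True if sqrt(AVE_j) exceeds |r_jk| for every other construct k."""
    m = len(ave_by_lv)
    return all(math.sqrt(ave_by_lv[j]) > abs(lv_corr[j][k])
               for j in range(m) for k in range(m) if j != k)

def f_squared(r2_included, r2_excluded):
    """Cohen's f^2: change in R^2 when one predictor is omitted."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Hypothetical loadings for one three-indicator construct
loadings = [0.82, 0.76, 0.71]
ave_value = ave(loadings)
cr_value = composite_reliability(loadings)
fl_pass = fornell_larcker_ok([ave_value, 0.60], [[1.0, 0.55], [0.55, 1.0]])
effect = f_squared(0.50, 0.45)
print(round(ave_value, 3), round(cr_value, 3), fl_pass, round(effect, 3))
```

In this example the AVE clears the $0.50$ threshold and the square root of each AVE exceeds the inter-construct correlation, so both convergent and discriminant validity would be judged acceptable under the rules of thumb given above.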
4. Special Topics: Segmentation, Ordinal Data, and Cyclical Causality
Simultaneous Clustering and PLS-SEM
PLS-SEM-KM integrates K-means clustering with the PLS-SEM algorithm, optimizing both cluster assignments and SEM parameters jointly. Unlike sequential workflows (PLS-SEM followed by clustering), the simultaneous approach produces clusters that are homogeneous with respect to structural relationships, which enhances segment validity. This is supported empirically by improvements in the Adjusted Rand Index (ARI) and by simulation benchmarks against FIMIX-PLS (Fordellone et al., 2018).
Ordinal Partial Least Squares (OPLS)
Traditional PLS-SEM is suboptimal for ordinal data with few response categories. OPLS addresses this by substituting the polychoric correlation matrix for the Pearson covariance matrix in all algorithmic computations. This adjustment substantially reduces negative bias in path coefficient estimation when scales have few categories. As the number of categories grows, OPLS estimates converge to standard PLS-SEM estimates (Cantaluppi, 2012).
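The negative bias that motivates OPLS can be illustrated with a small simulation. The sketch below does not implement OPLS or polychoric estimation; it only shows, under assumed bivariate-normal data and arbitrarily chosen cut points, how Pearson correlations shrink when continuous variables are coarsened into a few ordered categories.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 20_000, 0.6                     # assumed sample size and correlation
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def pearson_after_binning(a, b, cuts):
    """Pearson correlation after cutting both variables at `cuts`."""
    return np.corrcoef(np.digitize(a, cuts), np.digitize(b, cuts))[0, 1]

r_cont = np.corrcoef(z1, z2)[0, 1]                               # continuous
r_3 = pearson_after_binning(z1, z2, [-0.5, 0.5])                 # 3 categories
r_7 = pearson_after_binning(z1, z2, np.linspace(-1.5, 1.5, 6))   # 7 categories
print(f"continuous={r_cont:.3f}  7-point={r_7:.3f}  3-point={r_3:.3f}")
```

The three-category correlation is attenuated most, while the seven-category value sits close to the continuous one, which is consistent with the observation that the correction matters mainly for scales with few categories.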
Modeling Cyclic (Reciprocal) Effects
Standard PLS-SEM does not permit cyclic paths. A two-step approach enables modeling reciprocal causality in cross-sectional data: first estimate the acyclic model, then re-specify models with feedback paths using LV scores from Step 1 as proxies for 'lagged' values. Bootstrap-based parametric tests compare the strength of forward and cyclic effects (Giuseppe et al., 2022). This technique was used to show that internet usage intensity both results from and reinforces digital skills and physical access.
5. Power Analysis and Sample Size Planning
Statistical power is critical for PLS-SEM study design. The inverse square root method is the current standard:
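- Required Sample Size: $N_{\min} > \left(\dfrac{c}{|\beta|_{\min}}\right)^2$,
where $c$ corresponds to the chosen significance threshold and power (e.g., $c = 2.486$ at $\alpha = 0.05$ for 80% power), and $|\beta|_{\min}$ is the smallest effect size of interest (Ansani et al., 18 Nov 2025). For example, $|\beta|_{\min} = 0.20$ gives $N_{\min} > 154.5$, i.e., at least $155$ observations.
- Minimum Detectable Effect Size (MDES): $|\beta|_{\mathrm{MDES}} = c / \sqrt{N}$. For example, $N = 100$ gives $|\beta|_{\mathrm{MDES}} \approx 0.249$.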
Use of the "PLS-SEM-power" R package and Shiny application operationalizes this process for both a priori sample size determination and post hoc sensitivity analysis (Ansani et al., 18 Nov 2025). The method assumes one path at a time and requires for valid application.
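The inverse square root calculation is simple enough to reproduce with the Python standard library. The sketch below assumes the commonly stated form of the method, in which the constant is the sum of two standard-normal quantiles, $z_{1-\alpha} + z_{\text{power}}$ (about $2.486$ for $\alpha = 0.05$ and 80% power). The function names are hypothetical and are not the API of the R package mentioned above.

```python
from math import ceil, sqrt
from statistics import NormalDist

def critical_constant(alpha=0.05, power=0.80):
    """Sum of standard-normal quantiles, z_(1-alpha) + z_(power)."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)

def required_n(beta_min, alpha=0.05, power=0.80):
    """Required sample size (c / |beta_min|)^2, rounded up."""
    return ceil((critical_constant(alpha, power) / abs(beta_min)) ** 2)

def mdes(n, alpha=0.05, power=0.80):
    """Minimum detectable absolute path coefficient at sample size n."""
    return critical_constant(alpha, power) / sqrt(n)

print(round(critical_constant(), 3))   # constant for alpha=0.05, power=0.80
print(required_n(0.20))                # a priori planning
print(round(mdes(100), 3))             # post hoc sensitivity
```

The same two functions cover both uses described above: `required_n` for a priori planning and `mdes` for sensitivity analysis once the sample size is fixed.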
6. Empirical Applications and Practical Workflow
Empirical PLS-SEM studies typically adhere to the following protocol (Hizam et al., 2022, O'Higgins, 2023, Hasan et al., 2023):
- Develop and operationalize a model with constructs and indicators (using Likert or similar scales).
- Screen data and specify reflective/formative measurement blocks.
- Apply the PLS algorithm (SmartPLS, R’s seminr, or equivalent).
- Evaluate measurement validity (AVE, $\alpha$, CR, HTMT) and structural relationships ($\beta$, $R^2$, $f^2$).
- Test hypotheses through bootstrapping.
- Conduct out-of-sample predictive validation (PLSpredict, $Q^2$, RMSE, MAE).
- Interpret significant and non-significant effects, with attention to the largest effect sizes and practical relevance.
- Report statistical power analysis, indicating both a priori planning and actual sensitivity achieved (Ansani et al., 18 Nov 2025).
Empirical studies illustrate these steps in diverse domains, including technology adoption (Hizam et al., 2022), digital transformation (O'Higgins, 2023), and educational technology (Hasan et al., 2023). For ordinal data, OPLS should be selected for improved bias properties (Cantaluppi, 2012). For population heterogeneity or market segmentation, joint PLS-SEM-KM estimation is preferable (Fordellone et al., 2018). For causal feedback, iterative two-step PLS-SEM estimation is needed (Giuseppe et al., 2022).
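The bootstrapping step of the protocol above can be sketched generically: resample cases with replacement, re-estimate the coefficient on each resample, and summarize the resulting distribution. In the sketch below the estimator is a deliberately simple stand-in (the correlation between unit-weighted composites) rather than a full PLS-SEM estimation, and the data are simulated; all names are hypothetical.

```python
import numpy as np

def bootstrap_path(data, estimator, n_boot=5000, seed=0):
    """Resample rows with replacement and re-estimate a coefficient.

    Returns the original estimate, the bootstrap standard error, the
    t-value (estimate / SE), and a 95% percentile confidence interval.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    est = float(estimator(data))
    draws = np.array([estimator(data[rng.integers(0, n, n)])
                      for _ in range(n_boot)])
    se = float(draws.std(ddof=1))
    lo, hi = (float(v) for v in np.percentile(draws, [2.5, 97.5]))
    return est, se, est / se, (lo, hi)

def unit_weight_path(data):
    """Stand-in estimator (assumption): correlation between unit-weighted
    composites of columns 0-2 (exogenous) and 3-5 (endogenous)."""
    a, b = data[:, :3].sum(axis=1), data[:, 3:].sum(axis=1)
    return np.corrcoef(a, b)[0, 1]

# Simulated indicators for two constructs (assumed population path of 0.6)
rng = np.random.default_rng(42)
n = 300
xi1 = rng.standard_normal(n)
xi2 = 0.6 * xi1 + 0.8 * rng.standard_normal(n)
data = np.hstack([0.8 * xi1[:, None] + 0.6 * rng.standard_normal((n, 3)),
                  0.8 * xi2[:, None] + 0.6 * rng.standard_normal((n, 3))])
est, se, t_value, ci = bootstrap_path(data, unit_weight_path)
print(f"beta={est:.3f}  SE={se:.3f}  t={t_value:.1f}  "
      f"CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the `estimator` argument would wrap a complete PLS-SEM run so that outer weights are re-estimated on every resample, which is what dedicated software does when it reports bootstrap $t$-values.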
7. Methodological Innovations and Adaptations
Recent methodological advances include:
- Introduction of simultaneous clustering within PLS-SEM (PLS-SEM-KM) to address unobserved heterogeneity (Fordellone et al., 2018).
- Ordinal PLS adaptation for manifest measures with few ordered categories, utilizing polychoric correlations and latent threshold modeling (Cantaluppi, 2012).
- Two-stage modeling of cyclic effects with cross-sectional data, enabling quantification of feedback mechanisms in socio-technical systems (Giuseppe et al., 2022).
- Shiny/R tools for power analysis, automating computation of required $N$ and MDES via the inverse square root method (Ansani et al., 18 Nov 2025).
- Application of advanced validation diagnostics, including $Q^2$ and out-of-sample RMSE/MAE, for robust predictive assessment (O'Higgins, 2023, Hizam et al., 2022).
These innovations are anchored in systematic empirical testing and are integrated in current best-practice PLS-SEM workflows.
For technical implementation and further detail, consult workflow exemplars and software scripts as reported in domain applications (Hizam et al., 2022, O'Higgins, 2023, Hasan et al., 2023).