Maximum Softly-Penalised Likelihood Framework

Updated 9 October 2025
  • The paper introduces a soft penalisation method that prevents pathological boundary estimates (Heywood cases) while preserving asymptotic optimality.
  • The methodology augments the log-likelihood with penalties that diverge as variances approach zero, ensuring strictly admissible parameter estimates.
  • Empirical and simulation studies confirm that scaling the penalty (e.g., by n^(-1/2)) significantly improves model inference and factor score reliability.

The maximum softly-penalised likelihood framework is a general methodology for statistical inference in models where traditional maximum likelihood estimation may yield parameter estimates on the boundary of the parameter space, leading to inference failures or pathological behavior. In this framework, one augments the log-likelihood with penalty terms scaled so that estimation is “softly” constrained into the interior of the admissible parameter region: never so harsh as to impede optimal asymptotics or equivariance properties, yet strong enough that finite-sample solutions avoid degenerate regions. This approach has recently been formalized for exploratory factor analysis (Sterzinger et al., 7 Oct 2025), with rigorous guarantees for existence, consistency, and asymptotic normality, provided the penalty scale adapts appropriately with the sample size.

1. Foundations and Motivation

Classical estimation in linear factor models via maximum likelihood maximizes the log-likelihood function:

\ell(\theta; S) = C - \frac{n}{2}\left[\log\det\!\left(\Lambda\Lambda^{\top} + \Psi\right) + \mathrm{tr}\!\left\{\left(\Lambda\Lambda^{\top} + \Psi\right)^{-1} S\right\}\right],

where S is the sample covariance matrix, Λ is the factor loading matrix, and Ψ is the diagonal matrix of unique variances. However, finite samples or local model misspecification often lead to solutions with some Ψ_jj = 0 or even Ψ_jj < 0 (“Heywood cases”), violating positivity and rendering inference invalid. The softly-penalised likelihood approach mitigates this by adding a penalty P*(θ):

\ell^*(\theta; S) = \ell(\theta; S) + P^*(\theta),

with P*(θ) tailored to diverge to −∞ when a variance approaches zero, and scaled so that its influence diminishes asymptotically.
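
For concreteness, a minimal numpy sketch of these two objectives follows. The function names, the representation of the diagonal of Ψ as a vector psi, and the generic penalty argument are illustrative conventions, not the paper's code:

```python
import numpy as np

def log_likelihood(Lambda, psi, S, n):
    """Factor-model log-likelihood l(theta; S), up to the additive constant C.

    Lambda : (p, q) loading matrix; psi : (p,) unique variances (diag of Psi);
    S : (p, p) sample covariance; n : sample size.
    """
    Sigma = Lambda @ Lambda.T + np.diag(psi)            # model-implied covariance
    _, logdet = np.linalg.slogdet(Sigma)                # stable log-determinant
    trace_term = np.trace(np.linalg.solve(Sigma, S))    # tr(Sigma^{-1} S)
    return -0.5 * n * (logdet + trace_term)

def penalised_log_likelihood(Lambda, psi, S, n, penalty):
    """Penalised objective l*(theta; S) = l(theta; S) + P*(theta).

    `penalty` is any callable (Lambda, psi) -> P*(theta) that is continuous,
    bounded above, and diverges to -inf as any entry of psi approaches zero.
    """
    return log_likelihood(Lambda, psi, S, n) + penalty(Lambda, psi)
```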

2. Heywood Cases and Penalty Construction

Heywood cases refer to exact or ultra-Heywood phenomena: a unique variance estimate becomes exactly zero (implying that the common factors explain the corresponding variable perfectly) or negative (violating variance positivity). Such solutions preclude the use of model-based standard errors and confidence intervals, bias factor scores, and may disrupt model selection procedures. The framework requires penalties P*(θ) satisfying:

  • Continuity over the parameter space.
  • Boundedness from above (no unbounded penalty spikes).
  • Divergence to −∞ as any Ψ_jj → 0.

Under these conditions (Theorem 1 of Sterzinger et al., 7 Oct 2025), maximum penalised likelihood estimates are guaranteed to exist in the interior for all data configurations, ruling out Heywood solutions.

Canonical penalty forms covered include:

  • Akaike (1987): P^*(\theta) = -\frac{\rho n}{2}\,\mathrm{tr}\!\left(\Psi^{-1/2} S\, \Psi^{-1/2}\right)
  • Hirose et al. (2011): P^*(\theta) = -\frac{\rho n}{2} \sum_{j} \frac{\Lambda_j^{\top}\Lambda_j}{\Psi_{jj}}

Both can be reformulated as penalising ∑_j A_jj(θ)/Ψ_jj, where A captures either sample or model structure.
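
A sketch of these two penalty forms, under the same illustrative conventions as the snippet above (for diagonal Ψ the Akaike trace reduces to a sum of per-variable ratios):

```python
def akaike_penalty(Lambda, psi, S, rho, n):
    """Akaike (1987)-type penalty: -(rho*n/2) * tr(Psi^{-1/2} S Psi^{-1/2}).

    With Psi diagonal, tr(Psi^{-1/2} S Psi^{-1/2}) = sum_j S_jj / psi_j,
    so the penalty diverges to -inf as any psi_j -> 0.
    """
    return -0.5 * rho * n * np.sum(np.diag(S) / psi)

def hirose_penalty(Lambda, psi, S, rho, n):
    """Hirose et al. (2011)-type penalty: -(rho*n/2) * sum_j Lambda_j' Lambda_j / psi_j,
    where Lambda_j denotes the j-th row of the loading matrix."""
    return -0.5 * rho * n * np.sum(np.sum(Lambda ** 2, axis=1) / psi)
```

Either form can be supplied to the penalised objective above, e.g. penalty=lambda L, p: akaike_penalty(L, p, S, rho, n).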

3. Scaling for Soft Penalisation

A “soft” penalty ensures that, as n → ∞, its impact vanishes relative to the Fisher information. This is operationalized by choosing a penalty scaling factor c_n such that, for example,

P^*(\theta) = c_n P(\theta), \qquad c_n = O(n^{-1/2}), \qquad P(\theta) = O_p(1).

Explicitly, the paper suggests c_n = √2 n^(-1/2), or equivalently ρ = 2√2 n^(-3/2), for the Akaike and Hirose penalties. Under this regime (see Theorem 3), the penalised estimators retain optimal asymptotic properties: consistency, asymptotic normality, and proper calibration of inferential procedures.
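
As a small illustration of how the two quoted constants relate (a sketch; the equivalence follows from writing the penalties above as -(ρn/2)(·)):

```python
def soft_rho(n):
    """rho = 2*sqrt(2) * n^{-3/2}, so that rho*n/2 = sqrt(2)*n^{-1/2} = c_n."""
    return 2.0 * np.sqrt(2.0) * n ** (-1.5)

for n in (50, 200, 1000):
    c_n = np.sqrt(2.0) / np.sqrt(n)
    # The penalty weight c_n shrinks while the log-likelihood grows at rate n,
    # so the penalty's relative influence vanishes asymptotically.
    print(f"n={n:5d}  rho={soft_rho(n):.5f}  c_n={c_n:.4f}")
```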

In contrast, vanilla penalties with fixed ρ introduce excess bias and break the desired asymptotic equivalence to ML estimation.

4. Rigorous Theoretical Guarantees

Under the softly-penalised framework, the classical properties of the ML estimator, namely √n-consistency and asymptotic normality, are preserved by the penalised estimator (Theorems 2-3). The sufficient conditions include uniform convergence of the sample covariance to its population counterpart, soft scaling of the penalty (so that n^(-1) P*(θ_0) → 0), and regularity conditions ensuring nonsingular Jacobians and stability of parameter rotations.
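
Stated generically (a paraphrase of the standard maximum-likelihood-type conclusion rather than the paper's exact display, with \mathcal{I}(\theta_0) the per-observation Fisher information of an identified parameterisation):

\sqrt{n}\,\bigl(\hat{\theta}^{*} - \theta_0\bigr) \xrightarrow{d} \mathrm{N}\!\bigl(0,\ \mathcal{I}(\theta_0)^{-1}\bigr),

so that Wald-type standard errors and confidence intervals computed from the penalised fit remain asymptotically calibrated.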

Crucially, soft penalisation produces parameter estimates strictly inside the admissible parameter space for all samples, and maintains tight control of bias and mean squared error as n increases.

5. Simulation and Empirical Evidence

Simulation studies in the paper contrast:

  • ML estimation: frequent Heywood cases, sometimes with a large proportion (e.g., over 10%) of samples yielding zero or negative variance estimates for some variables.
  • Vanilla penalised likelihood (fixed ρ, i.e. penalty of order ρn): severe finite-sample bias, excessive shrinkage.
  • Softly penalised likelihood (n^(-1/2) scaling): strictly positive variance estimates in all runs, negligible Heywood occurrence.

Metrics such as the probability of underestimation P(λ̂_j < λ_j), bias, RMSE, and selection accuracy via AIC/BIC all favor the MSPL approach over the alternatives. Empirical evaluations on the Davis, Emmett, and Maxwell datasets demonstrate improved inference stability, factor score reliability, and strictly admissible variance estimates.
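
The kind of Heywood-rate comparison reported in such studies can be sketched as follows. The snippet only illustrates how near-boundary unique variances would be flagged across replications; scikit-learn's FactorAnalysis is a plain ML/EM fit and does not implement the MSPL penalty, and the population values, threshold, and dimensions are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
p, q, n_rep, n = 6, 2, 200, 50          # variables, factors, replications, sample size

# A population model with one small unique variance, which makes
# boundary (Heywood-type) estimates more likely in small samples.
Lambda0 = rng.uniform(0.4, 0.9, size=(p, q))
psi0 = np.array([0.05, 0.3, 0.4, 0.5, 0.6, 0.7])
Sigma0 = Lambda0 @ Lambda0.T + np.diag(psi0)

near_boundary = 0
for _ in range(n_rep):
    X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
    fa = FactorAnalysis(n_components=q).fit(X)       # unpenalised ML (EM) fit
    # Flag replications where any estimated unique variance is numerically at the boundary.
    near_boundary += np.any(fa.noise_variance_ < 1e-3)

print(f"proportion of near-Heywood fits: {near_boundary / n_rep:.2f}")
```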

6. Model Selection and Extensions

The MSPL framework facilitates valid model selection, as strict positivity of variances assures consistent computation of information criteria like AIC or BIC. Selection of the number of factors using penalised likelihood improves with sample size, and the criteria behave more predictably than under ML or vanilla penalisation.
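
A minimal sketch of how such criteria would be computed once a strictly interior maximiser is available. The parameter count below uses a common convention for exploratory factor analysis (pq loadings plus p unique variances, minus q(q-1)/2 rotational constraints) and is an assumption for illustration, not a quotation from the paper; the maximised log-likelihood value is taken as given:

```python
def n_free_params(p, q):
    """Free parameters in an exploratory factor model with p variables and q factors,
    after removing the q(q-1)/2 rotational degrees of freedom."""
    return p * (q + 1) - q * (q - 1) // 2

def aic_bic(max_loglik, p, q, n):
    """AIC and BIC from a maximised log-likelihood value."""
    k = n_free_params(p, q)
    return -2.0 * max_loglik + 2.0 * k, -2.0 * max_loglik + k * np.log(n)
```

With strictly positive variance estimates guaranteed, these quantities are well defined for every candidate number of factors, which is the property the text attributes to the MSPL fit.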

Prospective extensions include:

  • Alternative penalty functions (beyond Akaike or Hirose), possibly data-driven.
  • Confirmatory factor analysis incorporating known constraints.
  • Categorical factor models, where boundary issues also arise.
  • Formal hypothesis testing under strict interior solutions.

7. Conclusion and Future Outlook

Maximum softly-penalised likelihood in factor analysis provides a rigorous and practically valuable solution to boundary estimation pathologies, specifically Heywood cases. By adapting penalty scaling to the sample size, it ensures technically robust estimators with asymptotic optimality in the sense of consistency and normality. Simulation and real data analysis confirm its empirical advantages. The framework is extensible to other latent variable models where interior solutions and inferential validity are critical, and suggests a principled pathway for penalisation strategies in high-dimensional and constrained estimation settings (Sterzinger et al., 7 Oct 2025).
