Weighted Likelihood Estimator (WLE)

Updated 9 April 2026

WLE is a robust estimation framework that assigns data-dependent weights to observations, enhancing resistance to outliers while retaining asymptotic efficiency.
It employs various weighting strategies including residual-based adjustments and data depth methods, generalizing MLE across regression, mixture, and multivariate models.
Iterative algorithms like IRLS and weighted EM, combined with divergence minimization, ensure computational feasibility and reliable performance in practice.

A Weighted Likelihood Estimator (WLE) is a statistical estimation framework in which each observation's contribution to the likelihood or score equations is modulated by a data-dependent weight. The primary motivation is to attain robust, outlier-resistant inferential procedures that maintain full asymptotic efficiency under the assumed model. The WLE paradigm generalizes the classical Maximum Likelihood Estimator (MLE), is applicable to a broad range of models—including multivariate location and scatter, finite mixture models, and regression—and encompasses weighting schemes based on kernel density ratios, statistical data depth, and other residual-based diagnostics. Weighted likelihood procedures have been extensively studied both from the frequentist large-sample perspective and within the theory of convergence under weighted sampling, as well as in computational algorithms for practical implementation.

1. Theoretical Basis of Weighted Likelihood Estimation

The general WLE framework considers a sample $X_1,\ldots,X_n$ from a model family $\{f(x;\theta): \theta\in\Theta\}$ and attaches a weight $w_i\in[0,1]$ to each observation $X_i$. The weighted log-likelihood is

$\ell_w(\theta) = \sum_{i=1}^n w_i \log f(X_i; \theta).$

The WLE is any solution $\hat\theta_w$ to the weighted score equations

$U_w(\theta) = \nabla_\theta \ell_w(\theta) = \sum_{i=1}^n w_i u_\theta(X_i) = 0,$

where $u_\theta(x) = \nabla_\theta \log f(x;\theta)$ is the usual score function. When $w_i \equiv 1$, the WLE reduces to the ordinary MLE (Majumder et al., 2016, Broniatowski et al., 2012).
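As a minimal sketch of the weighted score equations, consider a univariate normal model, for which the equations have a closed-form solution when the weights are held fixed. The function name `weighted_normal_fit` and the toy data are illustrative assumptions, and the weights are supplied by hand rather than derived from the data.

```python
import numpy as np

def weighted_normal_fit(x, w):
    """Solve the weighted score equations of a N(mu, sigma^2) model for fixed weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = np.sum(w * x) / np.sum(w)                  # weighted score in mu set to zero
    sigma2 = np.sum(w * (x - mu) ** 2) / np.sum(w)  # weighted score in sigma^2 set to zero
    return mu, sigma2

x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])   # the last value is a gross outlier
mle = weighted_normal_fit(x, np.ones_like(x))       # w_i = 1 recovers the ordinary MLE
wle = weighted_normal_fit(x, np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0]))  # outlier given weight 0
```

With unit weights the location estimate is the sample mean, which the outlier pulls upward; with the outlier's weight set to zero the estimate returns to the centre of the bulk.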

From a large deviations and divergence-minimization viewpoint, weighted sampling corresponds to minimizing a specific $\varphi$-divergence between the parametric model and the empirical weighted measure. The associated divergence is determined by the cumulant generating function of the weight distribution, and the minimum-divergence estimator is equivalent to the WLE under mild regularity (Broniatowski et al., 2012). In exponential family models, the WLE admits explicit representation as a function of the weighted empirical mean of sufficient statistics, leading in special cases to families of generalized means such as the Lehmer and Hölder means (Ziou, 2023, Ziou et al., 2024).
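The exponential-family representation can be read off the weighted score equations. The derivation below is a sketch in the natural parametrization, assuming the weights do not depend on $\theta$; the symbols $h$, $T$, and $A$ (base measure, sufficient statistic, log-partition function) are notation introduced here for illustration.

```latex
f(x;\theta) = h(x)\exp\{\theta^\top T(x) - A(\theta)\}
\quad\Longrightarrow\quad
u_\theta(x) = T(x) - \nabla A(\theta),
\\[6pt]
U_w(\theta) = \sum_{i=1}^n w_i \bigl\{T(X_i) - \nabla A(\theta)\bigr\} = 0
\quad\Longrightarrow\quad
\nabla A(\hat\theta_w) = \frac{\sum_{i=1}^n w_i\, T(X_i)}{\sum_{i=1}^n w_i}.
```

The mean parameter is therefore estimated by the weighted empirical mean of the sufficient statistics, and setting $w_i \equiv 1$ recovers the usual moment-matching form of the MLE.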

2. Weight Construction: Residuals, Depths, and Adjustment Functions

Weight specification is the core design aspect. Strategies include:

Pearson-Residual-Based Weights:

Weights are constructed via residuals comparing a univariate statistic's empirical (often kernel-estimated) density $\hat f(x)$ to a reference model (e.g. $\chi^2_p$, normal), typically through

$\delta(x) = \frac{\hat f(x)}{m(x)} - 1,$

with $m(x)$ the model or smoothed model density. A Residual Adjustment Function (RAF) $A(\delta)$, such as the Hellinger or power-divergence RAF, is applied and the weight is

$w(\delta) = \min\left\{1, \frac{[A(\delta)+1]^+}{\delta+1}\right\},$

with $[\cdot]^+$ denoting the positive part. This approach is computationally feasible in high dimensions because only a univariate kernel smooth is required, so the curse of dimensionality is avoided (Agostinelli et al., 2017, Greco et al., 2018).
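The following sketch computes Pearson-residual weights for a univariate normal model. It assumes the common residual form $\delta = \hat f/m^* - 1$, the Hellinger RAF $A(\delta) = 2(\sqrt{\delta+1}-1)$, and the weight $\min\{1, [A(\delta)+1]^+/(\delta+1)\}$; the Gaussian kernel, the bandwidth `h`, and all function names are illustrative choices rather than details taken from the cited papers.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def hellinger_raf(delta):
    """Hellinger residual adjustment function A(delta) = 2 (sqrt(delta + 1) - 1)."""
    return 2.0 * (np.sqrt(delta + 1.0) - 1.0)

def pearson_weights(x, mu, sigma, h):
    """Weights from Pearson residuals under a N(mu, sigma^2) model (assumed setup)."""
    x = np.asarray(x, dtype=float)
    # Gaussian kernel density estimate evaluated at every observation
    f_hat = normal_pdf(x[:, None], x[None, :], h).mean(axis=1)
    # Model density smoothed by the same kernel: N(mu, sigma^2 + h^2)
    m_star = normal_pdf(x, mu, np.sqrt(sigma ** 2 + h ** 2))
    delta = f_hat / m_star - 1.0                        # Pearson residual, > -1
    num = np.maximum(hellinger_raf(delta) + 1.0, 0.0)   # positive part
    return np.minimum(1.0, num / (delta + 1.0))

rng = np.random.default_rng(0)
x = np.append(rng.normal(size=200), 6.0)   # 200 model-consistent points plus one outlier
w = pearson_weights(x, mu=0.0, sigma=1.0, h=0.5)
```

Points in the bulk receive weights close to 1, while the isolated value at 6 has a large positive residual and a weight near 0. For very remote points the model density can underflow to zero, so a production implementation would work on the log scale.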

Statistical Data Depth Based Weights:

Another methodology uses affine-invariant data depths (e.g., half-space depth) to define for each point the empirical depth $d(x;\hat F_n)$ relative to the sample and the model depth $d(x;F_\theta)$. A depth-based residual $\delta_i$, which contrasts $d(X_i;\hat F_n)$ with $d(X_i;F_\theta)$ for $i=1,\ldots,n$, measures compatibility. A weight function $w(\cdot)$ (e.g., a tapered or piecewise-linear function) transforms this residual, $w_i = w(\delta_i)$, so that weights rapidly decrease for deviant points (Agostinelli et al., 2025, Agostinelli, 2018). This yields affine equivariance, model-adaptiveness, and high breakdown for scatter/location estimation in multivariate models.
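A univariate toy version of depth-based weighting can be sketched as follows. Every specific here is an assumption made for illustration: the relative residual (empirical depth divided by model depth, minus 1), the piecewise-linear taper with thresholds `a` and `b`, and the function names are not taken from the cited papers, and the univariate half-space depth stands in for a genuinely multivariate depth.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(v, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0))))

def empirical_halfspace_depth(x):
    """Univariate half-space depth of each point with respect to the sample."""
    x = np.asarray(x, dtype=float)
    below = (x[None, :] <= x[:, None]).mean(axis=1)   # fraction of sample <= x_i
    above = (x[None, :] >= x[:, None]).mean(axis=1)   # fraction of sample >= x_i
    return np.minimum(below, above)

def model_halfspace_depth(x, mu, sigma):
    """Half-space depth of each point under a N(mu, sigma^2) model."""
    cdf = np.array([normal_cdf(v, mu, sigma) for v in x])
    return np.minimum(cdf, 1.0 - cdf)

def taper_weight(delta, a=1.0, b=3.0):
    """Weight 1 for |delta| <= a, decreasing linearly to 0 at |delta| = b."""
    return np.clip((b - np.abs(delta)) / (b - a), 0.0, 1.0)

x = np.append(np.linspace(-2.0, 2.0, 41), 6.0)   # regular grid plus one remote point
d_emp = empirical_halfspace_depth(x)
d_mod = model_halfspace_depth(x, mu=0.0, sigma=1.0)
delta = d_emp / d_mod - 1.0                       # assumed relative depth residual
w = taper_weight(delta)
```

The remote point has a small but nonzero empirical depth and an almost vanishing model depth, so its residual is very large and its weight is zero, while the central point has matching depths and keeps full weight.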

Alternative Constructions:

In regression and loss modeling contexts, the weights $w_i$ can be analytically chosen to exponentially downweight extreme or tail observations, adapting to the error structure or empirical distribution (Fung, 2022, Fung, 2021).
For exponential families, arbitrary positive weights $w_i$ enable interpretation as generalized weighted means (Ziou, 2023, Ziou et al., 2024).

3. Algorithms and Computational Implementation

The WLE frequently leads to non-linear estimating equations requiring iterative solution:

IRLS-Type Schemes:

For estimating multivariate location and scatter, or robust principal components, an iteratively reweighted least squares (IRLS) procedure alternates between updating the weights (via RAF or depth-based measures) and updating the parameter estimates until convergence. A typical update step for the multivariate normal model is

$\hat\mu = \frac{\sum_{i=1}^n w_i X_i}{\sum_{i=1}^n w_i}, \qquad \hat\Sigma = c\,\frac{\sum_{i=1}^n w_i (X_i-\hat\mu)(X_i-\hat\mu)^\top}{\sum_{i=1}^n w_i},$

with $c$ an unbiasedness correction (Agostinelli et al., 2017).
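The alternation between weights and estimates can be sketched as below. To keep the example short, the weight function is a simple distance-based placeholder (weight 1 below a cutoff on the squared Mahalanobis distance, decaying above it), not the RAF or depth-based weights of the cited work, and no unbiasedness correction is applied. The names `placeholder_weights` and `irls_location_scatter` and the cutoff value are assumptions.

```python
import numpy as np

def mahalanobis_sq(X, mu, Sigma):
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)

def placeholder_weights(d2, cutoff):
    """Illustrative weights: 1 below the cutoff, cutoff / d2 above it."""
    return np.minimum(1.0, cutoff / np.maximum(d2, 1e-12))

def irls_location_scatter(X, cutoff, n_iter=100, tol=1e-8):
    X = np.asarray(X, dtype=float)
    mu = np.median(X, axis=0)             # coordinate-wise median as a starting value
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        # Weight step: recompute weights at the current estimates
        w = placeholder_weights(mahalanobis_sq(X, mu, Sigma), cutoff)
        # Estimation step: weighted mean and weighted scatter
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu_new
        Sigma_new = (w[:, None] * diff).T @ diff / w.sum()
        converged = (np.max(np.abs(mu_new - mu)) < tol
                     and np.max(np.abs(Sigma_new - Sigma)) < tol)
        mu, Sigma = mu_new, Sigma_new
        if converged:
            break
    return mu, Sigma, w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 2)), np.full((10, 2), 10.0)])  # 10 clustered outliers
mu, Sigma, w = irls_location_scatter(X, cutoff=9.21)
```

In this toy run the ten clustered outliers end up with small weights, and the location estimate stays much closer to the origin than the unweighted sample mean does.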

Weighted EM and CEM for Mixture Models:

In Gaussian mixtures and related models, the E-step remains standard, while the M-step employs current weights to yield robust parameter updates. Outlier detection and model selection proceed via weight-based or robust distance-based rules (Greco et al., 2018, Fung, 2021).
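A minimal weighted EM for a univariate Gaussian mixture might look as follows. For simplicity the observation weights are held fixed and supplied by the caller, whereas the cited procedures recompute them from the current fit at each iteration; the function name `weighted_em_gmm` and the toy data are assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def weighted_em_gmm(x, w, mu, sd, pi, n_iter=200):
    """EM for a univariate K-component Gaussian mixture with fixed observation weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu, sd, pi = (np.array(v, dtype=float) for v in (mu, sd, pi))
    for _ in range(n_iter):
        # E-step (standard): posterior component membership probabilities
        dens = pi * normal_pdf(x[:, None], mu, sd)          # shape (n, K)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: each observation enters with its weight w_i
        wr = w[:, None] * resp
        nk = wr.sum(axis=0)
        pi = nk / w.sum()
        mu = (wr * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((wr * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return mu, sd, pi

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100), [30.0]])
w = np.ones_like(x)
w[-1] = 0.0                        # the outlier at 30 is given zero weight by hand
mu, sd, pi = weighted_em_gmm(x, w, mu=[-1.0, 6.0], sd=[1.0, 1.0], pi=[0.5, 0.5])
```

Because the outlier carries zero weight in the M-step, it no longer inflates the mean and spread of the upper component, although it still receives a posterior membership in the E-step.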

Regression and GLM Settings:

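The score-based weighted likelihood estimator (SWLE) for GLMs is computed by weighted IRLS at each fixed value of the dispersion parameter, followed by root-finding in the dispersion. Many of the updates are available in closed form, and the approach extends directly to truncated and censored likelihoods (Fung, 2022).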

Multiple Roots and Initialization:

Depth-based or high-breakdown algorithms can exhibit multiple roots, particularly in data with hidden subclusters. Multiple random or deterministic initializations, followed by root selection using fitted tail weights or cross-validated likelihood, mitigate this risk (Agostinelli et al., 2017, Agostinelli et al., 2025).

4. Robustness, Efficiency, and Breakdown Properties

Key properties are as follows:

Model Efficiency:

When the sample comes from the model, weights converge to 1 in probability (or to the mean of the weight law for weighted sampling), so the WLE is asymptotically equivalent to the MLE with the same variance; first-order efficiency is preserved (Majumder et al., 2016, Broniatowski et al., 2012, Agostinelli et al., 2025).

Robustness and Breakdown:

If data depart from the model (e.g., outliers, contamination), the weights sharply decrease for deviant points, bounding their influence. For elliptical models, the finite-sample breakdown point for location/scatter can approach $1/2$ or the affine-equivariant upper bound $\lfloor (n-p+1)/2 \rfloor / n$, matching best-known equivariant estimators (Agostinelli et al., 2025, Agostinelli et al., 2017).

Consistency and Influence Functions:

Under standard regularity conditions (differentiability, identifiability, bounded information), consistency and asymptotic normality hold. The first-order influence function of the WLE at the model coincides with that of the MLE, but second-order bias and breakdown behavior are uniformly bounded when down-weighting is active (Majumder et al., 2016, Agostinelli et al., 2025).

Variance and Sandwich Estimation:

The asymptotic covariance of the WLE is the inverse Fisher information at the model and takes a sandwich form at contaminated distributions. In typical robustification settings there is no variance inflation relative to the MLE when the data are uncontaminated (Saraceno et al., 2020).
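For reference, the sandwich form mentioned above is the standard M-estimation covariance applied to the weighted score $\psi_\theta(x) = w(x)\,u_\theta(x)$. The notation ($\psi$, $A$, $B$, and $G$ for the data-generating distribution) is introduced here for illustration; this is the generic textbook expression, not a formula quoted from the cited paper.

```latex
\sqrt{n}\,(\hat\theta_w - \theta_0) \xrightarrow{d} N\bigl(0,\; A^{-1} B A^{-\top}\bigr),
\qquad
A = -\,\mathbb{E}_G\bigl[\nabla_\theta \psi_\theta(X)\bigr]_{\theta=\theta_0},
\qquad
B = \mathbb{E}_G\bigl[\psi_{\theta_0}(X)\,\psi_{\theta_0}(X)^\top\bigr].
```

When the data follow the model and the weights tend to 1, $\psi_\theta$ reduces to the ordinary score, $A$ and $B$ both equal the Fisher information $I(\theta_0)$, and the covariance collapses to $I(\theta_0)^{-1}$.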

5. Applications in Multivariate, Mixture, and Regression Models

Multivariate Location and Scatter:

WLE replaces the sample mean and covariance with weighted analogs, giving outlier-resistant, affine-equivariant estimates with direct use in robust principal component analysis (PCA) and in outlier detection via distance-based thresholds (Agostinelli et al., 2017).

Mixture Models and Model-Based Clustering:

Weighted likelihood versions of EM and CEM algorithms yield robust parameter estimates and cluster assignments. The WLE is bounded-influence and features consistent model selection via weighted BIC. Both truncated likelihood and flexible weight families target heavy-tailed phenomena and outlier profiles in insurance or heavy-tail mixtures (Greco et al., 2018, Fung, 2021).

Regression, GLMs, and Loss Models:

The score-based (SWLE) and maximum (MWLE) weighted likelihood estimators provide robust parameter estimates in regression and GLM contexts in the presence of tail contamination or systematic model misspecification. The score equations admit analytic forms, and fitting is computationally on par with standard GLM fitting (Fung, 2022).

6. Extension, Model Diagnostics, and Empirical Illustration

Empirical studies demonstrate:

- High efficiency under pure models, with MSE and KL divergence near the MLE (Agostinelli et al., 2025, Agostinelli, 2018).
- Superior robustness under contamination levels up to 45%, with substantially reduced bias and improved precision versus classical high-breakdown competitors (Agostinelli et al., 2025).
- In real data with latent subgroups, WLE (especially with depth-based weights) can identify meaningful substructure by the emergence of multiple non-degenerate roots, each isolating a latent component (Agostinelli, 2018, Agostinelli et al., 2025).

Diagnostics:

- Weighted likelihood estimators facilitate model checking via systematic sensitivity of parameter estimates to weight hyperparameters, allowing Wald-type tests for misspecification (Fung, 2022).
- In mixtures and heavy-tail models, WLEs yield stable tail-index estimates robust to overfitting of the model body (Fung, 2021).

Software:

Implementations (e.g., R package "wle") exist with routine provisions for bandwidth selection, root selection, and diagnostic tools (Agostinelli et al., 2017).

7. Formal Links to Generalized Means and Connections to Divergence-Based Estimation

The WLE framework encompasses Lehmer and Hölder mean families as explicit solutions in exponential families by suitable choice of weights and sufficient statistics (Ziou, 2023, Ziou et al., 2024). Specifically, the WLE solution in exponential families reduces to a generalized weighted mean, with the arithmetic, geometric, harmonic, and power means as special cases. This establishes a probabilistic foundation for these classical means and elucidates their robustness and efficiency when viewed as WLEs.
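The link to generalized means can be illustrated numerically. The Lehmer mean $L_p(x) = \sum_i x_i^p / \sum_i x_i^{p-1}$ is a weighted arithmetic mean with weights $w_i = x_i^{p-1}$; this identity is elementary algebra, and whether it matches the exact construction in the cited works is an assumption of this sketch. The function names are illustrative.

```python
import numpy as np

def weighted_mean(x, w):
    """Weighted arithmetic mean, i.e. the weighted-score solution for a location parameter."""
    return np.sum(w * x) / np.sum(w)

def lehmer_mean(x, p):
    """Lehmer mean of positive data, written as a weighted mean with weights x**(p - 1)."""
    x = np.asarray(x, dtype=float)
    return weighted_mean(x, x ** (p - 1))

x = np.array([1.0, 2.0, 4.0])
arithmetic = lehmer_mean(x, 1)       # weights x**0 = 1
harmonic = lehmer_mean(x, 0)         # weights 1 / x
contraharmonic = lehmer_mean(x, 2)   # weights x
```

Choosing $p < 1$ downweights large observations and $p > 1$ upweights them, which is one way to see how the weight choice controls sensitivity to extreme values.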

The WLE is a minimum-divergence estimator (in the sense of $\varphi$-divergence), providing a direct link to large deviations and conditional likelihood theory, and achieving Bahadur-efficient testing properties (Broniatowski et al., 2012). This connection underlines the role of WLE not only as an analytic device, but also as a robust inferential principle with optimal asymptotic properties under weighted sampling.