Maximum Empirical Likelihood Estimators
- Maximum empirical likelihood estimators are defined by maximizing an empirical likelihood function subject to moment constraints for robust, nonparametric inference.
- They use Lagrange multipliers in a convex optimization framework to achieve asymptotic normality and chi-squared calibrated confidence regions.
- MELEs are widely applied in time series, heavy-tailed models, and high-dimensional problems, offering robustness against model misspecification.
Maximum empirical likelihood estimators (MELEs) are a class of estimators defined by maximizing empirical likelihood (EL) functions, typically under moment or estimating equation constraints. As a nonparametric (or semiparametric) likelihood-based method, empirical likelihood allows for likelihood-ratio inference without requiring specification of a full parametric model, and MELEs inherit optimality, robustness, and inferential properties analogous to parametric maximum likelihood estimators in a wide range of settings.
1. Definition and Fundamental Principles
The maximum empirical likelihood estimator is defined as the maximizer of the empirical likelihood functional, which, for observations $X_1, \dots, X_n$ and a vector of parameters $\theta$, takes the form
$$\hat{\theta}_{\mathrm{MELE}} = \arg\max_{\theta} L(\theta),$$
where $L(\theta)$ is the empirical likelihood, typically given by
$$L(\theta) = \sup\left\{ \prod_{i=1}^{n} p_i \;:\; p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i, \theta) = 0 \right\},$$
with $g(X_i, \theta)$ being a (vector-valued) estimating function defining the constraints.
In modern contexts, such as those treated in "A frequency domain empirical likelihood for short- and long-range dependence" (0708.0197), "Empirical Likelihood based Confidence Regions for first order parameters of a heavy tailed distribution" (Worms et al., 2010), and "High-dimensional empirical likelihood inference" (Chang et al., 2018), the MELE may be adapted for dependent data, structural or high-dimensional constraints, and various estimands.
The empirical log-likelihood ratio, $\ell(\theta) = -2 \log\{ L(\theta) / n^{-n} \}$ (the denominator $n^{-n}$ being the unconstrained maximum of $\prod_i p_i$), can be used directly to construct confidence sets, with the MELE acting as the center of the region.
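To make the definition concrete, here is a minimal sketch (Python with NumPy/SciPy; the helper name `log_el_mean` and the SLSQP-based primal solve are illustrative choices, not code from the cited papers) that evaluates $\log L(\theta)$ for a mean constraint by optimizing the weights directly in the primal form above.

```python
import numpy as np
from scipy.optimize import minimize

def log_el_mean(x, theta):
    """Log empirical likelihood log L(theta) for the mean constraint
    sum_i p_i * (x_i - theta) = 0, solved in its primal form."""
    n = len(x)
    p0 = np.full(n, 1.0 / n)                       # start from uniform weights

    def neg_log_lik(p):                            # maximize prod p_i <=> minimize -sum log p_i
        return -np.sum(np.log(p))

    constraints = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},       # weights sum to one
        {"type": "eq", "fun": lambda p: np.dot(p, x - theta)},  # moment constraint
    ]
    bounds = [(1e-10, 1.0)] * n                    # keep weights strictly positive
    res = minimize(neg_log_lik, p0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return -res.fun                                # = sum_i log p_i(theta)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=30)
# log L(theta) is maximized at the sample mean, where p_i = 1/n and log L = -n log n
print(log_el_mean(x, x.mean()), -len(x) * np.log(len(x)))
print(log_el_mean(x, 0.0))    # smaller: the constraint theta = 0 costs likelihood
```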
2. Construction and Optimization
The core computational procedure involves maximizing a function of the (unknown) empirical probabilities $p_1, \dots, p_n$ subject to linear or nonlinear constraints—equivalent to solving a finite-dimensional convex optimization problem with equality (and often inequality) constraints. The standard solution utilizes the method of Lagrange multipliers, yielding the form
$$p_i(\theta) = \frac{1}{n} \cdot \frac{1}{1 + \lambda(\theta)^{\top} g(X_i, \theta)},$$
where $\lambda(\theta)$ is determined implicitly by the constraint equation
$$\sum_{i=1}^{n} \frac{g(X_i, \theta)}{1 + \lambda(\theta)^{\top} g(X_i, \theta)} = 0.$$
The MELE is then the solution to
$$\hat{\theta}_{\mathrm{MELE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_i(\theta) = \arg\min_{\theta}\, \max_{\lambda} \sum_{i=1}^{n} \log\{ 1 + \lambda^{\top} g(X_i, \theta) \},$$
or, in settings like frequency domain empirical likelihood (0708.0197), by maximizing a structured version (e.g., over periodogram ordinates).
In semiparametric models, plug-in nonparametric smoothers may be used within the estimating function $g$, and adaptive procedures are often required to handle high-dimensional or complex data structures.
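A minimal sketch of this nested (dual) optimization, assuming a scalar parameter and using a crude infeasibility barrier for the inner problem in place of Owen's pseudo-logarithm (the helper names `el_log_ratio` and `mele` are illustrative):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def el_log_ratio(theta, x, g):
    """Profiled empirical log-likelihood ratio
    ell(theta) = 2 * max_lambda sum_i log(1 + lambda' g(x_i, theta))."""
    G = np.atleast_2d(g(x, theta))            # n x r matrix of estimating-function values
    n, r = G.shape

    def neg_inner(lam):
        arg = 1.0 + G @ lam
        if np.any(arg <= 1e-10):              # crude barrier: keep all 1 + lam'g_i positive
            return 1e10
        return -np.sum(np.log(arg))

    res = minimize(neg_inner, np.zeros(r), method="Nelder-Mead")
    return -2.0 * res.fun                     # = 2 * max_lambda sum_i log(1 + lam'g_i)

def mele(x, g, bracket):
    """Scalar-parameter MELE: minimize the profiled EL ratio over theta."""
    return minimize_scalar(lambda theta: el_log_ratio(theta, x, g), bracket=bracket).x

g_mean = lambda x, theta: (x - theta).reshape(-1, 1)    # estimating function E[X - theta] = 0
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=100)                      # heavy-ish tails
print(mele(x, g_mean, bracket=(-1.0, 1.0)), x.mean())   # the two should agree closely
```

For the just-identified mean constraint the MELE reduces to the sample mean, which gives a quick sanity check on the machinery; with over-identified or nonlinear estimating functions the same nested structure applies.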
3. Asymptotic Properties and Inference
The MELE is strongly consistent and asymptotically normal under standard regularity conditions. Specifically, under i.i.d. data and appropriate smoothness conditions,
$$\sqrt{n}\,(\hat{\theta}_{\mathrm{MELE}} - \theta_0) \xrightarrow{d} N(0, V),$$
where $V = (G^{\top} S^{-1} G)^{-1}$, with $G = E\{\partial g(X, \theta_0)/\partial \theta^{\top}\}$ and $S = E\{g(X, \theta_0)\, g(X, \theta_0)^{\top}\}$, inherits the efficient information structure of the sandwich estimator built on $g$.
A central finding is that the empirical log-likelihood ratio statistic, $\ell(\theta_0)$, often converges to a chi-squared distribution with degrees of freedom matching the dimension of the constraint, even under complex data dependence (short- or long-range) as in the frequency domain (0708.0197):
$$\ell(\theta_0) \xrightarrow{d} \chi^2_{q},$$
where $q$ is the dimension of $\theta$ or the number of constraints, depending on the hypothesis being tested.
Even with estimated constraint functions or in high-dimensional regimes, under careful constraint and rate conditions (see (Chang et al., 2018)), empirical likelihood ratio statistics retain tractable limiting distributions, supporting valid inference.
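Under this calibration, a 95% confidence region collects all $\theta$ whose profiled EL ratio falls below the corresponding chi-squared quantile. A self-contained sketch for a scalar mean (the feasible interval for the multiplier is computed explicitly; the helper name is ours):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def el_log_ratio_mean(theta, x):
    """ell(theta) = 2 * max_lam sum_i log(1 + lam * (x_i - theta)) for the mean.
    Feasibility 1 + lam*(x_i - theta) > 0 confines lam to a finite interval
    whenever theta lies strictly inside the range of the data."""
    g = x - theta
    if g.min() >= 0 or g.max() <= 0:          # theta outside the convex hull: no feasible weights
        return np.inf
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    eps = 1e-8 * (hi - lo)
    res = minimize_scalar(lambda lam: -np.sum(np.log(1.0 + lam * g)),
                          bounds=(lo + eps, hi - eps), method="bounded")
    return -2.0 * res.fun

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=80)       # skewed data: the EL interval need not be symmetric
cutoff = chi2.ppf(0.95, df=1)                 # chi-squared calibration, one constraint

grid = np.linspace(x.min(), x.max(), 801)
inside = [t for t in grid if el_log_ratio_mean(t, x) <= cutoff]
print(f"95% EL confidence interval for the mean: ({min(inside):.3f}, {max(inside):.3f})")
```

Because the region is determined by the data, the resulting interval is generally asymmetric for skewed samples, unlike a Wald interval centered at the point estimate.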
4. Robustness, Misspecification, and Extensions
MELEs exhibit notable robustness properties compared to parametric maximum likelihood estimators, especially under heavy-tailed, contaminated, or dependent data.
- Robustification: By incorporating robust M-estimating equations into $g$ (e.g., with bounded influence functions), the MELE can be made resistant to outliers and heavy tails (Özdemir et al., 2018); a brief sketch follows this list.
- Misspecified Constraints: Under misspecification, optimal EL weights may degenerate, assigning disproportionately high mass to a few observations. This leads to breakdowns in Wilks' theorem and non-chi-squared limiting distributions for the likelihood ratio (Ghosh et al., 2019).
- Remedies and Global Consistency: To guarantee consistent MELEs, additional or alternative unbiased estimating equations can be introduced, ensuring that the global maximum corresponds to the desired parameter (see (Liang et al., 2023)). Non-regular models (such as the Cauchy location model) often require this approach.
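To illustrate the robustification bullet above, a bounded influence function such as Huber's $\psi$ can serve as the estimating function $g$. The sketch below is an illustration of the general idea, not the construction of (Özdemir et al., 2018); the resulting $g$ has the same $(x, \theta)$ signature as the Section 2 sketch and can be passed to that `mele` helper unchanged.

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber's bounded influence function: identity near zero, clipped beyond +/- c."""
    return np.clip(u, -c, c)

def g_robust_location(x, theta, scale=1.0, c=1.345):
    """Bounded estimating function for a location parameter:
    E[ psi((X - theta)/scale) ] = 0 at the target theta.
    Gross outliers contribute at most +/- c, unlike the unbounded g(x, theta) = x - theta."""
    return huber_psi((x - theta) / scale, c=c).reshape(-1, 1)

# Contaminated sample: compare how the two estimating functions react at theta = 0.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 25.0)])   # 5% gross outliers
print(np.mean(x - 0.0))                     # unbounded g: dragged upward by the outliers
print(np.mean(g_robust_location(x, 0.0)))   # bounded g: remains close to zero
```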
5. Applications Across Data Regimes
MELEs are widely utilized across statistical domains:
- Spectral Analysis and Time Series: Frequency domain EL adapts MELEs for short- and long-range dependent data by leveraging the (near-)independence of periodogram ordinates after Fourier transformation (0708.0197). Applications include Whittle estimation and spectral goodness-of-fit tests.
- Heavy-Tailed Distributions: For extreme value or threshold exceedance models, MELEs constructed from moment-type estimating equations enable accurate confidence region estimation for tail parameters, often outperforming Wald-type (asymptotic normal approximation) intervals (Worms et al., 2010).
- High-dimensional Models: MELEs have been successfully applied for inference on low-dimensional projections in high-dimensional parameter spaces by constructing projected or transformed moment conditions, enabling valid chi-squared calibration for subset inference even when the number of parameters or estimating equations grows with, or exceeds, the sample size (Chang et al., 2018).
- Semiparametric Efficiency: In settings with infinite or rapidly growing numbers of constraints (e.g., known or estimated side information on marginals or symmetry), MELEs can achieve semiparametric efficiency, outperforming standard empirical medians or means (Wang et al., 2023).
- Variable Selection: MELEs can be combined with adaptive penalization methods (e.g., adaptive LASSO) to automatically select nonzero coefficients while preserving statistical properties such as asymptotic normality and sparsity (oracle property) (Ciuperca, 2023).
6. Implementation and Computation
The numerical computation of MELEs generally involves iterative algorithms to solve for both the empirical weights and the Lagrange multipliers (or dual variables), often via Newton-type or gradient-based optimization. For penalized MELEs with variable selection, coordinate descent or blockwise optimization is integrated within the EL maximization routine.
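For fixed $\theta$ the inner problem is smooth and concave in $\lambda$, so a damped Newton iteration is the usual workhorse. A minimal sketch with explicit gradient and Hessian and step-halving to preserve feasibility (names are ours; production implementations typically use Owen's pseudo-logarithm rather than a hard feasibility check):

```python
import numpy as np

def solve_lambda(G, max_iter=50, tol=1e-10):
    """Damped Newton iteration for the EL dual variable lambda, given the n x r
    matrix G of estimating-function values g(X_i, theta).
    Maximizes f(lam) = sum_i log(1 + lam' G_i) over the feasible region."""
    n, r = G.shape
    lam = np.zeros(r)
    for _ in range(max_iter):
        arg = 1.0 + G @ lam                         # must stay strictly positive
        grad = G.T @ (1.0 / arg)                    # gradient of f
        hess = -(G / arg[:, None] ** 2).T @ G       # Hessian of f (negative semidefinite)
        step = np.linalg.solve(hess, -grad)         # Newton direction
        t = 1.0
        while np.any(1.0 + G @ (lam + t * step) <= 0):
            t *= 0.5                                # halve the step until feasible
        lam_new = lam + t * step
        if np.max(np.abs(lam_new - lam)) < tol:
            return lam_new
        lam = lam_new
    return lam

# Usage: empirical weights and profiled log EL ratio for the mean of a small sample.
x = np.array([0.2, 1.1, -0.4, 2.3, 0.9, 1.7, -1.2, 0.5])
theta = 0.3
G = (x - theta).reshape(-1, 1)
lam = solve_lambda(G)
p = 1.0 / (len(x) * (1.0 + G @ lam))                # EL weights p_i(theta)
print(p.sum(), p @ (x - theta))                     # ~1 and ~0: both constraints satisfied
print(2.0 * np.sum(np.log(1.0 + G @ lam)))          # profiled log EL ratio ell(theta)
```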
Empirical likelihood implementations in practice:
- Finite-sample constraints: Efficiently handled via standard convex optimization (Newton-Raphson, quasi-Newton, or sequential quadratic programming).
- High-dimensional settings: Sparsity-promoting strategies, such as penalizing the transformation (projection) matrix as in (Chang et al., 2018), make problems with more constraints than observations tractable.
- Smoothing and missing data: Kernel smoothing is applied to handle non-differentiable constraints (e.g., in expectile regression), and missingness is handled through reweighting or through estimation of the adaptive LASSO (ALASSO) weights in the penalized likelihood (Ciuperca, 2023).
7. Comparative Properties and Advantages
MELEs offer several practical and theoretical advantages:
- Data-adaptive confidence regions: EL-based regions naturally accommodate model constraints and parameter boundaries (e.g., positivity constraints on scale parameters (Worms et al., 2010)).
- Bartlett correction and higher-order accuracy: Extended or corrected MELEs achieve second-order coverage accuracy superior to standard uncorrected likelihood approaches (Tsao et al., 2013); see the display after this list.
- No need for explicit variance estimation: Unlike Wald-type inference based on asymptotic normality, MELE-based regions are “self-calibrating” via chi-squared limits, obviating explicit estimation or inversion of the information matrix.
- Unified treatment of dependence: Frequency domain transformations extend MELEs to dependent data without substantial changes to inferential validity (0708.0197).
- Robustness to model misspecification: MELEs are less reliant on correct distributional assumptions, and remedies exist for handling mis-specified or ill-behaved estimating equations (Ghosh et al., 2019, Liang et al., 2023).
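Concretely, for the Bartlett correction noted above, the classical result for i.i.d. data and a fixed number of smooth moment constraints (the constant $b$ depends on higher moments of $g$ and must be estimated or bootstrapped in practice) is that a simple rescaling improves the chi-squared approximation by an order of magnitude:
$$\Pr\!\left\{ \frac{\ell(\theta_0)}{1 + b/n} \le \chi^2_{q,\,1-\alpha} \right\} = 1 - \alpha + O(n^{-2}),$$
compared with coverage error of order $n^{-1}$ for the uncorrected statistic $\ell(\theta_0)$.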
Table: Major Methodological Features Across MELE Literature
| Setting / Application | Constraints / Transformation | Key Asymptotic Result |
|---|---|---|
| Time series/spectral (0708.0197) | Fourier transform, spectral moments | $\chi^2$ limit for EL ratio |
| Heavy tails/extremes (Worms et al., 2010) | Moment-type, non-score constraints | Improved coverage vs. Wald |
| High-dimension (Chang et al., 2018) | Projection, sparsity-promoting | Valid inference with diverging dimension |
| Robust regression (Özdemir et al., 2018) | Robust M-estimating functions | Less sensitive to outliers |
| Global consistency (Liang et al., 2023) | Augmented set of estimating eqns | Consistent global MELE |
MELEs represent a versatile and theoretically grounded approach to inference and estimation across a broad spectrum of modern statistical models, offering flexibility in constraint specification, computational feasibility, and robust, data-driven confidence procedures. Their continued methodological development and extension to increasingly complex data structures ensure an enduring role in non- and semiparametric inference.