
Maximum Likelihood Estimators (MLEs)

Updated 19 September 2025
  • Maximum Likelihood Estimators (MLEs) determine parameter values by maximizing the likelihood function given observed data.
  • They exhibit key properties such as consistency, asymptotic normality, variance-optimality, and invariance, underpinning their reliability in various models.
  • Practical computation of MLEs leverages geometric, algebraic, and numerical strategies to address challenges like non-existence, high-dimensionality, and model constraints.

Maximum Likelihood Estimators (MLEs) are a foundational element of statistical inference, providing parameter estimates by maximizing the likelihood function under a specified model for observed data. In the formal parametric context, for a model $p(y \mid \theta)$ and data $\mathcal{D}$, the MLE $\hat{\theta}$ is the maximizer of $L(\theta) = p(\mathcal{D} \mid \theta)$. MLEs play a central role in theory, computation, and practice across a wide array of statistical models, including classical, semiparametric, and high-dimensional settings. This article surveys the mathematical framework, fundamental properties, geometric and algorithmic aspects, distributional theory, and computational strategies related to MLEs, with a particular emphasis on advanced developments relevant to modern applications.
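
To make the definition concrete, the following minimal sketch (illustrative only, not drawn from any of the papers cited here) computes an MLE numerically by minimizing the negative log-likelihood of a Gamma model; the simulated data, starting point, and choice of optimizer are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)    # simulated observations (illustrative)

def neg_log_likelihood(params, y):
    """Negative log-likelihood for a Gamma(shape, scale) model."""
    shape, scale = params
    if shape <= 0 or scale <= 0:                    # keep the optimizer inside the parameter space
        return np.inf
    return -np.sum(gamma.logpdf(y, a=shape, scale=scale))

# The MLE is the minimizer of the negative log-likelihood.
result = minimize(neg_log_likelihood, x0=[1.0, 1.0], args=(data,), method="Nelder-Mead")
shape_ref, _, scale_ref = gamma.fit(data, floc=0)   # scipy's own ML fit, for comparison
print("numerical MLE (shape, scale):", result.x)
print("scipy ML fit  (shape, scale):", (shape_ref, scale_ref))
```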

1. Mathematical Characterization and Uniqueness

The MLE is defined as the maximizer of the likelihood or log-likelihood function given data and a parametric model. The basic existence and uniqueness conditions are determined by the geometry of the parameter space and the regularity of the mapping from parameter to likelihood. In exponential and log-linear families, existence and uniqueness have precise geometric characterizations.

In log-linear models for contingency tables, the existence of the MLE is equivalent to the observed sufficient statistic $t = A^T n$ (where $A$ is the model matrix and $n$ is the observed table) lying in the relative interior of the marginal cone $C_{(A)} = \text{cone}(A^T)$; that is,

$$t \in \text{ri}(C_{(A)}).$$

If $t$ lies on the boundary, the MLE does not exist in the ordinary sense, necessitating the use of an extended MLE solution defined on lower-dimensional faces of the cone. These geometric characterizations enable direct verification of MLE existence using tools from convex analysis and facilitate the identification of nonidentifiable parameters in high-dimensional or sparse-data contexts (Fienberg et al., 2011).
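
As an illustration of the cone-membership criterion, the sketch below checks whether $t = A^T n$ lies in the relative interior of the marginal cone by maximizing the smallest cell weight in a linear program. It relies on the standard fact that the relative interior of a finitely generated cone consists of the strictly positive combinations of its generators; it is not the algorithm of Fienberg et al. (2011), and the model matrix and example tables are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def mle_exists(A, n, tol=1e-9):
    """Check whether t = A^T n lies in ri(cone(A^T)).

    t is in the relative interior iff t = A^T x for some cell weights x with
    every x_i strictly positive, so we maximize the smallest weight via an LP.
    """
    A = np.asarray(A, dtype=float)
    t = A.T @ np.asarray(n, dtype=float)
    m = A.shape[0]                                        # number of cells
    c = np.zeros(m + 1)
    c[-1] = -1.0                                          # variables (x_1..x_m, s); maximize s
    A_eq = np.hstack([A.T, np.zeros((A.shape[1], 1))])    # A^T x = t
    A_ub = np.hstack([-np.eye(m), np.ones((m, 1))])       # s - x_i <= 0, i.e. x_i >= s
    b_ub = np.zeros(m)
    bounds = [(0, None)] * m + [(None, 1.0)]              # cap s so the LP stays bounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=t, bounds=bounds)
    return bool(res.success and -res.fun > tol)

# Illustrative 2x2 independence model: rows of A index cells, columns index row/column margins.
A = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]])
print(mle_exists(A, [5, 0, 3, 2]))   # True: one sampling zero, but every margin is positive
print(mle_exists(A, [5, 3, 0, 0]))   # False: an empty row places t on the boundary of the cone
```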

Similarly, in more general exponential families, the MLE's existence and its potential extension to the closure of the family are controlled by the support and location of the observed sufficient statistic. For distribution estimation within exponential families using Kullback–Leibler (KL) loss, the MLE is the unique uniformly minimum KL variance unbiased estimator, a result established even for extended KL projections when classical MLEs do not exist (Vos et al., 2015).

2. Properties and Optimality Principles

MLEs exhibit several optimal properties in regular problems:

  • Consistency: Under mild regularity conditions (identifiability, continuity, compactness), MLEs are consistent estimators, converging in probability to the true parameter value as the sample size increases.
  • Asymptotic Normality: In regular parametric models, the MLE satisfies

$$\sqrt{n}\,(\hat{\theta} - \theta_0) \rightarrow_{\mathcal{D}} \mathcal{N}\left(0,\, I(\theta_0)^{-1}\right)$$

where $I(\theta_0)$ is the Fisher information matrix; a simulation sketch of this convergence appears after this list.

  • Variance-Optimality: In exponential families, the MLE minimizes the KL risk (distributional variance) among all distribution-unbiased estimators (i.e., the MLE is UMV$^{\dagger}$U), and remains optimal even for the KL projection of the true distribution when the model is misspecified and the mean value of sufficient statistics agrees (Vos et al., 2015).
  • Invariance: The MLE is invariant under one-to-one reparametrization.
  • Rao–Blackwellization: The distributional version of the Rao–Blackwell theorem holds for MLEs in exponential families, whereby conditioning on a sufficient statistic yields the (distribution) estimator with minimum variance.
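
The simulation sketch referenced above illustrates consistency and asymptotic normality for the exponential-rate MLE, for which $\hat{\lambda} = 1/\bar{y}$ and $I(\lambda) = 1/\lambda^2$; the rate, sample size, and replication count are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 2000, 5000                    # illustrative choices

# Exponential(rate lam): the MLE is 1 / sample mean and the Fisher information is
# 1 / lam^2, so sqrt(n) * (lam_hat - lam) should be approximately N(0, lam^2).
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
lam_hat = 1.0 / samples.mean(axis=1)
z = np.sqrt(n) * (lam_hat - lam)

print("empirical std of sqrt(n)*(lam_hat - lam):", z.std())   # close to lam = 2.0
print("theoretical asymptotic std:              ", lam)
# Invariance: the MLE of the mean 1/lam is simply 1/lam_hat, i.e. the sample mean.
```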

For certain models (e.g., log-linear, multinomial, Poisson, and Gaussian graphical models), careful consideration must be given to cases where the MLE fails to exist (e.g., due to sampling zeros, boundary parameters, or degeneracy), in which case the "extended MLE" or "projected MLE" (via KL projection or geometric closure) is relevant (Fienberg et al., 2011, Vos et al., 2015).

3. Geometric and Algebraic Structure

The inference landscape for MLEs is deeply influenced by geometry and algebra, especially in log-linear and graphical settings:

  • Polyhedral Geometry: In log-linear models, existence and estimability track the convex geometric configuration of sufficient statistics relative to the marginal cone; facial sets correspond to zeroed-out model parameters, and only parameters supported on the active face are estimable (Fienberg et al., 2011).
  • Algebraic Geometry: For discrete models described as algebraic varieties, the MLE corresponds to solutions of polynomial score equations; critical points of the likelihood function are characterized as the zero locus of an ideal in polynomial rings, and the maximum likelihood degree (ML degree) quantifies the algebraic complexity of the estimation problem (Améndola et al., 2020, Rodriguez, 2014). A small worked example of a polynomial score equation follows this list.
  • Duality Theory: The dual and conormal varieties give rise to the dual likelihood equations, and solving these can bypass explicit computation with the original model's defining equations; solutions on the dual side correspond, via explicit coordinate relations, to solutions of the primal likelihood equations (Rodriguez, 2014).
  • Non-concavity and Local Geometry: In determinantal point processes (DPPs), the log-likelihood surface is highly nonconcave, with exponentially many saddle points and degeneracies in the Fisher information, yielding challenging optimization landscapes and, in high dimensionality, exponential scaling ("curse of dimensionality") in asymptotic variance (Brunel et al., 2017).
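
The worked example referenced in the algebraic-geometry item above uses the Hardy–Weinberg model, whose score equation becomes a polynomial once denominators are cleared. This is a minimal sketch in sympy, not the Macaulay2 workflow of Améndola et al. (2020); the model has ML degree 1, so a single critical point is recovered in closed form.

```python
import sympy as sp

# Hardy-Weinberg model: cell probabilities (t^2, 2*t*(1-t), (1-t)^2) with counts n1, n2, n3.
# Clearing denominators turns the score equation into a polynomial in t, so critical points
# of the likelihood are the zero locus of a polynomial (here of degree 1 in t).
t = sp.symbols("t", positive=True)
n1, n2, n3 = sp.symbols("n1 n2 n3", positive=True)

loglik = n1 * sp.log(t**2) + n2 * sp.log(2 * t * (1 - t)) + n3 * sp.log((1 - t) ** 2)
score = sp.together(sp.diff(loglik, t))     # a single rational function of t
poly_score = sp.numer(score)                # the polynomial score equation

critical_points = sp.solve(sp.Eq(poly_score, 0), t)
print(critical_points)   # the allele-frequency estimate (2*n1 + n2) / (2*(n1 + n2 + n3))
```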

4. Algorithmic and Computational Strategies

Efficient and robust computation of MLEs requires specialized algorithms and careful handling of model-specific challenges:

  • Polyhedral and Convex-Algebraic Methods: For log-linear models, rapid detection of nonexistence and computation of the extended MLE are enabled by geometric characterization in terms of the marginal cone and facial sets; geometric algorithms outperform iterative scaling or ad hoc adjustments, particularly in sparse or high-dimensional tables (Fienberg et al., 2011).
  • Algebraic Solvers: Packages such as GraphicalModelsMLE (for Gaussian graphical models) leverage Macaulay2 to generate polynomial ideals for the score equations, compute Gröbner bases, and find all complex critical points, facilitating global optimization over models with multiple local optima or nonunique maxima (Améndola et al., 2020).
  • Extensions for Intractable Likelihoods: When likelihood functions are unavailable in closed form (e.g., due to latent integrals or high-dimensional random effects), maximum approximated likelihood (MAL) estimators generalize maximum simulated likelihood (MSL) to a broader class of integration methods (quasi–Monte Carlo, Gaussian quadrature, sparse grids). Under explicit conditions (uniform convergence of the approximation and error decay relative to sample size), MAL estimators retain consistency and asymptotic normality, often with sharper computational complexity than standard MSL (Griebel et al., 2019). A quadrature-based sketch of this idea appears after this list.
  • Generalized and Closed-Form MLEs: In models where standard MLEs lack closed-form, generalized likelihood equations can be constructed via auxiliary parameters or transformations, yielding explicit estimators with the same fundamental properties (strong consistency, asymptotic normality, invariance) (Ramos et al., 2021).
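
The quadrature-based sketch referenced above approximates an intractable marginal likelihood, in the spirit of MAL estimation but without the estimators or error-control conditions of Griebel et al. (2019): a Gaussian random intercept in a logistic model is integrated out with Gauss–Hermite quadrature, and the approximate log-likelihood is maximized numerically. The simulated design, the 20-node rule, and the optimizer are assumptions for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(2)

# Random-intercept logistic model: y_ij ~ Bernoulli(expit(beta + b_i)), b_i ~ N(0, sigma^2).
# The per-cluster likelihood integrates out b_i and has no closed form.
beta_true, sigma_true, n_clusters, n_per = 0.5, 1.0, 200, 10
b = rng.normal(0.0, sigma_true, size=n_clusters)
y = rng.binomial(1, expit(beta_true + b[:, None]), size=(n_clusters, n_per))

nodes, weights = np.polynomial.hermite.hermgauss(20)     # Gauss-Hermite rule

def neg_approx_loglik(params):
    beta, log_sigma = params
    sigma = np.exp(log_sigma)
    # Change of variables: integral of f(b) N(b; 0, sigma^2) db
    #   ~= sum_k (w_k / sqrt(pi)) * f(sqrt(2) * sigma * x_k).
    bk = np.sqrt(2.0) * sigma * nodes                     # quadrature abscissae on the b scale
    p = expit(beta + bk[None, None, :])                   # broadcast over clusters and replicates
    lik_terms = np.where(y[..., None] == 1, p, 1 - p)     # shape (clusters, n_per, K)
    cluster_lik = lik_terms.prod(axis=1) @ (weights / np.sqrt(np.pi))
    return -np.sum(np.log(cluster_lik))

result = minimize(neg_approx_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
beta_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print("approximate MLE (beta, sigma):", beta_hat, sigma_hat)   # near (0.5, 1.0) up to noise
```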

5. MLEs under Constraints, Censoring, and Model Misspecification

Practical applications often confront nonstandard data conditions or model limitations:

  • Censoring and Interval Data: In settings with (multi-dimensional) interval-censored data, "observed range" MLEs generalize classical likelihood maximization by incorporating partial information via constraints on the empirical distribution function. This framework supports censored kernel density estimation and Nadaraya–Watson regression with nonparametric adaptation to censoring, and extends to multinomial and contingency table inference with incomplete counts (Markov, 2011). A simpler right-censored illustration appears after this list.
  • Boundary and Non-Existence Cases: For parametric models with outcomes landing on the boundary (e.g., contingency tables with zeros, binomials with all zeros or ones), extended MLEs and KL-projected estimators provide principled solutions, ensuring that distributional unbiasedness and minimum-variance properties still hold on the closure of the parameter space (Fienberg et al., 2011, Vos et al., 2015).
  • Model Misspecification: In exponential families, even if the true underlying distribution does not belong to the model, the MLE is optimal among unbiased estimators for the KL projection of the true distribution, provided canonical means align (Vos et al., 2015).
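
The right-censored illustration mentioned above is the textbook exponential case rather than the observed-range framework of Markov (2011): each observed event contributes a density factor and each censored observation a survival factor, and maximizing the resulting likelihood gives the number of events divided by the total observed time. The simulated rate and censoring distribution are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 0.8
t_event = rng.exponential(scale=1.0 / lam_true, size=1000)   # latent event times
c = rng.uniform(0.5, 3.0, size=1000)                          # censoring times

observed = np.minimum(t_event, c)
delta = (t_event <= c).astype(float)       # 1 = event observed, 0 = right-censored

# Likelihood: prod f(t_i)^delta_i * S(t_i)^(1 - delta_i) with f(t) = lam * exp(-lam * t)
# and S(t) = exp(-lam * t); maximizing gives the closed form below.
lam_hat = delta.sum() / observed.sum()
print("censored-data MLE of the rate:", lam_hat)   # close to lam_true = 0.8
```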

6. Advanced Distributional Results and Sampling Behavior

Finite-sample and first-order distributional analyses refine understanding and support practical inference:

  • Sampling Distribution Approximation: In models lacking closed-form MLEs (e.g., the Weibull distribution), the sampling distribution of the MLE can often be well-approximated by other parametric distributions (such as Weibull), with parameters derived via moment-matching on simulated MLEs. These approximations are validated by quantile comparison and seen to agree closely with first-order asymptotic bias and mean squared error for moderate to large sample sizes (Truong et al., 20 Jan 2025). A simulation sketch of this idea appears after this list.
  • Nonstandard Asymptotics near Boundaries: In admixture models, the asymptotic distribution of the MLE exhibits nonnormal behavior when true parameters lie on the boundary, resulting in "truncated" or constrained normal distributions described by cone projections, as predicted by Andrews' theory for estimation on the boundary of the parameter space (Heinzel, 25 Jul 2025).
  • Limit Theorems under Dependence and Misspecification: MLEs in processes with dependent data (e.g., Wishart diffusions, Markov-modulated models) or with high-dimensional random effects are shown to exhibit strong consistency, rate-optimal asymptotic normality, and explicit covariance characterizations under explicit structural and ergodicity conditions (Alfonsi et al., 2015, Eslava et al., 2022).
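
The simulation sketch mentioned in the first item above generates the finite-sample distribution of the Weibull shape MLE by repeated fitting; matching moments of such simulated MLEs to a parametric family is the general idea, though the specific approximation and validation procedure of Truong et al. (20 Jan 2025) are not reproduced here. The parameter values, sample size, and replication count are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(4)
shape_true, scale_true, n, reps = 1.5, 2.0, 50, 1000

# Simulate the sampling distribution of the Weibull shape MLE at sample size n.
shape_hats = np.empty(reps)
for r in range(reps):
    sample = weibull_min.rvs(shape_true, scale=scale_true, size=n, random_state=rng)
    shape_hats[r], _, _ = weibull_min.fit(sample, floc=0)   # MLE with location fixed at 0

print("mean of simulated shape MLEs:", shape_hats.mean())   # typically slightly above shape_true
print("std  of simulated shape MLEs:", shape_hats.std())
# Moment-matching idea: fit a parametric family to these simulated MLEs to approximate
# the finite-sample distribution of the estimator.
```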

7. Extensions, Alternatives, and Applications

The methodological reach of MLEs continues to expand:

  • MLE Characterizations and Identification: Recent results provide systematic theorems determining when specific functionals or statistics are MLEs for a family, via equivalence class, minimal sample size, and invertibility of score functions frameworks. These extend classical results (e.g., Gauss's mean for normality) to modern parametric, scale, location, and transformation models (Duerinckx et al., 2012).
  • Integration with Algorithmic and Machine Learning Tasks: In high-dimensional or nonparametric models (e.g., learning graphical models, mutual information estimation, population-level parameter recovery), direct MLE or plug-in approaches are often suboptimal. Alternative estimators leveraging approximation theory or moment-matching can achieve minimax optimality, improved bias properties, and lower sample complexity compared to MLEs, especially in high-dimensional regimes (Jiao et al., 2014, Vinayak et al., 2019). A brief sketch contrasting the plug-in MLE with a simple bias correction appears after this list.
  • Likelihood-Based Imputation, Prediction, and Random Effects: Extensions of likelihood (h-likelihood, predictive likelihood) enable direct joint maximization for latent variables and missing data, bypassing the expectation (E) step of EM algorithms, and support efficient single maximum likelihood imputation even when classical marginal likelihood-based imputation is computationally intensive or inapplicable (Han et al., 2022).
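
The sketch referenced above contrasts the plug-in (MLE) entropy estimator with the simple Miller–Madow bias correction on an undersampled alphabet; the correction is only a baseline alternative, not the approximation-theoretic minimax estimators of Jiao et al. (2014). The alphabet size, sample size, and uniform source are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(5)
S, n, reps = 1000, 2000, 200                  # large alphabet relative to the sample size
p = np.full(S, 1.0 / S)                       # uniform source; true entropy = log(S)
true_entropy = np.log(S)

plugin, corrected = [], []
for _ in range(reps):
    counts = rng.multinomial(n, p)
    freqs = counts[counts > 0] / n
    h_mle = -np.sum(freqs * np.log(freqs))                               # plug-in (MLE) estimator
    plugin.append(h_mle)
    corrected.append(h_mle + (np.count_nonzero(counts) - 1) / (2 * n))   # Miller-Madow correction

print("true entropy:       ", true_entropy)
print("plug-in MLE (mean): ", np.mean(plugin))      # noticeably biased downward here
print("Miller-Madow (mean):", np.mean(corrected))   # bias partially removed
```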

Table: Key Geometric and Computational Aspects in Advanced MLE

Topic | Main Mathematical Principle | Practical Implication
Log-linear models | Sufficient statistic in ri(cone): geometric characterization | MLE existence / testing of estimability
Algebraic models | Critical points = zero locus of score ideal | ML degree computation; global optima
Censoring/sampling zeros | Extended MLE, support restriction via facial sets | Likelihood-based inference under zeros
Intractable likelihoods | MAL estimation under uniform error control | Efficient approximation/simulation
High-dim. populations | MLE matches empirical "fingerprint" via KL projection | Optimality for sparse observations
Boundary/constraint cases | Limit distribution as projection onto cones | Uncertainty quantification

MLEs constitute the backbone of a vast spectrum of statistical and machine learning methodology. Their rigorous characterization, extensions for nonstandard regimes, and algorithmic innovations continue to play a critical role in the development and deployment of advanced inferential procedures across theoretical and applied domains.
