Likelihood Ratio Testing: Theory & Extensions

Updated 5 June 2026

Likelihood Ratio Testing is a statistical method for comparing nested models based on the ratio of maximized likelihoods under competing hypotheses.
It relies on asymptotic theory, notably Wilks’ theorem, to approximate the null distribution of the test statistic by a chi-squared law, or by modified reference distributions in irregular settings.
Modern adaptations include dimension corrections, high-dimensional adjustments, and robust techniques for inference with missing data and latent variables.

The likelihood ratio test (LRT) is a central statistical tool for hypothesis testing in parametric frameworks. It formalizes the comparison between nested models via the maximized likelihood under the null and alternative, and underpins the theory and practice of model selection, goodness-of-fit, high-dimensional inference, latent variable modeling, irregular settings, missing-data analysis, and the construction of universally valid tests. This article documents the theoretical foundations of the LRT, its asymptotic and finite-sample properties, its generalizations, its modern high-dimensional variants, and key practical and conceptual issues.

1. Definition and Classical Asymptotics

Let $X_1,\ldots,X_n$ be i.i.d. observations from a model $\{P_\theta : \theta \in \Theta \subset \mathbb{R}^k\}$, with log-likelihood $\ell_n(\theta) = \sum_{i=1}^n \log p_\theta(X_i)$. For testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta\setminus\Theta_0$, where $\Theta_0\subset\Theta$, the LRT statistic is:

$\Lambda_n = -2 \log\frac{\sup_{\theta\in\Theta_0} L_n(\theta)}{\sup_{\theta\in\Theta} L_n(\theta)} = 2 \bigl( \ell_n(\hat\theta_1) - \ell_n(\hat\theta_0) \bigr)$

where $\hat\theta_1$ and $\hat\theta_0$ are the unrestricted and restricted maximum likelihood estimators, respectively. Under standard regularity conditions—including interiority of the null, differentiability in quadratic mean, invertible Fisher information, Lipschitz continuity, and consistency of MLEs—Wilks’ theorem holds:

$\Lambda_n \xrightarrow{d} \chi^2_{d}, \quad d=\dim(\Theta)-\dim(\Theta_0)$

as $n \to \infty$ (Chen et al., 2020). This is pivotal: critical values can be drawn from the $\chi^2_d$ distribution, enabling asymptotic level control.
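As a concrete illustration of the statistic and its chi-squared calibration, the following is a minimal sketch for one simple nested pair: a Gaussian model with unknown variance, testing that the mean equals a fixed value. The function name `gaussian_mean_lrt` and the simulated data are illustrative assumptions, not part of the cited work. For this model the statistic reduces to $n \log(\hat\sigma_0^2/\hat\sigma_1^2)$, where the two variance estimates are the restricted and unrestricted MLEs, and the reference distribution has one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2

def gaussian_mean_lrt(x, mu0=0.0):
    """LRT of H0: mu = mu0 in a N(mu, sigma^2) model with sigma^2 unknown."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma2_null = np.mean((x - mu0) ** 2)       # restricted MLE of the variance
    sigma2_full = np.mean((x - x.mean()) ** 2)  # unrestricted MLE of the variance
    # 2 * (loglik_full - loglik_null) simplifies to n * log(variance ratio).
    stat = n * np.log(sigma2_null / sigma2_full)
    # One parameter is constrained under H0, so the reference law is chi-squared(1).
    p_value = chi2.sf(stat, df=1)
    return stat, p_value

rng = np.random.default_rng(0)
stat_null, p_null = gaussian_mean_lrt(rng.normal(0.0, 1.0, size=200))
stat_alt, p_alt = gaussian_mean_lrt(rng.normal(0.5, 1.0, size=200))
print(f"null data: stat={stat_null:.3f}, p={p_null:.3f}")
print(f"alt  data: stat={stat_alt:.3f}, p={p_alt:.3g}")
```

The statistic is nonnegative by construction because the restricted variance estimate can never be smaller than the unrestricted one.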

2. Extensions: Dimension-Restricted LRTs and Power

Dimension-restricted submodels arise when restricting alternatives to a strict submanifold $\Theta_r \subset \Theta$ containing $\Theta_0$, of dimension $r < k$. The corresponding restricted LRT statistic is:

$\Lambda_n^{(r)} = 2 \bigl( \sup_{\theta\in\Theta_r} \ell_n(\theta) - \sup_{\theta\in\Theta_0} \ell_n(\theta) \bigr)$

Under regularity, the asymptotic null distribution becomes $\chi^2_{d_r}$ with $d_r = r - \dim(\Theta_0)$. As per the “dimension-restricted LRT conjecture,” any restriction lowering the alternative's dimension improves (increases) Pitman asymptotic power against local alternatives. Explicitly, under local alternatives $\theta_n = \theta_0 + h/\sqrt{n} \in \Theta_r$, both unrestricted and restricted statistics converge to noncentral chi-squared distributions with the same noncentrality parameter $\delta$, but the power strictly increases as $d_r$ decreases [(Trosset et al., 2016), Theorem 1]:

$P\bigl(\chi^2_{d_r}(\delta) > \chi^2_{d_r,1-\alpha}\bigr) > P\bigl(\chi^2_{d}(\delta) > \chi^2_{d,1-\alpha}\bigr), \quad d_r < d$

Nevertheless, this guarantee is only asymptotic: in finite samples, counterexamples (e.g., multinomial models with Hardy–Weinberg constraints) demonstrate that a dimension restriction can, for certain alternatives, reduce power (Trosset et al., 2016). Thus, for small $n$ or discrete sample spaces, exact power analysis is essential.
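The asymptotic power comparison in this section can be checked numerically. The sketch below holds the noncentrality parameter fixed and evaluates the limiting power of a level-0.05 test for several degrees of freedom, using the noncentral chi-squared distribution. The helper name `asymptotic_power` and the chosen noncentrality value are illustrative assumptions, and the calculation describes only the limiting distributions, not finite-sample behavior.

```python
from scipy.stats import chi2, ncx2

def asymptotic_power(df, noncentrality, alpha=0.05):
    """Limiting power of a chi-squared test with `df` degrees of freedom."""
    critical = chi2.ppf(1 - alpha, df)          # level-alpha critical value under H0
    return ncx2.sf(critical, df, noncentrality)  # tail mass under the local alternative

# Same noncentrality, decreasing dimension of the alternative.
powers = {df: asymptotic_power(df, noncentrality=5.0) for df in (1, 2, 5, 10)}
for df, power in powers.items():
    print(f"df={df:2d}  asymptotic power={power:.3f}")
```

With the noncentrality held fixed, the printed power falls as the degrees of freedom grow, which is the monotonicity the asymptotic result relies on.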

3. LRT Beyond Regularity: Boundaries, Singularities, and Latent Variables

Wilks’ theorem may not hold if regularity conditions fail, e.g., the null is on a boundary, information is singular, or nuisance parameters are unidentifiable under $H_0$. Latent variable models (factor analysis, random effects) exemplify such irregularities (Chen et al., 2020). In these cases, the limiting distribution of $\Lambda_n$ is typically not $\chi^2_d$ but a more complicated functional of Gaussian processes and tangent cones.

Chernoff–van der Vaart–Drton theory replaces the standard limit with:

$\Lambda_n \xrightarrow{d} \min_{\tau \in T_{\Theta_0}(\theta_0)} \bigl\| Z - \mathcal{I}(\theta_0)^{1/2}\tau \bigr\|^2$

where $T_{\Theta_0}(\theta_0)$ is the tangent cone to the null parameter space at $\theta_0$ and $Z \sim N(0, I_k)$, with $\mathcal{I}(\theta_0)$ the Fisher information and $\theta_0$ an interior point of $\Theta$. If the tangent cone is a subspace, the limit is $\chi^2_d$; if a convex cone, a mixture (“chi-bar squared,” $\bar\chi^2$) occurs. For boundary points with unidentifiable nuisance and singular information, LRT statistics (e.g., genetic linkage, mixture models) converge to suprema of $\chi^2$-processes (Ekvall et al., 8 May 2026). Proper inference requires characterizing tangent cones and, when necessary, computing quantiles numerically or using parametric bootstraps (Chen et al., 2020).

Special Tables: Asymptotic LRT Law Types

Scenario	Limit of $\Lambda_n$	Reference Section
Regular, interior null	$\chi^2_d$	Wilks' theorem
Boundary/irregular	Mixture ($\bar\chi^2$) via tangent cone	Chernoff–Drton theory
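A small simulation makes the boundary case concrete. The sketch below uses a deliberately simple setting chosen for illustration, not taken from the cited papers: a unit-variance Gaussian mean with the one-sided null $\mu \le 0$, evaluated at the boundary point $\mu = 0$. There the tangent cone of the null is a half-line, the LRT statistic equals $n \max(\bar X, 0)^2$, and its law is an equal mixture of a point mass at zero and a chi-squared distribution with one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, reps, alpha = 50, 20000, 0.05

# Sample means under the boundary null mu = 0 (unit variance assumed known),
# drawn directly from their N(0, 1/n) distribution.
xbar = rng.normal(0.0, 1.0 / np.sqrt(n), size=reps)

# Unrestricted MLE is xbar; the MLE under H0: mu <= 0 is min(xbar, 0).
# Twice the log-likelihood difference is therefore n * max(xbar, 0)^2.
lrt = n * np.maximum(xbar, 0.0) ** 2

frac_zero = np.mean(lrt == 0.0)               # mass at zero, about one half
naive_crit = chi2.ppf(1 - alpha, df=1)        # chi-squared(1) critical value
mixture_crit = chi2.ppf(1 - 2 * alpha, df=1)  # critical value for the 50:50 mixture
naive_rate = np.mean(lrt > naive_crit)
mixture_rate = np.mean(lrt > mixture_crit)

print(f"P(LRT = 0)             ~ {frac_zero:.3f}")
print(f"size with chi2(1)      ~ {naive_rate:.3f}")
print(f"size with mixture law  ~ {mixture_rate:.3f}")
```

Using the plain chi-squared critical value gives a test with roughly half the nominal size, while the mixture critical value restores the intended level.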

4. High-dimensional Settings and Corrected LRTs

In high-dimensional regimes ($p \to \infty$, $p/n \to y \in (0,1)$, or group numbers comparable to $n$), classical $\chi^2$ approximations break down: LRT statistics diverge or have non-pivotal, non-Gaussian laws (Jiang et al., 2013, Bai et al., 2012, He et al., 2018, Wang et al., 2013, Choi et al., 2015). The solution is to recenter and rescale the test statistic, using random matrix theory and CLTs for spectral statistics, so that the limit becomes normal, not chi-squared.

For identity testing in covariance matrices, $H_0: \Sigma = I_p$, the normalized LRT statistic based on the sample covariance matrix $S_n$,

$L_n = \operatorname{tr}(S_n) - \log\det(S_n) - p$

has

$\frac{L_n - \mu_n}{\sigma_n} \xrightarrow{d} N(0,1)$

where $\mu_n$, $\sigma_n$ are explicit in $p$, $n$ (Wang et al., 2013, Choi et al., 2015). For linear regression or MANOVA, analogous corrections apply (Bai et al., 2012, He et al., 2018, Dette et al., 2019), and testing simultaneous means and covariances similarly requires centering and scaling, with explicit formulas from random matrix theory (Niu et al., 2024).

Regularization, e.g., via shrinkage estimators, further improves power and variance control in high-dimensional covariance inference (Choi et al., 2015).
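The breakdown of the classical calibration, and the effect of recentering and rescaling, can be seen in a short simulation of the identity test under the null. This is a sketch under stated assumptions: the data are Gaussian with known zero mean, the sample covariance uses divisor $n$, and the centering and scaling constants in `corrected_z` are written in a form commonly given in the random-matrix literature for this statistic with $y = p/n < 1$; they are an assumption here rather than a transcription from the papers cited above. Function names are illustrative.

```python
import numpy as np
from scipy.stats import chi2, norm

def identity_lrt(x):
    """L = tr(S) - log det(S) - p, with S = X'X / n (mean assumed known and zero)."""
    n, p = x.shape
    s = x.T @ x / n
    _, logdet = np.linalg.slogdet(s)
    return np.trace(s) - logdet - p

def corrected_z(stat, n, p):
    """Recentered, rescaled statistic; constants are an assumption (see text)."""
    y = p / n                      # dimension-to-sample-size ratio, needs 0 < y < 1
    log1my = np.log1p(-y)          # log(1 - y)
    center = p * (1.0 - (y - 1.0) / y * log1my) - 0.5 * log1my
    scale = np.sqrt(-2.0 * log1my - 2.0 * y)
    return (stat - center) / scale

rng = np.random.default_rng(2)
n, p, reps, alpha = 100, 50, 300, 0.05
lrt_values = np.array(
    [identity_lrt(rng.standard_normal((n, p))) for _ in range(reps)]
)

# Classical calibration: n * L against chi-squared with p(p+1)/2 degrees of freedom.
classical_rate = np.mean(n * lrt_values > chi2.ppf(1 - alpha, p * (p + 1) // 2))
# Corrected calibration: one-sided normal test on the standardized statistic.
corrected_rate = np.mean(corrected_z(lrt_values, n, p) > norm.ppf(1 - alpha))

print(f"null rejection rate, classical chi-squared: {classical_rate:.3f}")
print(f"null rejection rate, corrected normal:      {corrected_rate:.3f}")
```

With $p$ half of $n$, the classical test rejects a true null almost always, whereas the corrected statistic stays close to the nominal level in this setting.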

5. Modern LRT Generalizations: Universality and Irregularity

Universal inference develops LRT-based tests and confidence sets with exact, finite-sample guarantees, valid without any regularity conditions (Dunn et al., 2021, Delong et al., 27 Oct 2025). The “split LRT” class splits the data, fits parameters on one subset, and evaluates likelihoods on the other: the resulting statistic is an e-variable, giving finite-sample valid tests. Aggregating over many splits (by averaging e-values) yields “universal” LRT confidence sets and tests. In standard situations (e.g., Gaussian mean), the universal LRT is typically slightly conservative, with confidence sets having at most 50% larger squared radius than classical LRT-based sets, and slightly reduced power (Dunn et al., 2021). For non-convex or set-identified nulls, universal LRTs can outperform standard LRT-based approaches (e.g., the "doughnut" annulus example).
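The split construction described above can be sketched for the simplest case, a unit-variance Gaussian mean with a simple null. The function name `split_lrt_gaussian_mean`, the even split, and the simulation settings are illustrative assumptions. The alternative is fitted on one half of the data, the likelihood ratio is evaluated on the other half, and the null is rejected when the ratio exceeds $1/\alpha$, a threshold justified by Markov's inequality. For a composite null, the denominator would instead use the null MLE computed on the evaluation half.

```python
import numpy as np

def split_lrt_gaussian_mean(x, mu0=0.0, alpha=0.05, rng=None):
    """Split LRT of H0: mu = mu0 for N(mu, 1) data; returns (log e-value, reject)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.permutation(np.asarray(x, dtype=float))
    half = x.size // 2
    fit_half, eval_half = x[:half], x[half:]
    mu_hat = fit_half.mean()  # alternative fitted on one half only
    # Log-likelihood ratio on the held-out half: N(mu_hat, 1) versus N(mu0, 1).
    log_e = np.sum(0.5 * (eval_half - mu0) ** 2 - 0.5 * (eval_half - mu_hat) ** 2)
    return log_e, bool(log_e > np.log(1.0 / alpha))

rng = np.random.default_rng(3)
n, reps = 100, 1000
null_rate = np.mean(
    [split_lrt_gaussian_mean(rng.normal(0.0, 1.0, n), rng=rng)[1] for _ in range(reps)]
)
power = np.mean(
    [split_lrt_gaussian_mean(rng.normal(1.0, 1.0, n), rng=rng)[1] for _ in range(reps)]
)
print(f"rejection rate under H0 (mu = 0): {null_rate:.3f}")
print(f"rejection rate under mu = 1:      {power:.3f}")
```

The null rejection rate sits well below the nominal 0.05 in this example, which reflects the conservativeness noted above; averaging the e-values over several random splits is one way to reduce the dependence on a single split.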

6. LRTs with Incomplete Data and Imputation

With missing data and multiple imputation (MI), the construction of LRTs is nontrivial. Classic MI-based LRT combinators (Rubin, Meng–Rubin) can be non-invariant, non-monotonic, negative, and inconsistently estimate the fraction of missing information. The “stacked” MI LRT, which analyzes the concatenation of all imputed data sets, solves all these issues: it is always nonnegative, reparametrization-invariant, and the associated missing-information estimator is consistent under both null and alternative (Chan et al., 2017). The null distribution takes an $F$-approximation with denominator degrees of freedom determined by the finite-sample fraction of missing information, yielding a principled and implementable approach for likelihood-based inference with missing data.

7. Practical Considerations and Choosing LRT Variants

Classical $\chi^2$-based LRTs are appropriate only under standard regularity and when $p \ll n$.
Corrected and regularized LRTs are essential when dimension is not negligible relative to sample size; corrections must use high-dimensional central limit theorems and, where beneficial, shrinkage (Bai et al., 2012, Choi et al., 2015, He et al., 2018).
Universal/subsampled LRTs provide robust, finite-sample valid inference even in highly irregular or misspecified models (Dunn et al., 2021, Delong et al., 27 Oct 2025).
Restrictions on alternatives can improve asymptotic power but may have counterintuitive finite-sample effects; explicit power calculations (not LRT heuristics) are essential in small samples or discrete models (Trosset et al., 2016).
Boundary and latent variable problems require the use of $\chi^2$-mixture reference distributions derived from tangent cone geometry (Chen et al., 2020, Ekvall et al., 8 May 2026).
Imputation-based LRTs should use the stacking approach for principled missing data inference (Chan et al., 2017).

Careful attention to underlying assumptions, dimensionality, and regularity is mandatory for valid LRT-based inference in modern, complex, or high-dimensional data settings.