Likelihood Ratio Testing: Theory & Extensions

Updated 5 June 2026
  • Likelihood Ratio Testing is a statistical method for comparing nested models based on the ratio of maximized likelihoods under competing hypotheses.
  • It employs asymptotic theory, such as Wilks’ theorem, to approximate the test statistic's distribution using chi-squared or modified distributions in irregular settings.
  • Modern adaptations include dimension corrections, high-dimensional adjustments, and robust techniques for inference with missing data and latent variables.

The likelihood ratio test (LRT) is a central statistical tool for hypothesis testing in parametric frameworks. It formalizes the comparison between nested models via the maximized likelihood under the null and alternative, and underpins the theory and practice of model selection, goodness-of-fit, high-dimensional inference, latent variable modeling, irregular settings, missing-data analysis, and the construction of universally valid tests. This article documents the theoretical foundations of the LRT, its asymptotic and finite-sample properties, its generalizations, its modern high-dimensional variants, and key practical and conceptual issues.

1. Definition and Classical Asymptotics

Let $X_1,\ldots,X_n$ be i.i.d. observations from a model $\{P_\theta : \theta \in \Theta \subset \mathbb{R}^k\}$, with log-likelihood $\ell_n(\theta) = \sum_{i=1}^n \log p_\theta(X_i)$. For testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta\setminus\Theta_0$ where $\Theta_0\subset\Theta$, the LRT statistic is:

$$\Lambda_n = -2 \log\frac{\sup_{\theta\in\Theta_0} L_n(\theta)}{\sup_{\theta\in\Theta} L_n(\theta)} = 2 \bigl( \ell_n(\hat\theta_1) - \ell_n(\hat\theta_0) \bigr)$$

where $L_n(\theta) = \exp \ell_n(\theta)$ is the likelihood, and $\hat\theta_1$ and $\hat\theta_0$ are the unrestricted and restricted maximum likelihood estimators, respectively. Under standard regularity conditions (including interiority of the null, differentiability in quadratic mean, invertible Fisher information, Lipschitz continuity, and consistency of MLEs), Wilks’ theorem holds:

$$\Lambda_n \xrightarrow{d} \chi^2_{d}, \quad d=\dim(\Theta)-\dim(\Theta_0)$$

as $n \to \infty$ (Chen et al., 2020). The limit is pivotal, that is, free of unknown parameters: critical values can be drawn from the $\chi^2_d$ distribution, enabling asymptotic level control.
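As a minimal sketch of this recipe, consider testing a fixed rate in an exponential model, where the unrestricted MLE is available in closed form and $d = 1$. The model choice, the function names `exp_loglik` and `lrt_exponential_rate`, and the simulated data are illustrative assumptions, not taken from the cited work. For one degree of freedom the $\chi^2_1$ tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math
import random


def exp_loglik(rate, xs):
    """Log-likelihood of i.i.d. Exponential(rate) data."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def lrt_exponential_rate(xs, rate0):
    """LRT of H0: rate = rate0 against an unrestricted rate (d = 1).

    Returns the statistic 2 * (l(rate_hat) - l(rate0)) and the
    asymptotic p-value from the chi-squared(1) reference law.
    """
    rate_hat = len(xs) / sum(xs)  # unrestricted MLE
    stat = 2.0 * (exp_loglik(rate_hat, xs) - exp_loglik(rate0, xs))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value


# Illustrative data: true rate 1.0 (an assumption for this demo).
rng = random.Random(0)
xs = [rng.expovariate(1.0) for _ in range(200)]

stat_null, p_null = lrt_exponential_rate(xs, rate0=1.0)  # H0 true
stat_alt, p_alt = lrt_exponential_rate(xs, rate0=2.0)    # H0 false
print(f"H0 true:  stat={stat_null:.3f}, p={p_null:.3f}")
print(f"H0 false: stat={stat_alt:.3f}, p={p_alt:.3g}")
```

The p-value is only asymptotically valid; with small samples the chi-squared approximation can be noticeably off.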

2. Extensions: Dimension-Restricted LRTs and Power

Dimension-restricted submodels arise when restricting alternatives to a strict submanifold $\Theta_1$ with $\Theta_0 \subset \Theta_1 \subset \Theta$, of dimension $\dim(\Theta_1) < \dim(\Theta)$. The corresponding restricted LRT statistic is:

$$\Lambda_n^{(1)} = 2 \Bigl( \sup_{\theta\in\Theta_1} \ell_n(\theta) - \sup_{\theta\in\Theta_0} \ell_n(\theta) \Bigr)$$

Under regularity, the asymptotic null distribution becomes $\chi^2_{d_1}$ with $d_1 = \dim(\Theta_1) - \dim(\Theta_0) < d$. As per the “dimension-restricted LRT conjecture,” any restriction lowering the alternative's dimension improves (increases) Pitman asymptotic power against local alternatives. Explicitly, under local alternatives $\theta_n = \theta_0 + h/\sqrt{n}$ lying in $\Theta_1$, both unrestricted and restricted statistics converge to noncentral chi-squared distributions with the same noncentrality parameter $\delta$, but the power strictly increases as $d_1$ decreases [(Trosset et al., 2016), Theorem 1]:

$$P\bigl(\chi^2_{d_1}(\delta) > c_{d_1,\alpha}\bigr) > P\bigl(\chi^2_{d}(\delta) > c_{d,\alpha}\bigr), \quad d_1 < d$$

where $c_{m,\alpha}$ denotes the upper-$\alpha$ quantile of the central $\chi^2_m$ distribution.

Nevertheless, this guarantee is only asymptotic: in finite samples, counterexamples (e.g., multinomial models with Hardy–Weinberg constraints) demonstrate that a dimension restriction can, for certain alternatives, reduce power (Trosset et al., 2016). Thus, for small $n$ or discrete sample spaces, exact power analysis is essential.
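The asymptotic part of this comparison can be checked numerically: for a fixed noncentrality parameter and level, the power of a noncentral chi-squared test falls as the degrees of freedom grow. The sketch below is restricted to even degrees of freedom, an assumption made so that the central chi-squared tail has a closed form and no external library is needed; the noncentral tail is computed from the standard Poisson-mixture representation. All function names are illustrative.

```python
import math


def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi2_critical_even(df, alpha):
    """Upper-alpha quantile of central chi2_df (even df) by bisection."""
    lo, hi = 0.0, 1000.0  # upper bracket is ample for the small df used here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf_even(x, df, delta, terms=200):
    """P(chi2_df(delta) > x) as a Poisson(delta/2) mixture of central tails."""
    lam = delta / 2.0
    weight = math.exp(-lam)  # Poisson probability of j = 0
    total = 0.0
    for j in range(terms):
        total += weight * chi2_sf_even(x, df + 2 * j)
        weight *= lam / (j + 1)
    return total


def lrt_power(df, delta, alpha=0.05):
    """Asymptotic power of a level-alpha LRT with df degrees of freedom."""
    return ncx2_sf_even(chi2_critical_even(df, alpha), df, delta)


# Same noncentrality, different dimensions of the alternative.
powers = {df: lrt_power(df, delta=8.0) for df in (2, 4, 6)}
for df, power in powers.items():
    print(f"df={df}: asymptotic power={power:.3f}")
```

This reproduces only the limiting comparison; it says nothing about the finite-sample counterexamples, which require exact power calculations in the model at hand.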

3. LRT Beyond Regularity: Boundaries, Singularities, and Latent Variables

Wilks’ theorem may not hold if regularity conditions fail, e.g., the null is on a boundary, information is singular, or nuisance parameters are unidentifiable under $H_0$. Latent variable models (factor analysis, random effects) exemplify such irregularities (Chen et al., 2020). In these cases, the limiting distribution of $\Lambda_n$ is typically not $\chi^2_d$ but a more complicated functional of Gaussian processes and tangent cones.

Chernoff–van der Vaart–Drton theory replaces the standard limit with:

$$\Lambda_n \xrightarrow{d} \min_{\tau \in T_{\Theta_0}(\theta_0)} \bigl\| Z - I(\theta_0)^{1/2}\tau \bigr\|^2$$

where $T_{\Theta_0}(\theta_0)$ is the tangent cone to the null parameter space at $\theta_0$ and $Z \sim N(0, \mathrm{Id}_k)$, with $I(\theta_0)$ the Fisher information. If the tangent cone is a subspace, the limit is $\chi^2_d$; if a convex cone, a mixture (“chi-bar squared,” $\bar\chi^2$) occurs. For boundary points with unidentifiable nuisance and singular information, LRT statistics (e.g., genetic linkage, mixture models) converge to suprema of $\chi^2$-processes (Ekvall et al., 8 May 2026). Proper inference requires characterizing tangent cones and, when necessary, computing quantiles numerically or using parametric bootstraps (Chen et al., 2020).
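A one-dimensional illustration of the convex-cone case, under assumptions chosen here for simplicity: a scalar parameter with unit Fisher information, null $H_0: \theta \le 0$, and true value $\theta_0 = 0$ on the boundary. The tangent cone of the null at $0$ is the half-line $(-\infty, 0]$, so the limit is the squared distance from a standard normal $Z$ to that half-line, $\max(Z, 0)^2$, which follows the mixture $\tfrac12\chi^2_0 + \tfrac12\chi^2_1$. The sketch compares the mixture critical value with the naive $\chi^2_1$ one by Monte Carlo; function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def cone_lrt_limit(z):
    """Squared distance from z to the half-line (-inf, 0]: min over tau <= 0 of (z - tau)^2."""
    return max(z, 0.0) ** 2


def chi_bar_sf(c):
    """Tail probability of the mixture 0.5 * chi2_0 + 0.5 * chi2_1 at c."""
    if c < 0.0:
        return 1.0
    # chi2_0 is a point mass at 0 and contributes nothing to P(. > c) for c >= 0.
    return 0.5 * math.erfc(math.sqrt(c / 2.0))


alpha = 0.05
std_normal = NormalDist()
crit_mixture = std_normal.inv_cdf(1 - alpha) ** 2    # solves chi_bar_sf(c) = alpha
crit_naive = std_normal.inv_cdf(1 - alpha / 2) ** 2  # chi2_1 upper-alpha quantile

rng = random.Random(0)
draws = [cone_lrt_limit(rng.gauss(0.0, 1.0)) for _ in range(20000)]
mc_size_mixture = sum(d > crit_mixture for d in draws) / len(draws)
mc_size_naive = sum(d > crit_naive for d in draws) / len(draws)

print(f"mixture critical value {crit_mixture:.3f}: simulated size {mc_size_mixture:.3f}")
print(f"naive   critical value {crit_naive:.3f}: simulated size {mc_size_naive:.3f}")
```

Using the naive $\chi^2_1$ critical value here gives a test with roughly half the nominal size, which is conservative and wastes power. In higher dimensions the mixture weights depend on the cone geometry and usually have to be computed numerically or by simulation.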

Special Tables: Asymptotic LRT Law Types

| Scenario | Limit of $\Lambda_n$ | Reference Section |
|---|---|---|
| Regular, interior null | $\chi^2_d$ | Wilks' theorem |
| Boundary/irregular | Mixture ($\bar\chi^2$) via tangent cone | Chernoff–Drton theory |

4. High-dimensional Settings and Corrected LRTs

In high-dimensional regimes (data dimension $p \to \infty$ with $p/n \to y \in (0,1]$, or group numbers comparable to $n$), classical $\chi^2$ approximations break down: LRT statistics diverge or have non-pivotal, non-Gaussian laws (Jiang et al., 2013, Bai et al., 2012, He et al., 2018, Wang et al., 2013, Choi et al., 2015). The solution is to recenter and rescale the test statistic, using random matrix theory and CLTs for spectral statistics, so that the limit becomes normal, not chi-squared.

For identity testing in covariance matrices, the hypothesis

$$H_0: \Sigma = I_p$$

has

$$\frac{\Lambda_n - \mu_n}{\sigma_n} \xrightarrow{d} N(0,1)$$

where the centering $\mu_n$ and scaling $\sigma_n$ are explicit in the dimension $p$ and sample size $n$ (Wang et al., 2013, Choi et al., 2015). For linear regression or MANOVA, analogous corrections apply (Bai et al., 2012, He et al., 2018, Dette et al., 2019), and testing simultaneous means and covariances similarly requires centering and scaling, with explicit formulas from random matrix theory (Niu et al., 2024).

Regularization, e.g., via shrinkage estimators, further improves power and variance control in high-dimensional covariance inference (Choi et al., 2015).
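The breakdown of the classical approximation, and the effect of recentering, can be seen in a small simulation of the identity-covariance test. The sketch below assumes Gaussian data with known zero mean, $n = 100$ and $p = 60$, and uses the statistic $L^* = \operatorname{tr} S - \log\det S - p$ with $S = X^\top X / n$. The centering and scaling constants are the ones from a standard random-matrix CLT for this statistic with $y = p/n \in (0,1)$; they are stated here as an assumption and may differ in form from the expressions in the cited papers. Helper names are illustrative.

```python
import math
import random


def sample_covariance_known_mean(rows):
    """S = X^T X / n for data with known zero mean."""
    n, p = len(rows), len(rows[0])
    s = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            value = sum(r[i] * r[j] for r in rows) / n
            s[i][j] = value
            s[j][i] = value
    return s


def log_det_spd(a):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    p = len(a)
    chol = [[0.0] * p for _ in range(p)]
    log_det = 0.0
    for j in range(p):
        diag = a[j][j] - sum(chol[j][k] ** 2 for k in range(j))
        chol[j][j] = math.sqrt(diag)
        log_det += 2.0 * math.log(chol[j][j])
        for i in range(j + 1, p):
            off = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = off / chol[j][j]
    return log_det


rng = random.Random(1)
n, p = 100, 60
rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]  # H0 is true

s = sample_covariance_known_mean(rows)
l_star = sum(s[i][i] for i in range(p)) - log_det_spd(s) - p
lrt = n * l_star  # classical -2 log likelihood ratio

# Classical reference law: chi-squared with p(p+1)/2 degrees of freedom.
df = p * (p + 1) // 2
classical_z = (lrt - df) / math.sqrt(2 * df)

# Assumed high-dimensional centering and scaling, with y = p / n.
y = p / n
center = p * (1 + (1 - y) / y * math.log(1 - y)) - 0.5 * math.log(1 - y)
scale = math.sqrt(-2 * math.log(1 - y) - 2 * y)
corrected_z = (l_star - center) / scale

print(f"classical z-score vs chi2_{df}: {classical_z:.2f}")
print(f"corrected z-score:             {corrected_z:.2f}")
```

Although the null hypothesis is true in this simulation, the classical statistic sits far in the upper tail of its nominal chi-squared law, while the recentered statistic is on the scale of a standard normal draw.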

5. Modern LRT Generalizations: Universality and Irregularity

Universal inference develops LRT-based tests and confidence sets with exact, finite-sample guarantees, valid without any regularity conditions (Dunn et al., 2021, Delong et al., 27 Oct 2025). The “split LRT” class splits the data, fits parameters on one subset, and evaluates likelihoods on the other: the resulting statistic is an e-variable, giving finite-sample valid tests. Aggregating over many splits (by averaging e-values) yields “universal” LRT confidence sets and tests. In standard situations (e.g., Gaussian mean), the universal LRT is typically slightly conservative, with confidence sets having at most 50% larger squared radius than classical LRT-based sets, and slightly reduced power (Dunn et al., 2021). For non-convex or set-identified nulls, universal LRTs can outperform standard LRT-based approaches (e.g., the "doughnut" annulus example).
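A minimal sketch of a single split LRT, assuming a Gaussian mean model with unit variance and a simple null $H_0: \mu = \mu_0$: the mean is estimated on one half of the data, the likelihood ratio against $\mu_0$ is evaluated on the other half, and the test rejects when the ratio exceeds $1/\alpha$ (valid by Markov's inequality, since the ratio has expectation at most one under the null). Function names, the even split, and the simulated data are illustrative assumptions; averaging over several splits is omitted.

```python
import math
import random


def split_lrt_log_e(d0, d1, mu0):
    """Log of the split likelihood ratio for N(mu, 1) data.

    mu is estimated on d1; the ratio L(mu_hat) / L(mu0) is evaluated on d0.
    """
    mu_hat = sum(d1) / len(d1)
    return sum(0.5 * (x - mu0) ** 2 - 0.5 * (x - mu_hat) ** 2 for x in d0)


def split_lrt_reject(data, mu0, alpha=0.05, seed=0):
    """Randomly split the data in half and reject H0 if the ratio exceeds 1/alpha."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    d0, d1 = shuffled[:half], shuffled[half:]
    return split_lrt_log_e(d0, d1, mu0) > math.log(1.0 / alpha)


# Illustrative data with true mean 1.0 (an assumption for this demo).
rng = random.Random(42)
data = [rng.gauss(1.0, 1.0) for _ in range(400)]

reject_far = split_lrt_reject(data, mu0=0.0)   # null far from the truth
reject_true = split_lrt_reject(data, mu0=1.0)  # null equal to the truth
print("reject mu0 = 0.0:", reject_far)
print("reject mu0 = 1.0:", reject_true)
```

For a composite null, the denominator would be the likelihood maximized over the null set on the evaluation half; inverting the test over $\mu_0$ gives the corresponding confidence set.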

6. LRTs with Incomplete Data and Imputation

With missing data and multiple imputation (MI), constructing LRTs is nontrivial. Classic MI-based LRT combining rules (Rubin, Meng–Rubin) can fail to be invariant to reparametrization, can be non-monotonic or negative, and can inconsistently estimate the fraction of missing information. The “stacked” MI LRT, which analyzes the concatenation of all imputed data sets, addresses these issues: it is always nonnegative, reparametrization-invariant, and the associated missing-information estimator is consistent under both null and alternative (Chan et al., 2017). The null distribution is approximated by an $F$ distribution with denominator degrees of freedom determined by the finite-sample fraction of missing information, yielding a principled and implementable approach for likelihood-based inference with missing data.

7. Practical Considerations and Choosing LRT Variants

  • Classical $\chi^2$-based LRTs are appropriate only under standard regularity and when the dimension is negligible relative to the sample size ($p \ll n$).
  • Corrected and regularized LRTs are essential when dimension is not negligible relative to sample size; corrections must use high-dimensional central limit theorems and, where beneficial, shrinkage (Bai et al., 2012, Choi et al., 2015, He et al., 2018).
  • Universal/subsampled LRTs provide robust, finite-sample valid inference even in highly irregular or misspecified models (Dunn et al., 2021, Delong et al., 27 Oct 2025).
  • Restrictions on alternatives can improve asymptotic power but may have counterintuitive finite-sample effects; explicit power calculations (not LRT heuristics) are essential in small samples or discrete models (Trosset et al., 2016).
  • Boundary and latent variable problems require the use of $\chi^2$-mixture reference distributions derived from tangent cone geometry (Chen et al., 2020, Ekvall et al., 8 May 2026).
  • Imputation-based LRTs should use the stacking approach for principled missing data inference (Chan et al., 2017).

Careful attention to underlying assumptions, dimensionality, and regularity is mandatory for valid LRT-based inference in modern, complex, or high-dimensional data settings.
