Composite Likelihood Framework
- Composite likelihood is a pseudo-likelihood that combines lower-dimensional components to approximate otherwise intractable full likelihoods.
- The framework balances computational tractability with statistical efficiency, leveraging tools like the Godambe (sandwich) information for accurate variance estimation.
- It underpins robust estimation, model selection, and scalable computation in complex data settings, despite potential efficiency trade-offs.
A composite likelihood is a pseudo-likelihood function constructed by combining lower-dimensional likelihood objects—such as marginal or conditional densities—when the full joint likelihood is analytically or computationally intractable. This framework offers a principled yet flexible strategy for statistical inference across complex or high-dimensional models, and is distinguished by its balance between computational tractability and statistical efficiency. Composite likelihood methods are supported by a robust asymptotic theory—centered on the Godambe (sandwich) information—which underpins their frequentist and Bayesian properties and their use as a foundation for efficient estimation, model selection, robust inference, and scalable computation.
1. Formal Definition and Structure
Let $y_1, \ldots, y_n$ denote independent data vectors from an unknown or high-dimensional model with parameter $\theta$. Suppose that for each observation, a collection of tractable likelihood components $L_k(\theta; y)$, $k = 1, \ldots, K$, is available for subsets of the data (e.g., univariate margins, bivariate pairs, or blocks). The composite likelihood is defined as

$$L_C(\theta; y) = \prod_{k=1}^{K} L_k(\theta; y)^{w_k},$$

with nonnegative weights $w_1, \ldots, w_K$. The corresponding composite log-likelihood is

$$c\ell(\theta; y) = \log L_C(\theta; y) = \sum_{k=1}^{K} w_k \log L_k(\theta; y).$$

The maximum composite likelihood estimator (MCLE) $\hat\theta_C$ solves the score equation

$$u(\theta; y) = \nabla_\theta\, c\ell(\theta; y) = \sum_{k=1}^{K} w_k \nabla_\theta \log L_k(\theta; y) = 0.$$
This construction encompasses a wide variety of scenarios, including blockwise, pairwise, row-column, and marginal composite likelihoods (Miniato et al., 2021, Ting et al., 2020, Erhardt et al., 2014, Bellio et al., 2023, Stoehr et al., 2024).
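As a concrete illustration of this construction, the following sketch estimates the common correlation of an exchangeable multivariate normal from all bivariate margins with unit weights $w_{jk} = 1$. The model, dimension, and sample size are hypothetical choices made only for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

# Hypothetical example model: n i.i.d. d-dimensional normal vectors with
# zero means, unit variances, and exchangeable correlation rho.
n, d, rho = 500, 8, 0.5
Sigma = (1 - rho) * np.eye(d) + rho * np.ones((d, d))
Y = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

def pairwise_cloglik(r, Y):
    """Composite log-likelihood built from all bivariate normal margins
    (unit weights), as a function of the common correlation r."""
    d = Y.shape[1]
    cl = 0.0
    for j in range(d):
        for k in range(j + 1, d):
            x, y = Y[:, j], Y[:, k]
            q = (x**2 - 2 * r * x * y + y**2) / (1 - r**2)
            cl += np.sum(-np.log(2 * np.pi) - 0.5 * np.log(1 - r**2) - 0.5 * q)
    return cl

# MCLE: maximize the composite log-likelihood over r in (-1, 1).
res = minimize_scalar(lambda r: -pairwise_cloglik(r, Y),
                      bounds=(-0.99, 0.99), method="bounded")
rho_hat = res.x
```

Note that each summand is an ordinary bivariate likelihood, so the optimization inherits the smoothness and stability of genuine low-dimensional likelihoods while never evaluating the $d$-dimensional joint density.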
2. Asymptotic Theory: Sandwich Variance and Godambe Information
For properly specified component models and suitable identifiability conditions, the MCLE is consistent and asymptotically normal with covariance determined by the Godambe (sandwich) information matrix

$$G(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta),$$

where $H(\theta) = \mathrm{E}\{-\nabla^2_\theta\, c\ell(\theta; Y)\}$ is the sensitivity matrix and $J(\theta) = \mathrm{Var}\{\nabla_\theta\, c\ell(\theta; Y)\}$ is the variability matrix. Specifically,

$$\sqrt{n}\,(\hat\theta_C - \theta) \xrightarrow{d} N\!\left(0,\, G(\theta)^{-1}\right), \qquad G(\theta)^{-1} = H(\theta)^{-1} J(\theta)\, H(\theta)^{-1}.$$

The plug-in estimator for the asymptotic covariance is $\hat G^{-1} = \hat H^{-1} \hat J \hat H^{-1}$ (Miniato et al., 2021, Ting et al., 2020, Erhardt et al., 2014, Stoehr et al., 2024). For hypothesis testing, the standard likelihood-ratio, Wald, and score test statistics built on $c\ell$ require adjustment due to the "non-genuine" nature of the composite likelihood (Lunardon et al., 2013, Martin et al., 2016).
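The sandwich construction can be sketched in a scalar case where every quantity is available in closed form: a bivariate normal with a common mean, estimated by the independence (marginal) composite likelihood. All numerical settings here are hypothetical and chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example: bivariate normal with common mean mu, variances
# (1, 4), and correlation 0.8; independence composite likelihood for mu.
mu_true, s2, rho, n = 2.0, 4.0, 0.8, 2000
cov = np.array([[1.0, rho * np.sqrt(s2)], [rho * np.sqrt(s2), s2]])
Y = rng.multivariate_normal([mu_true, mu_true], cov, size=n)

# MCLE of mu: root of the composite score sum_i u_i(mu) = 0, where
# u_i(mu) = (y_i1 - mu)/1 + (y_i2 - mu)/s2.
mu_hat = (Y[:, 0] + Y[:, 1] / s2).sum() / (n * (1 + 1 / s2))

# Per-observation composite scores evaluated at the MCLE.
u = (Y[:, 0] - mu_hat) + (Y[:, 1] - mu_hat) / s2

# Sensitivity H = E[-du/dmu] (exact here) and variability J = Var(u).
H = 1 + 1 / s2
J = u.var(ddof=1)

# Godambe sandwich variance of mu_hat versus the naive variance that
# wrongly treats the composite likelihood as a genuine likelihood.
var_sandwich = J / (H**2 * n)
var_naive = 1 / (H * n)
```

Because the two margins are positively correlated, the variability $J$ exceeds the sensitivity $H$, and the naive variance understates the true uncertainty; the sandwich form corrects this.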
3. Statistical Efficiency and Robustness
Composite likelihood methods may suffer a loss in efficiency relative to maximum likelihood, as the Godambe information is in general smaller than the true Fisher information. The trade-off is a significant reduction in computational complexity, especially for high-dimensional data or complex dependencies. While composite likelihoods are sometimes thought to confer robustness to model misspecification—since they make fewer global assumptions—this property is not universal. If the chosen marginal/conditional models are themselves misspecified, the MCLE may lose consistency even in situations where the full MLE remains robust (Ogden, 2014, Nikoloulopoulos, 2016).
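The efficiency gap can be made explicit in a toy linear-estimation example (all numbers hypothetical): for a common mean of a correlated bivariate normal, the independence composite likelihood weights components by their marginal precisions, whereas full maximum likelihood (GLS) uses the inverse joint covariance, and the resulting variances can be compared exactly:

```python
import numpy as np

# Hypothetical example: common mean mu of a bivariate normal with
# variances (1, 4) and correlation 0.8. Any unbiased linear estimator
# a @ Y with a.sum() == 1 has variance a @ Sigma @ a.
s2, rho = 4.0, 0.8
Sigma = np.array([[1.0, rho * np.sqrt(s2)], [rho * np.sqrt(s2), s2]])

# Independence composite likelihood weights: proportional to 1/variance.
a_cl = np.array([1.0, 1.0 / s2])
a_cl /= a_cl.sum()

# Full-likelihood (GLS) weights: proportional to Sigma^{-1} @ 1.
a_ml = np.linalg.solve(Sigma, np.ones(2))
a_ml /= a_ml.sum()

var_cl = a_cl @ Sigma @ a_cl   # Godambe-type variance of the MCLE
var_ml = a_ml @ Sigma @ a_ml   # Fisher (minimum attainable) variance
efficiency = var_ml / var_cl   # <= 1 by construction
```

The composite weights ignore the cross-covariance, so `var_cl` strictly exceeds `var_ml` here, quantifying the information lost by discarding the joint dependence.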
4. Construction, Sparsity, and Algorithmic Choices
Composite likelihood inference depends crucially on the selection of its constituent likelihood objects and the weighting scheme. Important approaches include:
- Pairwise/Blockwise Construction: Using all bivariate (or -wise) marginal densities for efficiency and feasibility in high dimensions (Ting et al., 2020, Stoehr et al., 2024, Whitaker et al., 2019, Bennedsen et al., 2024).
- Sparse Selection and Optimization: Employing ℓ₁-penalized optimization for weight selection, as in (Huang et al., 2021, Huang et al., 2017), which yields sparse composite scores with a reduced number of active components and near-optimal efficiency.
- Gibbs Sampling for Composition: Utilizing MCMC schemes to sample composite likelihood compositions that minimize estimator variance or other optimality criteria (Ferrari et al., 2015).
- Adaptive Weighting: Dynamically tuning weights based on score information or computational constraints to optimize the bias-variance-efficiency trade-off (Huang et al., 2021, Huang et al., 2017, Zhao et al., 2018).
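A minimal data-driven weighting sketch, in the spirit of the adaptive strategies above (all model settings hypothetical): given per-observation scores from two components for a scalar parameter, choose the mixing weight that minimizes the estimated Godambe variance of the combined score:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example: two component scores for a common mean mu,
# u1 from a precise margin and u2 from a noisier, correlated margin.
mu, n = 0.0, 4000
cov = np.array([[1.0, 0.5], [0.5, 9.0]])
Y = rng.multivariate_normal([mu, mu], cov, size=n)
u1, u2 = Y[:, 0] - mu, Y[:, 1] - mu   # per-observation component scores
H = np.array([1.0, 1.0])              # sensitivities E[-du_k/dmu]

def asym_var(w):
    """Estimated Godambe variance of the weighted score w*u1 + (1-w)*u2."""
    u = w * u1 + (1 - w) * u2
    return u.var(ddof=1) / (w * H[0] + (1 - w) * H[1]) ** 2

# Data-driven weight: grid search for the variance-minimizing weight.
grid = np.linspace(0.0, 1.0, 201)
w_star = grid[np.argmin([asym_var(w) for w in grid])]
```

The selected weight heavily down-weights the noisy component, which is the same mechanism (driven to exact zeros by an ℓ₁ penalty) that produces sparse composite scores in the penalized approaches cited above.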
5. Applications, Extensions, and Computational Advantages
Composite likelihood is widely applied where joint likelihoods are analytically unavailable or computationally prohibitive:
- Multivariate Probit and Latent Variable Models: Pairwise and blockwise composite likelihoods circumvent intractable integrals (Ting et al., 2020, Stoehr et al., 2024, Bellio et al., 2023).
- Spatial and Spatio-temporal Models: Margins or local structures model complex dependence efficiently; e.g., spatial C-vine copula composites (Erhardt et al., 2014).
- Meta-analysis, Mixed Models, and Diagnostic Test Studies: Composite likelihood enables scalable inference for clustered and correlated data, with specific adaptations for generalized linear mixed models (Miniato et al., 2021, Nikoloulopoulos, 2016).
- Symbolic and Histogram-valued Data: Marginal histograms form composites that enable likelihood-based inference at dimensions previously infeasible for symbolic data (Whitaker et al., 2019).
- Ranked and Choice Data: Composite marginal likelihoods underlie frameworks for learning random utility models from partial rankings (Zhao et al., 2018).
Composite likelihood estimators retain consistency under very general settings and deliver substantial computational reduction, enabling inference at scales where full MLE is infeasible. For instance, in crossed random effects models, composite-likelihood-based row-column factorizations reduce the computation from superlinear in sample size to $O(n)$ (Bellio et al., 2023).
6. Calibration, Bayesian Adjustment, and Testing
Because composite likelihood functions lack the distributional properties of the genuine likelihood, naive application of standard inference procedures can yield mis-calibrated results. Calibration strategies include:
- Magnitude (Temperature) Adjustment: Raising the composite likelihood to a power to match marginal posterior variance (Miniato et al., 2021).
- Curvature Adjustment: Linear transformation of the parameter space to align naive and robust covariances (Miniato et al., 2021).
- Saddlepoint and φ-Divergence Tests: Nonparametric saddlepoint approaches and generalized divergence measures address the finite-sample limitations of standard test statistics and avoid dependence on Godambe information (Lunardon et al., 2013, Martin et al., 2016).
- Stochastic Approximation: Composite likelihoods can be optimized via stochastic gradient descent with variance appropriately decomposed into data and optimization contributions, enabling scalable inference in very large datasets (Alfonzetti et al., 2023).
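The magnitude (temperature) adjustment listed above has a simple closed form in the scalar case: raising the composite likelihood to the power $k = H/J$ makes the curvature of the tempered log-composite-likelihood at the MCLE equal the Godambe information $G = H J^{-1} H$. A sketch, with all model settings hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical example: equicorrelated bivariate normal with common mean
# mu; independence composite likelihood for mu, with per-observation
# score u_i(mu) = (y_i1 - mu) + (y_i2 - mu).
rho, n = 0.8, 2000
cov = np.array([[1.0, rho], [rho, 1.0]])
Y = rng.multivariate_normal([0.0, 0.0], cov, size=n)

mu_hat = Y.mean()  # MCLE: average of all coordinates
u = (Y[:, 0] - mu_hat) + (Y[:, 1] - mu_hat)

H = 2.0               # sensitivity: E[-du/dmu]
J = u.var(ddof=1)     # variability: Var(u), approx 2(1 + rho) here
k = H / J             # tempering exponent (scalar magnitude adjustment)

# The tempered curvature k * n * H equals the Godambe information
# n * H**2 / J, so the tempered posterior has the calibrated spread
# J / (n * H**2) rather than the naive 1 / (n * H).
var_tempered = 1 / (k * n * H)
var_sandwich = J / (n * H**2)
```

With positive within-vector correlation, $J > H$, so $k < 1$: the composite likelihood is flattened, widening credible sets to match the sandwich variance.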
7. Practical Considerations and Limitations
Composite likelihood offers a scalable approximation to full likelihood inference but introduces methodological considerations:
- Efficiency Loss: Loss of information due to exclusion of high-order dependencies not contained in the chosen margins.
- Robustness Misconceptions: Consistency and robustness hold only for components that are correctly specified; erroneous margins can lead to inconsistency (Ogden, 2014).
- Covariance Estimation: Accurate estimation of the sandwich (Godambe) variance is required for valid frequentist or Bayesian inference. Inaccurate estimation may bias $p$-values and confidence sets (Lunardon et al., 2013, Martin et al., 2016).
- Selection of Sub-likelihoods: The efficiency–cost trade-off rests on judicious choice of sub-likelihoods. Data-driven, sparsity-inducing, or stability-selection strategies are necessary for optimal composite likelihood construction (Huang et al., 2021, Ferrari et al., 2015, Huang et al., 2017).
- Interpretability and Dependence: In some settings (e.g., diagnostic meta-analyses), composite approaches that neglect dependence structures do not provide valid joint or predictive summaries, limiting scientific utility (Nikoloulopoulos, 2016).
Composite likelihood has become a foundational tool for modern statistical inference under computational constraints, with a rapidly developing theoretical and methodological literature and diverse domain applications.