Covariance-Free Ridge Estimator in Mixed Models

Updated 28 January 2026

The paper introduces a design-consistent pairwise composite likelihood estimator for two-level mixed models that remains unbiased under informative sampling.
It leverages bivariate normal log-densities with closed-form matrix inverses to replace the full likelihood, simplifying complex survey computations.
Simulation studies show that while efficiency for variance components may drop in moderate clusters, the approach ensures robust, unbiased estimates.

Rao’s mixed-effects model, specifically the pairwise composite-likelihood estimation framework developed by Rao and co-workers, provides a general, design-consistent approach for fitting two-level linear mixed models in the context of complex survey data when model clusters and sampling design clusters coincide. This methodology leverages sums of bivariate normal log-densities (rather than the full likelihood), yielding estimators that remain robust under informative sampling—particularly relevant in large-scale health and social science surveys. The approach has been implemented in the R package svylme via the function svy2lme (Lumley et al., 2023).

1. Two-level Linear Mixed Model Specification

Let clusters (e.g., schools) be indexed by $i=1,\dots,n_1$ , with within-cluster units $j=1,\dots,m_i$ . The classical two-level linear mixed model is: $Y_{ij} = X_{ij}\,\beta + Z_{ij}\,b_i + \varepsilon_{ij}$ where $b_i \sim N(0,G)$ , $\varepsilon_{ij} \sim N(0,\sigma^2)$ , and $\text{Cov}(b_i,\varepsilon_{ij}) = 0$ . Here, $X_{ij}$ is a $1 \times p$ vector of fixed-effect covariates, $\beta \in \mathbb{R}^p$ the fixed-effects vector; $Z_{ij}$ is a $1 \times q$ vector of random-effect covariates, and $b_i \in \mathbb{R}^q$ the random-effects vector for cluster $i$ . The random-effects covariance is generally written $G = \sigma^2 V(\theta)$ , with $\theta$ collecting variance and correlation parameters, so that the full parameter vector is $\theta^* = (\beta,\sigma^2,\theta)$ .

The marginal distribution for the cluster response vector $Y_i = (Y_{i1},...,Y_{i,m_i})^T$ is: $Y_i \sim N(X_i \beta,\, \sigma^2\, \Xi_i(\theta))$ where $X_i$ is the $m_i \times p$ covariate matrix and $\Xi_i(\theta) = I_{m_i} + Z_i V(\theta) Z_i^T$ .

2. Pairwise Composite Loglikelihood

Rao’s estimator replaces the full-data loglikelihood with a sum over all within-cluster bivariate log-densities. For responses $(Y_{ij}, Y_{ik})^T$ in cluster $i$ (with $j < k$ ), the joint covariance is $\sigma^2\,\Xi_{i,jk}(\theta)$ , where $\Xi_{i,jk}(\theta)$ denotes the $2 \times 2$ submatrix of $\Xi_i(\theta)$ associated with units $j$ and $k$ . The bivariate normal log-density is

$f_{i,jk}(y_{ij},y_{ik};\beta,\theta,\sigma^2) = (2\pi \sigma^2)^{-1}|\Xi_{i,jk}(\theta)|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} r_{i,jk}^T \Xi_{i,jk}(\theta)^{-1} r_{i,jk} \right\}$

where $r_{i,jk} = (y_{ij}-\mu_{ij},\, y_{ik}-\mu_{ik})^T$ , with $\mu_{ij} = X_{ij}\beta$ .

The (infinite-population) composite loglikelihood is

$\ell_P(\beta,\theta,\sigma^2) = \sum_{i=1}^{N_1} \sum_{1 \leq j < k \leq m_i} \ell_{i,jk}(\beta,\theta,\sigma^2)$

with $\ell_{i,jk} = \log f_{i,jk}$ . For survey samples, the design-weighted composite loglikelihood is

$\hat{\ell}_P(\beta,\theta,\sigma^2) = \sum_{i=1}^{n_1} \sum_{j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \ell_{i,jk}(\beta,\theta,\sigma^2)$

where $R_{ij}=1$ if unit $ij$ is sampled and $\pi_{i,jk}$ is the inclusion probability for the pair.

3. Estimating Equations and Profile Deviance

Partial derivatives of $\hat{\ell}_P$ yield composite score equations for unbiased parameter estimation. The fixed-effect score equation is

$U_\beta(\beta,\theta,\sigma^2) = \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}} X_{i,jk}^T\, \Xi_{i,jk}(\theta)^{-1} r_{i,jk} / \sigma^2 = 0$

with $X_{i,jk}$ stacking $X_{ij}$ , $X_{ik}$ .

The estimator is: $\hat{\beta}_{\theta,\sigma^2} = \left(X_S^T W^{1/2} \Xi_S(\theta)^{-1} W^{1/2} X_S\right)^{-1} X_S^T W^{1/2} \Xi_S(\theta)^{-1} W^{1/2} Y_S$ where $W$ is diagonal with entries $\pi_{ijk}^{-1}$ for sampled pairs.

The $\sigma^2$ and $\theta_\ell$ scores are similarly specified, and solutions are found via profiling: $\tilde{\beta}(\theta),\ \tilde{\sigma}^2(\theta) = \arg\max_{\beta,\sigma^2} \hat{\ell}_P(\beta,\theta,\sigma^2)$ The profile deviance is: $\hat{d}(\theta) = -2 \hat{\ell}_P(\tilde{\beta}(\theta),\theta,\tilde{\sigma}^2(\theta)) = 2\hat{N}_P \log(2\pi\tilde{\sigma}^2(\theta)) + \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}} \log |\Xi_{i,jk}(\theta)|$

4. Computational Implementation and Algorithm

In R, the approach is implemented in svylme via svy2lme(design, formula). The workflow includes:

Pair selection: For each sampled cluster, enumerate all observed within-cluster pairs and compute pairwise inclusion weights $\pi_{ijk}$ using available sampling probabilities or approximations if joint probabilities are unavailable.
Profiling: For each optimizer trial value $\theta$ , compute $\tilde{\beta}(\theta)$ and $\tilde{\sigma}^2(\theta)$ by generalized least squares on $2N_P$ pseudo-observations.
Optimization: Minimize the profile deviance $\hat{d}(\theta)$ over $\theta$ using Powell’s bound-constrained derivative-free optimizer BOBYQA (minqa package), with starting values from an unweighted lme4 fit (lmer).
Internal acceleration: Determinants and inverses of the $2 \times 2$ matrices $\Xi_{i,jk}$ are computed in closed form.

5. Variance Estimation and Asymptotic Properties

Under mild regularity conditions (law of large numbers and central limit theorem for weighted sums of pairwise scores), the composite-likelihood estimator $\hat{\theta} = (\hat{\beta},\hat{\theta},\hat{\sigma}^2)$ is consistent and asymptotically normal: $\sqrt{n_1}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, H^{-1} J H^{-1})$ where $H = -E[\partial U/\partial \theta^T]$ is the sensitivity matrix and $J = \operatorname{Var}(U)$ the variability matrix for the composite score $U = \sum U_{i,jk}$ . Design-based sandwich estimators are used for practical variance estimation, with

$\hat{H} = \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}}\left(-\frac{\partial^2 \ell_{i,jk}}{\partial \theta \partial \theta^T}\right),\ \hat{J} = \text{PSU-level “with-replacement” estimator of Var} \left(\sum_{i,j<k} \frac{R}{\pi} U_{i,jk}\right)$

A plausible implication is that, with appropriate weighting, confidence intervals and hypothesis tests retain their design-consistency properties in large samples.

6. Efficiency, Bias, and Applicability

Simulation studies indicate that under noninformative sampling, the pairwise composite-likelihood estimator is nearly unbiased for $\beta$ , $\sigma^2$ , and $\theta$ , but generally less efficient than full maximum likelihood or stagewise pseudolikelihood estimators with proper weight-scaling. The loss of efficiency is most prominent for variance component estimation—standard errors for the random intercept variance may be two to three times larger in moderate cluster sizes—but this efficiency gap narrows as cluster sizes decrease. In the limiting case $m_i=2$ , the pairwise composite-likelihood estimator is as efficient as maximum likelihood.

Under strongly informative sampling, where probability of observation is cluster-level effect dependent, pairwise composite scores retain unbiasedness in contrast to stagewise pseudolikelihood estimators, which can be substantially biased for both fixed-effects and variance components unless clusters become very large. This robustness to informative sampling makes the approach broadly applicable in survey settings with complex or multistage design features (Lumley et al., 2023).

7. Summary and Practical Considerations

Rao’s pairwise composite likelihood framework yields a general and design-consistent estimator for two-level linear mixed models under complex survey sampling, applicable even when clusters are neither large nor within-cluster observation probabilities are uniform. The tradeoff is reduced efficiency for certain parameters, particularly variance components, in exchange for unbiasedness and robustness to survey informativeness. The method is implemented efficiently in the R svylme package, using closed-form optimizations and optimizers tailored for complex likelihood surfaces. This estimator is particularly recommended when survey designs are informative and standard methods are either inapplicable or yield biased results (Lumley et al., 2023).

Markdown Report Issue Upgrade to Chat

References (1)

Linear mixed models for complex survey data: implementing and evaluating pairwise likelihood (2023)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Covariance-Free General Ridge Estimator.