Papers
Topics
Authors
Recent
Search
2000 character limit reached

Covariance-Free Ridge Estimator in Mixed Models

Updated 28 January 2026
  • The paper introduces a design-consistent pairwise composite likelihood estimator for two-level mixed models that remains unbiased under informative sampling.
  • It leverages bivariate normal log-densities with closed-form matrix inverses to replace the full likelihood, simplifying complex survey computations.
  • Simulation studies show that while efficiency for variance components may drop in moderate clusters, the approach ensures robust, unbiased estimates.

Rao’s mixed-effects model, specifically the pairwise composite-likelihood estimation framework developed by Rao and co-workers, provides a general, design-consistent approach for fitting two-level linear mixed models in the context of complex survey data when model clusters and sampling design clusters coincide. This methodology leverages sums of bivariate normal log-densities (rather than the full likelihood), yielding estimators that remain robust under informative sampling—particularly relevant in large-scale health and social science surveys. The approach has been implemented in the R package svylme via the function svy2lme (Lumley et al., 2023).

1. Two-level Linear Mixed Model Specification

Let clusters (e.g., schools) be indexed by i=1,,n1i=1,\dots,n_1, with within-cluster units j=1,,mij=1,\dots,m_i. The classical two-level linear mixed model is: Yij=Xijβ+Zijbi+εijY_{ij} = X_{ij}\,\beta + Z_{ij}\,b_i + \varepsilon_{ij} where biN(0,G)b_i \sim N(0,G), εijN(0,σ2)\varepsilon_{ij} \sim N(0,\sigma^2), and Cov(bi,εij)=0\text{Cov}(b_i,\varepsilon_{ij}) = 0. Here, XijX_{ij} is a 1×p1 \times p vector of fixed-effect covariates, βRp\beta \in \mathbb{R}^p the fixed-effects vector; ZijZ_{ij} is a 1×q1 \times q vector of random-effect covariates, and biRqb_i \in \mathbb{R}^q the random-effects vector for cluster ii. The random-effects covariance is generally written G=σ2V(θ)G = \sigma^2 V(\theta), with θ\theta collecting variance and correlation parameters, so that the full parameter vector is θ=(β,σ2,θ)\theta^* = (\beta,\sigma^2,\theta).

The marginal distribution for the cluster response vector Yi=(Yi1,...,Yi,mi)TY_i = (Y_{i1},...,Y_{i,m_i})^T is: YiN(Xiβ,σ2Ξi(θ))Y_i \sim N(X_i \beta,\, \sigma^2\, \Xi_i(\theta)) where XiX_i is the mi×pm_i \times p covariate matrix and Ξi(θ)=Imi+ZiV(θ)ZiT\Xi_i(\theta) = I_{m_i} + Z_i V(\theta) Z_i^T.

2. Pairwise Composite Loglikelihood

Rao’s estimator replaces the full-data loglikelihood with a sum over all within-cluster bivariate log-densities. For responses (Yij,Yik)T(Y_{ij}, Y_{ik})^T in cluster ii (with j<kj < k), the joint covariance is σ2Ξi,jk(θ)\sigma^2\,\Xi_{i,jk}(\theta), where Ξi,jk(θ)\Xi_{i,jk}(\theta) denotes the 2×22 \times 2 submatrix of Ξi(θ)\Xi_i(\theta) associated with units jj and kk. The bivariate normal log-density is

fi,jk(yij,yik;β,θ,σ2)=(2πσ2)1Ξi,jk(θ)1/2exp{12σ2ri,jkTΞi,jk(θ)1ri,jk}f_{i,jk}(y_{ij},y_{ik};\beta,\theta,\sigma^2) = (2\pi \sigma^2)^{-1}|\Xi_{i,jk}(\theta)|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} r_{i,jk}^T \Xi_{i,jk}(\theta)^{-1} r_{i,jk} \right\}

where ri,jk=(yijμij,yikμik)Tr_{i,jk} = (y_{ij}-\mu_{ij},\, y_{ik}-\mu_{ik})^T, with μij=Xijβ\mu_{ij} = X_{ij}\beta.

The (infinite-population) composite loglikelihood is

P(β,θ,σ2)=i=1N11j<kmii,jk(β,θ,σ2)\ell_P(\beta,\theta,\sigma^2) = \sum_{i=1}^{N_1} \sum_{1 \leq j < k \leq m_i} \ell_{i,jk}(\beta,\theta,\sigma^2)

with i,jk=logfi,jk\ell_{i,jk} = \log f_{i,jk}. For survey samples, the design-weighted composite loglikelihood is

^P(β,θ,σ2)=i=1n1j<kRijRikπi,jki,jk(β,θ,σ2)\hat{\ell}_P(\beta,\theta,\sigma^2) = \sum_{i=1}^{n_1} \sum_{j<k} \frac{R_{ij} R_{ik}}{\pi_{i,jk}} \ell_{i,jk}(\beta,\theta,\sigma^2)

where Rij=1R_{ij}=1 if unit ijij is sampled and πi,jk\pi_{i,jk} is the inclusion probability for the pair.

3. Estimating Equations and Profile Deviance

Partial derivatives of ^P\hat{\ell}_P yield composite score equations for unbiased parameter estimation. The fixed-effect score equation is

Uβ(β,θ,σ2)=i,j<kRijRikπijkXi,jkTΞi,jk(θ)1ri,jk/σ2=0U_\beta(\beta,\theta,\sigma^2) = \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}} X_{i,jk}^T\, \Xi_{i,jk}(\theta)^{-1} r_{i,jk} / \sigma^2 = 0

with Xi,jkX_{i,jk} stacking XijX_{ij}, XikX_{ik}.

The estimator is: β^θ,σ2=(XSTW1/2ΞS(θ)1W1/2XS)1XSTW1/2ΞS(θ)1W1/2YS\hat{\beta}_{\theta,\sigma^2} = \left(X_S^T W^{1/2} \Xi_S(\theta)^{-1} W^{1/2} X_S\right)^{-1} X_S^T W^{1/2} \Xi_S(\theta)^{-1} W^{1/2} Y_S where WW is diagonal with entries πijk1\pi_{ijk}^{-1} for sampled pairs.

The σ2\sigma^2 and θ\theta_\ell scores are similarly specified, and solutions are found via profiling: β~(θ), σ~2(θ)=argmaxβ,σ2^P(β,θ,σ2)\tilde{\beta}(\theta),\ \tilde{\sigma}^2(\theta) = \arg\max_{\beta,\sigma^2} \hat{\ell}_P(\beta,\theta,\sigma^2) The profile deviance is: d^(θ)=2^P(β~(θ),θ,σ~2(θ))=2N^Plog(2πσ~2(θ))+i,j<kRijRikπijklogΞi,jk(θ)\hat{d}(\theta) = -2 \hat{\ell}_P(\tilde{\beta}(\theta),\theta,\tilde{\sigma}^2(\theta)) = 2\hat{N}_P \log(2\pi\tilde{\sigma}^2(\theta)) + \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}} \log |\Xi_{i,jk}(\theta)|

4. Computational Implementation and Algorithm

In R, the approach is implemented in svylme via svy2lme(design, formula). The workflow includes:

  • Pair selection: For each sampled cluster, enumerate all observed within-cluster pairs and compute pairwise inclusion weights πijk\pi_{ijk} using available sampling probabilities or approximations if joint probabilities are unavailable.
  • Profiling: For each optimizer trial value θ\theta, compute β~(θ)\tilde{\beta}(\theta) and σ~2(θ)\tilde{\sigma}^2(\theta) by generalized least squares on 2NP2N_P pseudo-observations.
  • Optimization: Minimize the profile deviance d^(θ)\hat{d}(\theta) over θ\theta using Powell’s bound-constrained derivative-free optimizer BOBYQA (minqa package), with starting values from an unweighted lme4 fit (lmer).
  • Internal acceleration: Determinants and inverses of the 2×22 \times 2 matrices Ξi,jk\Xi_{i,jk} are computed in closed form.

5. Variance Estimation and Asymptotic Properties

Under mild regularity conditions (law of large numbers and central limit theorem for weighted sums of pairwise scores), the composite-likelihood estimator θ^=(β^,θ^,σ^2)\hat{\theta} = (\hat{\beta},\hat{\theta},\hat{\sigma}^2) is consistent and asymptotically normal: n1(θ^θ0)dN(0,H1JH1)\sqrt{n_1}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, H^{-1} J H^{-1}) where H=E[U/θT]H = -E[\partial U/\partial \theta^T] is the sensitivity matrix and J=Var(U)J = \operatorname{Var}(U) the variability matrix for the composite score U=Ui,jkU = \sum U_{i,jk}. Design-based sandwich estimators are used for practical variance estimation, with

H^=i,j<kRijRikπijk(2i,jkθθT), J^=PSU-level “with-replacement” estimator of Var(i,j<kRπUi,jk)\hat{H} = \sum_{i,j<k} \frac{R_{ij}R_{ik}}{\pi_{ijk}}\left(-\frac{\partial^2 \ell_{i,jk}}{\partial \theta \partial \theta^T}\right),\ \hat{J} = \text{PSU-level “with-replacement” estimator of Var} \left(\sum_{i,j<k} \frac{R}{\pi} U_{i,jk}\right)

A plausible implication is that, with appropriate weighting, confidence intervals and hypothesis tests retain their design-consistency properties in large samples.

6. Efficiency, Bias, and Applicability

Simulation studies indicate that under noninformative sampling, the pairwise composite-likelihood estimator is nearly unbiased for β\beta, σ2\sigma^2, and θ\theta, but generally less efficient than full maximum likelihood or stagewise pseudolikelihood estimators with proper weight-scaling. The loss of efficiency is most prominent for variance component estimation—standard errors for the random intercept variance may be two to three times larger in moderate cluster sizes—but this efficiency gap narrows as cluster sizes decrease. In the limiting case mi=2m_i=2, the pairwise composite-likelihood estimator is as efficient as maximum likelihood.

Under strongly informative sampling, where probability of observation is cluster-level effect dependent, pairwise composite scores retain unbiasedness in contrast to stagewise pseudolikelihood estimators, which can be substantially biased for both fixed-effects and variance components unless clusters become very large. This robustness to informative sampling makes the approach broadly applicable in survey settings with complex or multistage design features (Lumley et al., 2023).

7. Summary and Practical Considerations

Rao’s pairwise composite likelihood framework yields a general and design-consistent estimator for two-level linear mixed models under complex survey sampling, applicable even when clusters are neither large nor within-cluster observation probabilities are uniform. The tradeoff is reduced efficiency for certain parameters, particularly variance components, in exchange for unbiasedness and robustness to survey informativeness. The method is implemented efficiently in the R svylme package, using closed-form optimizations and optimizers tailored for complex likelihood surfaces. This estimator is particularly recommended when survey designs are informative and standard methods are either inapplicable or yield biased results (Lumley et al., 2023).

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Covariance-Free General Ridge Estimator.