Longitudinal Measurement Invariance in IRT
- Longitudinal Measurement Invariance is the property that a latent measurement scale remains consistent over time and across groups, so that observed change reflects true change in the construct.
- The framework combines a continuous-time latent process model with a graded response IRT model to map dynamic latent trajectories onto ordinal item responses.
- Statistical tests, particularly likelihood-ratio tests, distinguish invariant from non-invariant items, guiding adjustments for differential item functioning.
Longitudinal Measurement Invariance (LMI) is a critical property in the analysis of repeated-item constructs: it ensures that observed measurement changes over time or between groups reflect true changes in the latent trait rather than artifacts of the measurement process. In continuous-time longitudinal Item Response Theory (IRT), LMI is defined, parametrized, and tested within a unified statistical framework that accommodates arbitrary observation schedules and complex item-level dynamics (Proust-Lima et al., 2021).
1. Continuous-Time Latent Process Model
Longitudinal IRT measurement structures rely on modeling a subject-specific, continuous-time latent process $\Lambda_i(t)$, which underpins all item responses for subject $i$ at time $t$. The structural model is given by:
$$\Lambda_i(t) = X_i(t)^\top \beta + Z_i(t)^\top b_i, \qquad b_i \sim \mathcal{N}(0, B),$$
where $X_i(t)$ and $Z_i(t)$ are, respectively, the fixed- and random-effect design vectors at time $t$ for individual $i$. This formulation allows for subject-specific, time-varying latent trajectories without requiring equal measurement intervals. Identifiability is ensured by constraining the location and scale of $\Lambda_i(t)$. The finite-dimensional random effects $b_i$ absorb serial correlation; additional Gaussian processes, while possible, are not required for standard applications in this framework.
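As a minimal sketch of the structural model, the following evaluates a linear mixed-effects latent trajectory at irregular visit times. The design vectors (a fixed slope with no intercept, plus a random intercept and slope), the parameter values, and the function name `latent_trajectory` are illustrative assumptions, not the specification used in the source.

```python
import random

def latent_trajectory(times, beta, b_i):
    """Evaluate Lambda_i(t) = X_i(t)' beta + Z_i(t)' b_i at arbitrary times.

    Assumed design (illustration only):
      X_i(t) = (t,)     fixed slope, no intercept, so the mean is 0 at t = 0
      Z_i(t) = (1, t)   random intercept and random slope
    """
    values = []
    for t in times:
        x = (t,)        # fixed-effect design vector at time t
        z = (1.0, t)    # random-effect design vector at time t
        fixed_part = sum(xk * bk for xk, bk in zip(x, beta))
        random_part = sum(zk * bk for zk, bk in zip(z, b_i))
        values.append(fixed_part + random_part)
    return values

rng = random.Random(0)
beta = (-0.3,)  # hypothetical fixed slope
# Random effects drawn independently here for simplicity; a full model
# would use a joint Gaussian with covariance matrix B.
b_i = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.2))
times = [0.0, 0.4, 1.7, 2.1, 5.0]  # unequally spaced visit times
traj = latent_trajectory(times, beta, b_i)
```

Because the trajectory is a function of continuous time, nothing in the evaluation depends on the visits being equally spaced or shared across subjects.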
2. Item Response Model and Measurement Equation
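The mapping from the latent process to observed ordinal responses is handled via a graded response model; both logistic and probit links are supported, with the probit specification used in the referenced implementation. For an item $k$ with $M_k$ ordered categories, the probit formulation at time $t$ is:
$$P\big(Y_{ik}(t) \le m \mid \Lambda_i(t)\big) = \Phi\!\left(\frac{\eta_{k,m} - \Lambda_i(t)}{\sigma_k}\right), \qquad m = 1, \dots, M_k - 1,$$
where $\Phi$ is the standard normal distribution function, $\sigma_k$ encodes discrimination (the discrimination of item $k$ is $1/\sigma_k$), and $\eta_{k,1} < \dots < \eta_{k,M_k-1}$ are ordered category thresholds per item. This approach enables direct recovery of category probabilities for the full response distribution at each occasion.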
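The recovery of category probabilities can be sketched as follows, assuming cumulative probabilities of the form Phi((threshold - latent) / sigma). The function name, threshold values, and `sigma` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def category_probabilities(latent, thresholds, sigma):
    """Category probabilities under a probit graded response model.

    latent:     value of the latent process at the occasion of interest
    thresholds: increasing list of M - 1 cut points for an item with M categories
    sigma:      item residual standard deviation (1 / sigma acts as discrimination)
    Returns a list of M probabilities, lowest category first.
    """
    phi = NormalDist().cdf
    # Cumulative probabilities P(Y <= m | latent) for m = 1 .. M - 1.
    cumulative = [phi((eta - latent) / sigma) for eta in thresholds]
    # Pad with 0 and 1, then difference to obtain P(Y = m | latent).
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[m + 1] - bounds[m] for m in range(len(bounds) - 1)]

probs = category_probabilities(latent=0.5, thresholds=[-1.0, 0.0, 1.2], sigma=0.8)
probs_higher = category_probabilities(latent=1.5, thresholds=[-1.0, 0.0, 1.2], sigma=0.8)
```

Raising the latent value moves probability mass toward the higher categories, which is the behavior the thresholds are meant to encode.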
3. Parametrization and Meaning of Longitudinal Measurement Invariance
In longitudinal data, measurement invariance concerns whether the item parameters remain stable over time and/or across groups. There are several hierarchical levels of invariance:
- Configural invariance: All items load on the same latent factor $\Lambda_i(t)$, with no further parameter constraints.
- Weak (metric) invariance: Discrimination parameters ($1/\sigma_k$, or equivalently $\sigma_k$) are fixed across occasions/groups.
- Strong (scalar) invariance: Both discriminations and all thresholds or intercepts ($\eta_{k,m}$) are time- and group-invariant.
- Strict invariance: Residual variances are additionally held constant across time or group.
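Lack of invariance is operationalized as Differential Item Functioning (DIF) or response shift, requiring augmentation of the measurement model with explicit time- or group-dependent item-level contrasts. With $\Phi$ the standard normal distribution function, $\eta_{k,m}$ the thresholds, and $\sigma_k$ the residual standard deviation of item $k$:
$$P\big(Y_{ik}(t) \le m \mid \Lambda_i(t)\big) = \Phi\!\left(\frac{\eta_{k,m} - \Lambda_i(t) - \gamma_k G_i}{\sigma_k}\right)$$
for group DIF with group indicator $G_i$, or
$$P\big(Y_{ik}(t) \le m \mid \Lambda_i(t)\big) = \Phi\!\left(\frac{\eta_{k,m} - \Lambda_i(t) - \sum_{q=1}^{Q} \gamma_{k,q} B_q(t)}{\sigma_k}\right)$$
for time-varying DIF (response shift) using spline basis functions $B_q(t)$. Setting all $\gamma$ contrasts to zero restores strong invariance for that item.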
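To illustrate what a group contrast does, the sketch below compares the expected item score of two subjects with the same latent level but different group membership. It assumes the contrast enters as an additive item-specific shift on the latent scale; the names `probs_with_dif` and `expected_score` and all numeric values are hypothetical.

```python
from statistics import NormalDist

def probs_with_dif(latent, thresholds, sigma, gamma=0.0, group=0):
    """Category probabilities when an item-specific contrast gamma shifts
    the item's location for members of group 1 (assumed additive form)."""
    phi = NormalDist().cdf
    shifted = latent + gamma * group  # DIF acts only when group == 1
    cumulative = [phi((eta - shifted) / sigma) for eta in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[m + 1] - bounds[m] for m in range(len(bounds) - 1)]

def expected_score(probs):
    """Mean item score, with categories scored 0, 1, 2, ..."""
    return sum(score * p for score, p in enumerate(probs))

thresholds, sigma, latent = [-1.0, 0.0, 1.2], 0.8, 0.3

# Invariant item: group membership does not matter once the latent level is fixed.
invariant_gap = (
    expected_score(probs_with_dif(latent, thresholds, sigma, gamma=0.0, group=1))
    - expected_score(probs_with_dif(latent, thresholds, sigma, gamma=0.0, group=0))
)

# Non-invariant item: same latent level, different expected response.
dif_gap = (
    expected_score(probs_with_dif(latent, thresholds, sigma, gamma=0.5, group=1))
    - expected_score(probs_with_dif(latent, thresholds, sigma, gamma=0.5, group=0))
)
```

A nonzero `dif_gap` at equal latent levels is exactly the kind of difference that would be misattributed to the latent trait if the contrast were wrongly fixed at zero.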
4. Statistical Testing for Loss of Invariance and DIF
Testing for LMI proceeds via likelihood-ratio tests (LRT) of nested models:
- Invariant model ($M_0$): all contrast parameters ($\gamma$) set to zero.
- Non-invariant model ($M_1$): subset of $\gamma$ parameters freed.
The test statistic is
$$\mathrm{LR} = -2\left[\ell(M_0) - \ell(M_1)\right] \sim \chi^2_p,$$
where $\ell(\cdot)$ is the maximized log-likelihood and $p$ is the number of newly freed parameters. Separate procedures exist for global testing (all items simultaneously) and item-wise testing (individual item or parameter contrasts). The decision process emphasizes sequential testing: retain full invariance if the global test is nonsignificant; otherwise, isolate non-invariant items and re-specify the measurement structure accordingly.
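The likelihood-ratio computation can be sketched with the standard library alone. The chi-square tail probability below uses closed-form expressions valid for positive integer degrees of freedom; the log-likelihood values are hypothetical placeholders, not results from the source.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    positive integer degrees of freedom df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def likelihood_ratio_test(loglik_invariant, loglik_noninvariant, n_freed):
    """Return (LR statistic, p-value) for nested invariance models."""
    stat = -2.0 * (loglik_invariant - loglik_noninvariant)
    return stat, chi2_sf(stat, n_freed)

# Hypothetical log-likelihoods for a global group-DIF test freeing 6 contrasts.
stat, p_value = likelihood_ratio_test(-1520.4, -1516.1, n_freed=6)
```

The same function serves global and item-wise tests; only the set of freed parameters, and hence `n_freed`, changes.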
5. Maximum Likelihood Estimation and Computational Implementation
The entire model is estimated using maximum likelihood with quasi-Monte Carlo integration (Halton or Sobol sequences, typically 1000 nodes) for the random-effect integrals. This enhances computational efficiency and convergence compared to simple Monte Carlo. Optimization employs a Marquardt-Levenberg algorithm with adaptive damping, with convergence assessed on parameter changes, log-likelihood, and gradients. Standard errors are derived from the observed Hessian (or via the delta method for derived quantities). The method is implemented in the R package lcmm (function multlcmm), with intrinsic support for unequally spaced and irregular measurement times.
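The idea behind quasi-Monte Carlo integration over random effects can be illustrated in one dimension. The sketch below is a toy case, not the lcmm implementation: it assumes a single standard normal random intercept and one cumulative probit probability, for which a closed form exists to check the approximation. Function names and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def radical_inverse(index, base=2):
    """index-th element (index >= 1) of the van der Corput sequence, i.e. a
    one-dimensional Halton sequence in the given base; values lie in (0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def marginal_cumulative_prob(eta, sigma, n_nodes=1000):
    """Approximate the integral of Phi((eta - b) / sigma) over b ~ N(0, 1)
    by averaging over Halton nodes mapped through the normal quantile function."""
    std_normal = NormalDist()
    total = 0.0
    for n in range(1, n_nodes + 1):
        u = radical_inverse(n, 2)        # low-discrepancy point in (0, 1)
        b = std_normal.inv_cdf(u)        # node on the random-effect scale
        total += std_normal.cdf((eta - b) / sigma)
    return total / n_nodes

approx = marginal_cumulative_prob(eta=0.7, sigma=0.8)
# Closed form for this toy case: Phi(eta / sqrt(1 + sigma^2)).
exact = NormalDist().cdf(0.7 / sqrt(1.0 + 0.8 ** 2))
```

With several random effects, a Halton sequence uses a different prime base per dimension, and the nodes are further transformed to match the random-effect covariance.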
6. Empirical Illustration and Applied Recommendations
The PREDIALA study exemplifies LMI methodology: seven HADS-Depression items were modeled for patients on a renal transplant waiting list. A strongly invariant model (no group or time DIF) was fit initially. A global LRT for group DIF was nonsignificant, but item-wise examination revealed significant DIF in a single item. Partial invariance was introduced by freeing only this item's threshold by group, which removed spurious group-level latent mean differences. Further testing for response shift using splines yielded nonsignificant results, supporting overall invariance.
Best practices include:
- Fit a fully invariant model to establish the latent scale.
- Test globally for group DIF and response shift.
- If significant, identify and free only non-invariant item parameters.
- Re-evaluate substantive analyses (trajectories, effects) under the updated measurement model.
- Maintain flexibility between weak and strong invariance depending on the stability of discrimination and threshold parameters.
- Recognize threshold DIF as a situation where only item category thresholds vary, while discriminations remain stable.
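The sequential decision rule in the steps above can be sketched as a small helper. The function name, the item labels, the p-values, and the 0.05 significance level are illustrative assumptions; the p-values would come from likelihood-ratio tests fitted elsewhere.

```python
def items_to_free(global_p, itemwise_p, alpha=0.05):
    """Sequential rule: keep full invariance when the global test is
    nonsignificant, otherwise flag the items whose own test is significant.

    global_p:   p-value of the global DIF (or response shift) test
    itemwise_p: dict mapping item label to its item-wise p-value
    Returns a sorted list of item labels whose contrasts should be freed.
    """
    if global_p >= alpha:
        return []  # retain the fully invariant measurement model
    return sorted(item for item, p in itemwise_p.items() if p < alpha)

# Hypothetical p-values for three items.
flagged = items_to_free(0.01, {"item1": 0.40, "item2": 0.003, "item3": 0.21})
kept = items_to_free(0.30, {"item1": 0.40, "item2": 0.003, "item3": 0.21})
```

After freeing the flagged contrasts, the substantive model should be refitted so that trajectories and covariate effects are estimated under the updated measurement structure.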
7. Implications and Scope of the Continuous-Time LMI Framework
The continuous-time longitudinal IRT model formulated by Proust-Lima et al. provides a unified, fully likelihood-based approach for modeling latent trajectories, accommodating ordinal item data, arbitrary observation designs, and explicit LMI testing. This yields robust interpretations of latent construct evolution, ensuring observed effects reflect true latent changes and not artifacts of measurement drift or inconsistent item functioning. Once a final (partial) invariance model is specified, the resulting latent process estimates faithfully represent the dynamics of the underlying construct of interest, unconfounded by temporal or group-induced variation in item interpretation (Proust-Lima et al., 2021).