Multilevel Regression Discontinuity Models with Latent Variables

Published 4 Apr 2026 in stat.ME | (2604.03535v1)

Abstract: Regression discontinuity (RD) analysis with latent variables, as introduced by Morell et al. (2025), offers a useful augmentation of conventional RD by incorporating a measurement model. This approach is particularly relevant in education research, where a noisy proxy (e.g., an observed test score) of an underlying latent construct is adopted as the running variable. This extension enables extrapolation of the average treatment effect (ATE) away from the cutoff score and assessment of heterogeneous treatment effects. However, a key limitation of the original framework is its single-level structure, which does not account for the multilevel structure commonly found in education data, such as students nested within classrooms or schools. In this study, we extend the framework to multilevel contexts. We discuss models for both the hierarchical RD design, where treatment is assigned at the cluster level, and the multisite RD design, where treatment is assigned at the individual level within clusters. In both cases, a multilevel measurement model is incorporated to describe the relationship between the latent running variable and the observed indicators. Monte Carlo simulations demonstrate recovery of ATEs, including extrapolated estimates beyond the cutoff, given adequate cluster-level sample sizes. The study highlights the applicability of RD analysis with latent variables for broader use in educational research, without being restricted by the limitations of multilevel data.


Authors (6)

Summary

The paper develops a unified framework that augments standard regression discontinuity analysis by integrating latent variable models with multilevel data structures.
It employs the Metropolis-Hastings Robbins-Monro algorithm to efficiently estimate parameters and average treatment effects across hierarchical and multisite designs.
Simulation studies demonstrate minimal bias and robust ATE estimation near the cutoff, highlighting the framework's practical benefits in settings with noisy assignment variables.


Introduction

This work, "Multilevel Regression Discontinuity Models with Latent Variables" (2604.03535), develops a unified statistical framework that augments regression discontinuity (RD) analysis with both latent variable (LV) modeling and multilevel data structures. Conventional RD identifies causal effects at sharp assignment thresholds, typically assuming error-free proxy measures for assignment (the running variable, RV) and often ignoring nested structures prevalent in educational data. Recent advances introduced RD with latent variables to address noisy proxies but were limited to single-level contexts. This paper extends the latent RD paradigm to hierarchical (HRD) and multisite (MRD) experimental settings, integrating latent measurement and multilevel variance components. The proposed approach generalizes RD inferences, permitting robust estimation and extrapolation of the average treatment effect (ATE) and assessment of treatment effect heterogeneity in clustered data with noisy assignment variables.

Framework and Model Specification

Multilevel Data and RD Assignment Mechanisms

The analysis delineates two major multilevel RD designs:

Hierarchical RD (HRD): Treatment assigned at the cluster level (e.g., all students within a school receive the program if the school's mean observed RV falls below a cutoff), with outcomes observed at the individual level.
Multisite RD (MRD): Treatment assigned at the individual level (based on each individual's own observed RV), with individuals nested within clusters (sites) as a result of the sampling scheme.

Both designs necessitate modeling the hierarchical error structure to address intra-cluster correlation.
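The difference between the two assignment rules can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names `assign_hrd` and `assign_mrd`, the toy data, and the "treated if below the cutoff" direction for the multisite case are assumptions made for the example.

```python
import numpy as np

def assign_hrd(orv, cluster, cutoff):
    """Hierarchical RD (illustrative): every member of a cluster shares one
    treatment status, set by whether the cluster's mean observed RV is below
    the cutoff."""
    orv = np.asarray(orv, dtype=float)
    cluster = np.asarray(cluster)
    # Map arbitrary cluster labels to 0..J-1 and compute per-cluster means.
    _, inverse = np.unique(cluster, return_inverse=True)
    cluster_means = np.bincount(inverse, weights=orv) / np.bincount(inverse)
    # Broadcast the cluster-level decision back to individuals.
    return (cluster_means < cutoff)[inverse].astype(int)

def assign_mrd(orv, cutoff):
    """Multisite RD (illustrative): each individual is treated based on
    their own observed RV (direction of the rule is an assumption)."""
    return (np.asarray(orv, dtype=float) < cutoff).astype(int)

# Toy data: three clusters of two individuals each (cluster means 4, 10, 8).
orv = np.array([3.0, 5.0, 9.0, 11.0, 4.0, 12.0])
cluster = np.array([0, 0, 1, 1, 2, 2])
hrd_treat = assign_hrd(orv, cluster, cutoff=6.0)
mrd_treat = assign_mrd(orv, cutoff=6.0)
```

In the toy data the fifth individual has a score below the cutoff but belongs to a cluster whose mean is above it, so that individual is treated under the multisite rule and untreated under the hierarchical rule.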

Latent Running Variable and Measurement Model

A distinguishing feature is the explicit modeling of the latent running variable (LRV) using an item response theory (IRT) measurement framework. The observed proxy (ORV, such as a summed test score) is treated as a noisy indicator of the LRV. Each individual’s LRV is decomposed into a cluster effect and an individual deviation, with observed indicators modeled via a 2-parameter logistic (2PL) item model. This construction enables unbiased recovery of treatment effects, corrected for classical measurement error, and facilitates treatment effect estimation conditional on latent constructs rather than error-laden observable proxies.
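To make the measurement model concrete, the following sketch simulates a latent running variable as a cluster effect plus an individual deviation and generates binary item responses from a 2PL model. It is a simplified illustration under stated assumptions: the slope-difficulty parameterization `a * (theta - b)`, the unit total variance split by an intraclass correlation `icc`, the item parameter values, and the function name `simulate_2pl` are all hypothetical and not taken from the paper.

```python
import numpy as np

def simulate_2pl(n_clusters, n_per_cluster, a, b, icc=0.2, seed=0):
    """Simulate a two-level latent running variable and 2PL item responses.

    Assumed parameterization: P(y = 1 | theta) = logistic(a * (theta - b)).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
    # LRV = cluster effect + individual deviation; total variance fixed at 1.
    u = rng.normal(0.0, np.sqrt(icc), size=n_clusters)
    e = rng.normal(0.0, np.sqrt(1.0 - icc), size=cluster.size)
    theta = u[cluster] + e
    # 2PL response probabilities, one column per item.
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    y = (rng.random(p.shape) < p).astype(int)
    return cluster, theta, y

cluster, theta, y = simulate_2pl(
    n_clusters=50, n_per_cluster=20,
    a=[1.0, 1.2, 0.8, 1.5], b=[-1.0, 0.0, 0.5, 1.0],
)
orv = y.sum(axis=1)  # observed running variable: summed test score
```

With only four items, `orv` takes just five distinct values while `theta` is continuous, which is one way to see why the summed score is a noisy proxy for the latent trait.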

Structural Model Formulations

Both HRD and MRD structural models incorporate random effects to represent unobserved heterogeneity attributable to clustering. Key features include:

Interaction terms to capture effect modification by the LRV
Fixed and random slopes/intercepts capturing between- and within-cluster variance in both potential outcomes and treatment effects
Potential outcomes notation reflecting the causal estimands under the Rubin causal model
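As a purely illustrative sketch of how these features can fit together, consider a multisite-style model for individual $i$ in cluster $j$ with latent running variable $\theta_{ij}$. The summary does not reproduce the paper's equations, so the symbols $\beta_0$, $\beta_1$, $\tau$, $\gamma$, $r_{0j}$, $r_{1j}$ and the linear functional form below are assumptions, with the random effects taken to have mean zero and to be independent of $\theta_{ij}$:

$$
\begin{aligned}
Y_{ij}(t) &= \beta_{0j} + \beta_1 \theta_{ij} + t\,(\tau_j + \gamma\,\theta_{ij}) + \varepsilon_{ij}, \qquad t \in \{0, 1\},\\
\beta_{0j} &= \beta_0 + r_{0j}, \qquad \tau_j = \tau + r_{1j},\\
Y_{ij}(1) - Y_{ij}(0) &= \tau_j + \gamma\,\theta_{ij}, \qquad \mathbb{E}\left[\,Y_{ij}(1) - Y_{ij}(0) \mid \theta_{ij} = \theta\,\right] = \tau + \gamma\,\theta .
\end{aligned}
$$

Under this assumed form, $\gamma$ is the interaction term capturing effect modification by the LRV, $r_{0j}$ is a random intercept, and $r_{1j}$ is a random treatment-effect slope that lets the effect vary across clusters.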

Causal Estimation: ATEs and Heterogeneity

A major conceptual advance is the distinction between estimands conditional on the ORV and those conditional on the LRV. Classical RD identifies only the local average treatment effect (LATE) at the cutoff of the observed running variable. Here, the framework enables estimation of:

ATE conditional on the LRV: Reflects the treatment contrast for a specific value of the latent trait, not just at the cutoff.
ATE conditional on ORV: Derived by integrating the LRV over its posterior distribution given observed data, accounting for measurement noise.

This facilitates:

Extrapolation of treatment effects away from the cutoff, justified by nonzero overlap of LRV distributions due to measurement error (see Figure 1).
Quantification of treatment effect heterogeneity (variance and quantiles of the ATE among units sharing the same ORV) using analytic posterior moments under the assumed models.

Figure 1: Schematic showing effect identification at the cutoff for the ORV (Panel A), the distribution of LRV conditional on ORV (Panel B), and effect estimation across the LRV spectrum (Panel C).
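The step from an LRV-conditional effect to an ORV-conditional effect can be sketched numerically. The example below is a single-level simplification and not the paper's estimator: it assumes a standard normal prior on the latent trait, 2PL items with hypothetical parameters `a` and `b`, a summed score as the ORV, and a hypothetical linear effect function `tau + gamma * theta`. The posterior of the latent trait given the summed score is computed on a grid, using the Lord-Wingersky recursion for the score distribution.

```python
import numpy as np

def score_likelihood(theta, a, b):
    """P(sum score = s | theta) for 2PL items via the Lord-Wingersky recursion.
    Returns an array of shape (len(theta), n_items + 1)."""
    theta = np.asarray(theta, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    like = np.zeros((theta.size, a.size + 1))
    like[:, 0] = 1.0  # before any item, the score is 0 with certainty
    for k in range(a.size):
        new = like * (1.0 - p[:, [k]])          # item k wrong: score unchanged
        new[:, 1:] += like[:, :-1] * p[:, [k]]  # item k right: score + 1
        like = new
    return like

def ate_given_orv(score, a, b, tau, gamma, n_grid=201):
    """Mean and variance of the assumed effect tau + gamma * theta among
    units sharing the same summed score."""
    grid = np.linspace(-6.0, 6.0, n_grid)
    prior = np.exp(-0.5 * grid**2)  # unnormalized N(0, 1) prior (assumption)
    post = prior * score_likelihood(grid, a, b)[:, score]
    post /= post.sum()
    mean = post @ grid
    var = post @ (grid - mean) ** 2
    return tau + gamma * mean, gamma**2 * var

a = np.array([1.0, 1.2, 0.8, 1.5])   # hypothetical item slopes
b = np.array([-1.0, 0.0, 0.5, 1.0])  # hypothetical item difficulties
ate_by_score = [ate_given_orv(s, a, b, tau=0.3, gamma=0.1)[0] for s in range(5)]
```

Because the effect is linear in the latent trait here, its conditional mean and variance follow directly from the posterior mean and variance of the trait. A nonlinear effect function would need to be averaged over the posterior grid instead.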

Estimation via the MH-RM Algorithm

Parameter estimation is accomplished using the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm, offering scalable estimation in the presence of multiple random effects and latent variables. The procedure combines stochastic imputation from the model’s joint posterior, efficient gradient approximation, and stochastic approximations for parameter updates. This is particularly advantageous over classical quadrature when the dimension of the latent space is high (e.g., >3). Standard errors are derived using Louis’ formula, approximated via posterior sampling, enabling correct uncertainty quantification for parameter and ATE inference.
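The three-part cycle described above (impute latent variables by Metropolis-Hastings, evaluate the complete-data gradient, apply a Robbins-Monro update) can be illustrated on a deliberately tiny model. This is a toy sketch, not the paper's implementation: it assumes a model with `y_i = theta_i + noise`, `theta_i ~ N(mu, 1)`, unit noise variance, and a single unknown parameter `mu`. The function name `mhrm_toy`, the constant-gain burn-in, and all tuning values are assumptions chosen for the illustration. In this toy model the marginal maximum likelihood estimate of `mu` is the sample mean of `y`, which gives a simple check.

```python
import numpy as np

def mhrm_toy(y, n_burn=50, n_rm=500, n_mh=5, step=1.5, seed=0):
    """Toy MH-RM for y_i | theta_i ~ N(theta_i, 1), theta_i ~ N(mu, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    mu = 0.0            # starting value for the parameter
    theta = y.copy()    # starting values for the latent variables
    info = float(n)     # running estimate of complete-data information

    def log_post(t, mu):
        # Log posterior of theta_i given y_i and mu, up to a constant.
        return -0.5 * (y - t) ** 2 - 0.5 * (t - mu) ** 2

    for k in range(1, n_burn + n_rm + 1):
        # 1) Stochastic imputation: random-walk MH updates of each theta_i.
        for _ in range(n_mh):
            prop = theta + step * rng.normal(size=n)
            log_ratio = log_post(prop, mu) - log_post(theta, mu)
            accept = np.log(rng.random(n)) < log_ratio
            theta = np.where(accept, prop, theta)
        # 2) Complete-data score for mu, evaluated at the imputed thetas.
        score = np.sum(theta - mu)
        # 3) Robbins-Monro update: constant gain first, then decreasing gain.
        gain = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        info = info + gain * (n - info)  # complete-data information is n here
        mu = mu + gain * score / info
    return mu

rng = np.random.default_rng(1)
y_demo = rng.normal(2.0, 1.0, size=500) + rng.normal(size=500)
mu_hat = mhrm_toy(y_demo)
```

The decreasing gain averages out the Monte Carlo noise from the imputation step, which is why a small number of MH draws per iteration can suffice. A realistic model would replace the scalar score and information with a gradient vector and an information matrix.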

Simulation Study

A comprehensive simulation study evaluates the fidelity of ATE recovery and parameter estimation under varying sample sizes, cluster structures, and effect sizes. Summary findings include:

HRD Model:
- Structural parameter bias is minimal and RMSEs decrease with increased cluster count (Figure 2).
- ATE estimation is accurate at and near the cutoff ($c \pm 1$), with coverage of nominal 95% CIs improving with more clusters (Figure 3).
- Estimation error increases farther from the cutoff, reflecting the attenuation of identification due to reduced overlap in LRV.
- Figure 2: Bias and RMSE for key HRD structural parameters, as a function of number of clusters.
- Figure 3: Bias, RMSE, and CI coverage for ATE estimation at various ORV values under the HRD model as cluster count varies.
MRD Model:
- Parameter recovery is superior to HRD, with smaller RMSE, attributed to higher effective sample size in individual-level treatment assignment (Figure 4).
- ATE estimation is precise across a broad ORV spectrum ($c \pm 2$), with stable coverage properties (Figure 5).
- Figure 4: Bias and RMSE for MRD structural parameters as number of clusters increases.
- Figure 5: Bias, RMSE, and CI coverage for ATE estimation at various ORV values under the MRD model.
Measurement Parameter Estimation:
- The means of estimated measurement parameters closely match true values in both HRD and MRD settings, affirming the robustness of inference to the latent measurement component (Figures 6 and 7).
- Figure 6: Estimated versus true measurement parameters in the HRD model.
- Figure 7: Estimated versus true measurement parameters in the MRD model.
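The three performance criteria reported above (bias, RMSE, and confidence interval coverage) are standard Monte Carlo summaries. A minimal sketch of how they can be computed from a set of replications follows. The function name `summarize_replications`, the Wald-type interval `estimate ± 1.96 × SE`, and the toy numbers are assumptions for illustration and do not reproduce the paper's results.

```python
import numpy as np

def summarize_replications(estimates, std_errors, true_value, z=1.96):
    """Bias, RMSE, and CI coverage across Monte Carlo replications,
    using Wald intervals estimate +/- z * SE (an assumed interval type)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    lower, upper = est - z * se, est + z * se
    coverage = np.mean((lower <= true_value) & (true_value <= upper))
    return {"bias": bias, "rmse": rmse, "coverage": coverage}

# Toy replications of an ATE estimate whose true value is 1.0.
summary = summarize_replications(
    estimates=[0.9, 1.1, 1.0, 1.4],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=1.0,
)
```

In the toy input, only the last interval (centered at 1.4) misses the true value, so coverage is three out of four.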

Practical and Theoretical Implications

This framework substantially broadens RD analysis for education and social sciences by allowing:

Valid causal inference in complex multilevel structures where treatment assignment and data collection are hierarchically organized.
Correction for measurement error in assignment variables, avoiding bias and enhancing interpretability of treatment effect estimates.
Extrapolation of ATEs and quantification of heterogeneity: Directly addresses decision-making needs beyond the cutoff, such as policy implications for alternative assignment thresholds.
Flexibility: Model can be extended to include additional covariates—observed or latent—and more complex latent constructs (e.g., multidimensional LVs, multiple assignment variables).

Numerical results indicate that, with sufficient cluster-level sample sizes, finite-sample bias is negligible and interval coverage is reliable. Performance degrades as extrapolation moves farther from the cutoff, which underscores that the usable range of inference is limited by the RD design and by the assumptions of the measurement error model.

Future Directions

Potential extensions include generalization to models with unbalanced clusters, variable intraclass correlation (ICC), and test length heterogeneity, reflecting more realistic operational settings in education research. Multidimensional latent measurement frameworks, covariate adjustment, and multidimensional assignment rules (e.g., via multiple eligibility criteria) are direct extensions that would enhance the framework's utility. More generally, integrating this framework with Bayesian hierarchical approaches, using modern MCMC or variational inference, may further improve estimation under highly complex designs.

Conclusion

This study establishes a comprehensive methodology for multilevel RD with latent running variables, enabling unbiased and efficient causal inferences in clustered, error-prone assignment contexts ubiquitous in education research. The combination of flexible modeling, robust estimation, and analytic tools for effect heterogeneity and extrapolation marks a significant methodological contribution. Future research should address additional layers of complexity and practical deployment in large-scale evaluative studies.
