Design-based Hájek estimation for clustered and stratified experiments (2406.10473v3)
Abstract: Random allocation is essential for causal inference, but practical constraints often require assigning participants in clusters. Clusters may be stratified before assignment, either of necessity or to reduce differences between treatment and control groups; but combining clustered assignment with blocking into pairs, triples, or other fine strata makes otherwise equivalent estimators perform quite differently. The two-way ANOVA with block effects can be inconsistent, as can another popular, seemingly innocuous estimator. In contrast, Hájek estimation remains broadly consistent for sample average treatment effects, but lacks a design-based standard error applicable with clusters and fine strata. To fill this gap, we offer a new variance estimator and establish its consistency. Analytic and simulation results recommend a hybrid of this estimator and Neyman's for designs with both small and large strata. We extend the Hájek estimator to accommodate covariates and adapt variance estimators to inherit Neyman-style conservativeness, at least for hypothesis testing. Further simulations suggest that with heterogeneous treatment effects, our combination of novelties is necessary and sufficient to maintain coverage in small-$n$ designs. Because the relevant $n$ is the number of clusters, many large-scale studies are small-$n$. We consider two such studies: a paired, aggregate-data nutritional study and an education study with student covariates and varying block sizes.
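To make the Hájek point estimate concrete, here is a minimal sketch in Python. It assumes the textbook ratio form: within each arm, an inverse-probability-weighted sum of cluster outcome totals is divided by the same weighted sum of cluster sizes, and the two ratios are differenced. It also assumes complete randomization of clusters within each stratum, so that a cluster's assignment probability is its stratum's treated fraction. The function name `hajek_estimate`, the dictionary keys, and the data are hypothetical illustrations, not taken from the paper, and the paper's exact estimator, covariate adjustment, and variance estimators are not reproduced here.

```python
from collections import defaultdict


def hajek_estimate(clusters):
    """Hajek (ratio) estimate of the average treatment effect.

    `clusters` is a list of dicts with hypothetical keys:
      'stratum' : hashable block label
      'treated' : True if the cluster was assigned to treatment
      'total'   : sum of individual outcomes in the cluster
      'size'    : number of individuals in the cluster
    Assumes clusters are completely randomized within each stratum.
    """
    n_clusters = defaultdict(int)
    n_treated = defaultdict(int)
    for c in clusters:
        n_clusters[c["stratum"]] += 1
        if c["treated"]:
            n_treated[c["stratum"]] += 1

    # Every stratum needs at least one cluster in each arm.
    for s, n in n_clusters.items():
        if n_treated[s] == 0 or n_treated[s] == n:
            raise ValueError(f"stratum {s!r} lacks a treated or a control cluster")

    weighted_total = {True: 0.0, False: 0.0}
    weighted_size = {True: 0.0, False: 0.0}
    for c in clusters:
        s = c["stratum"]
        arm = bool(c["treated"])
        p_treat = n_treated[s] / n_clusters[s]
        # Probability that this cluster received the arm it actually received.
        p_arm = p_treat if arm else 1.0 - p_treat
        weight = 1.0 / p_arm
        weighted_total[arm] += weight * c["total"]
        weighted_size[arm] += weight * c["size"]

    treated_mean = weighted_total[True] / weighted_size[True]
    control_mean = weighted_total[False] / weighted_size[False]
    return treated_mean - control_mean


# Made-up data: two matched pairs of clusters with unequal cluster sizes.
example_clusters = [
    {"stratum": "A", "treated": True, "total": 30.0, "size": 10},
    {"stratum": "A", "treated": False, "total": 20.0, "size": 10},
    {"stratum": "B", "treated": True, "total": 12.0, "size": 2},
    {"stratum": "B", "treated": False, "total": 8.0, "size": 4},
]
estimate = hajek_estimate(example_clusters)
print(estimate)
```

In this sketch the denominators are the weighted cluster sizes actually observed in each arm rather than fixed design totals, which is what makes the estimator a ratio; in the toy data the treated and control denominators differ (24 versus 28) because the paired clusters are of unequal size.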