- The paper develops a randomization inference method for estimating the average treatment effect among always-reporters under monotone attrition in RCTs.
- It introduces a worst-case testing approach that maximizes p-values over feasible always-reporter configurations using integer programming strategies.
- Simulations indicate correct size and high power, consistent with finite-sample validity under the sharp null and asymptotic size control under weak nulls.
Randomization Inference for the Always-Reporter Average Treatment Effect: A Technical Essay
Introduction
This paper develops a finite-sample valid, randomization-based inference framework for the average treatment effect among always-reporters (AR-ATE) in randomized controlled trials (RCTs) with attrition. The AR-ATE is the effect on units who would report their outcome regardless of assignment. Under monotone response selection (treatment can only weakly increase reporting), it is only partially identified. The problem is fundamental in field and clinical experiments, political economy, and policy evaluation, where treatment-induced nonresponse commonly leaves outcome data missing. The contribution is a set of randomization-based hypothesis tests for sharp and weak nulls on the AR-ATE, with algorithms that keep the tests valid in finite samples and under weak identification.
Problem Setup and Identification
The key setting involves $n$ randomized units, each with an observed treatment assignment $D_i$ and a post-assignment outcome reporting indicator $R_i$. The monotonicity assumption $r_i(1) \ge r_i(0)$ (no unit reports under control but not under treatment) induces three principal strata: always-reporters, if-reporters, and never-reporters, as in principal stratification.
The target parameter is the average treatment effect among always-reporters: $\text{AR-ATE} = \frac{1}{|A|} \sum_{i \in A} [y_i(1) - y_i(0)]$,
where $A$ is the set of units who would report under both treatment and control.
Standard identification results—building on Lee [lee2009training]—establish that:
- The share of always-reporters is identified by the reporting rate in controls.
- The mean potential control outcome for always-reporters is point-identified by observed reporters in controls.
- The mean treated outcome for always-reporters is only bounded, via monotonicity and trimming arguments.
However, always-reporter status is only partially revealed by the data. Thus, inference for the AR-ATE, and even for the sharp null $y_i(1) = y_i(0)$ over always-reporters, is nontrivial.
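The identification results above can be illustrated with a minimal sketch of Lee-style trimming bounds. The function name `lee_bounds` and the convention of coding unreported outcomes as `nan` are assumptions made for illustration; this is not the paper's testing procedure, only the classical bounds it builds on.

```python
import numpy as np

def lee_bounds(y, d, r):
    """Trimming bounds on the AR-ATE under monotone reporting, r(1) >= r(0).

    y: outcomes (nan where unreported), d: 0/1 assignment, r: 0/1 reporting.
    """
    y, d, r = np.asarray(y, float), np.asarray(d), np.asarray(r)
    # Treated reporters mix always-reporters and if-reporters.
    y1 = np.sort(y[(d == 1) & (r == 1)])
    # Under monotonicity, control reporters are exactly the always-reporters in control.
    y0 = y[(d == 0) & (r == 1)]
    # Share of always-reporters among treated reporters, from the two reporting rates.
    share = min(1.0, r[d == 0].mean() / r[d == 1].mean())
    # Number of treated reporters kept after trimming (small offset guards float error).
    k = max(1, int(np.floor(share * len(y1) + 1e-9)))
    lower = y1[:k].mean() - y0.mean()   # keep the smallest k treated outcomes
    upper = y1[-k:].mean() - y0.mean()  # keep the largest k treated outcomes
    return lower, upper

# Hypothetical toy data: 4 treated (all report), 4 controls (2 report).
d = [1, 1, 1, 1, 0, 0, 0, 0]
r = [1, 1, 1, 1, 1, 1, 0, 0]
y = [1, 2, 3, 4, 2, 2, np.nan, np.nan]
print(lee_bounds(y, d, r))
```

The control mean is computed directly, while the treated mean is bracketed by trimming from either tail, which mirrors the point-identified and bounded pieces listed above.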
Randomization-Based Inference: Methodology
If always-reporter status were observed, randomization inference for the AR-ATE (e.g., permuting assignments, evaluating difference-of-means in the always-reporter subset) would be straightforward. The challenge arises from the fact that in any observed data, always-reporters in the treated group are only a latent subset of observed reporters.
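As a sketch of this "oracle" case, the snippet below runs a Monte Carlo randomization test of the sharp null when always-reporter status is assumed known. The function name `oracle_randomization_pvalue`, the plain (unstudentized) absolute difference in means, and the number of draws are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def oracle_randomization_pvalue(y, d, is_ar, n_draws=2000, seed=0):
    """Randomization p-value for the sharp null among KNOWN always-reporters.

    Under the sharp null, each always-reporter's outcome is the same under
    either assignment, so the statistic can be recomputed for any reassignment.
    """
    rng = np.random.default_rng(seed)
    y, d = np.asarray(y, float), np.asarray(d)
    is_ar = np.asarray(is_ar, bool)

    def stat(assign):
        treated = y[is_ar & (assign == 1)]
        control = y[is_ar & (assign == 0)]
        if len(treated) == 0 or len(control) == 0:
            return 0.0  # convention when one arm has no always-reporters
        return abs(treated.mean() - control.mean())

    observed = stat(d)
    # Permuting d preserves the number treated (complete randomization).
    draws = np.array([stat(rng.permutation(d)) for _ in range(n_draws)])
    # Add-one correction keeps the Monte Carlo p-value valid.
    return (1 + np.sum(draws >= observed - 1e-12)) / (1 + n_draws)

# Hypothetical toy data in which every unit is an always-reporter.
y = [10, 11, 12, 13, 0, 1, 2, 3]
d = [1, 1, 1, 1, 0, 0, 0, 0]
print(oracle_randomization_pvalue(y, d, is_ar=[True] * 8))
```

In real data `is_ar` is not observed for treated reporters, which is exactly the difficulty the worst-case approach addresses.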
Worst-Case Approach
The core methodological innovation is a "worst-case" randomization test. For any labeling of always-reporters consistent with the observed $D_i$ and $R_i$, a randomization p-value can be computed as if that labeling were the true one. The overall procedure is then:
- Take the maximum p-value across all always-reporter configurations compatible with the observed data.
- Optionally, prune implausible always-reporter sets using a pretest on the marginal balance of always-reporters across treatment arms.
The validity of the resulting test rests on two properties:
- Under the sharp null, the randomization p-value computed at the true always-reporter configuration is valid in finite samples. The maximum over feasible configurations is at least as large, so the worst-case test also controls size.
- Under weak nulls (the average effect among always-reporters is zero, but heterogeneity is allowed), asymptotic size control is established through a technical analysis of the joint statistics.
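The worst-case idea can be sketched by brute force for a very small sample. The code below treats every nonempty subset of treated reporters as a feasible always-reporter set, which is a simplifying assumption, and enumerates all assignments exactly. The names `exact_pvalue` and `worst_case_pvalue` are hypothetical, and this is not the paper's integer-programming algorithm.

```python
import itertools
import numpy as np

def exact_pvalue(y, d, is_ar):
    """Exact randomization p-value for the sharp null, given an always-reporter set."""
    n, n_treated = len(d), int(d.sum())

    def stat(assign):
        treated = y[is_ar & (assign == 1)]
        control = y[is_ar & (assign == 0)]
        if len(treated) == 0 or len(control) == 0:
            return 0.0
        return abs(treated.mean() - control.mean())

    observed = stat(d)
    count = total = 0
    # Enumerate every assignment with the same number of treated units.
    for treated_idx in itertools.combinations(range(n), n_treated):
        assign = np.zeros(n, dtype=int)
        assign[list(treated_idx)] = 1
        count += int(stat(assign) >= observed - 1e-12)
        total += 1
    return count / total

def worst_case_pvalue(y, d, r):
    """Maximum p-value over candidate always-reporter labelings (tiny n only)."""
    y, d, r = np.asarray(y, float), np.asarray(d), np.asarray(r)
    # Under monotonicity, control reporters are always-reporters for sure.
    control_reporters = (d == 0) & (r == 1)
    treated_reporters = np.flatnonzero((d == 1) & (r == 1))
    best = 0.0
    for k in range(1, len(treated_reporters) + 1):
        for subset in itertools.combinations(treated_reporters, k):
            is_ar = control_reporters.copy()
            is_ar[list(subset)] = True  # candidate treated always-reporters
            best = max(best, exact_pvalue(y, d, is_ar))
    return best

# Hypothetical toy data; nan marks an unreported outcome.
d = [1, 1, 1, 1, 0, 0, 0, 0]
r = [1, 1, 1, 0, 1, 1, 0, 0]
y = [5, 6, 7, np.nan, 1, 2, np.nan, np.nan]
print(worst_case_pvalue(y, d, r))
```

Because the true labeling is one of the candidates, the maximum can never fall below the p-value at the truth, which is the source of the finite-sample guarantee. The cost is exponential in the number of treated reporters, so this form is only usable for illustration.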
Test Statistics and Implementation
The main statistics used are:
- Studentized Hajek-type difference-in-means (with unknown, potentially unbalanced denominators).
- Joint $\chi^2$-type statistics testing both mean balance and balance in the number of always-reporters.
For discrete outcomes, symmetry and the small number of support points allow exhaustive enumeration; for continuous outcomes, the p-value computation is relaxed to a sequence of integer programming (IP) problems, with variable assignments over the always-reporter matching variables.
For computational tractability, the search is efficiently organized by principal strata logic, and the algorithms are designed to exploit problem symmetries—such as labeling invariance among observed treated reporters.
Statistical Validity and Theoretical Guarantees
Finite-sample validity is proven for the sharp null, irrespective of the composition of the finite population or the outcome distribution [(2603.24970), aronow2024randomization]. The guarantee holds under complete randomization (CR) regardless of the actual always-reporter labeling, because the maximization ranges over all labelings compatible with the data.
Asymptotic validity for the weak null is established using combinatorial CLT machinery and recent Berry–Esseen bounds for conditional randomization [shi2022berry, li2017general]. The critical technical feature is that the procedure is uniformly valid even when the number of always-reporters is close to the population size (unlike classical Lee/Imbens bounds, which rely on the limiting share of each stratum being bounded away from 0 and 1).
The analysis also accommodates non-regular cases, in which the asymptotic null distribution of the test statistic may be degenerate, depending on the asymptotic regime for the principal strata shares. Asymptotic approximation (i.e., using normal/chi-square critical values) is justified, and its limits are carefully characterized.
Pretesting/pruning via balance of always-reporters (Berger–Boos) is shown to reduce computational burden without compromising statistical validity, provided that significance level adjustment is done correctly.
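The level adjustment can be written in the generic Berger–Boos form. The notation here ($p(a)$ for the p-value at configuration $a$, $\mathcal{C}_\gamma$ for the set of configurations that survive the pretest, and $\gamma$ for the pretest level) is assumed for illustration and is not taken from the paper:

$$
p_{\mathrm{BB}} = \sup_{a \in \mathcal{C}_\gamma} p(a) + \gamma, \qquad 0 \le \gamma < \alpha .
$$

If $\mathcal{C}_\gamma$ contains the true configuration with probability at least $1 - \gamma$ under the null, then rejecting when $p_{\mathrm{BB}} \le \alpha$ keeps the overall level at $\alpha$. Equivalently, the maximized p-value over the pruned set is compared against $\alpha - \gamma$ rather than $\alpha$.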
Computational Aspects and Algorithms
The randomization inference problem is inherently computationally challenging: in the worst case, the feasible set of always-reporter configurations grows exponentially in the number of treated reporters. For binary or categorical outcomes, the problem reduces to grouping by support points and can be solved in polynomial time (given small support). In the continuous case, the integer programming relaxation efficiently solves the maximization over feasible always-reporter matchings for each Monte Carlo permutation and outcome configuration.
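A small counting sketch shows why grouping by support points helps. It assumes, as a simplification, that any subset of treated reporters is a candidate always-reporter set, and that treated reporters with the same outcome value are interchangeable. The function name `n_configurations` is hypothetical.

```python
from collections import Counter
from math import prod

def n_configurations(treated_reporter_outcomes):
    """Count candidate always-reporter labelings among treated reporters.

    Returns (labeled, grouped):
      labeled - every subset of treated reporters counted separately
      grouped - only the number chosen at each distinct outcome value matters
    """
    counts = Counter(treated_reporter_outcomes)
    m = sum(counts.values())
    labeled = 2 ** m
    grouped = prod(c + 1 for c in counts.values())
    return labeled, grouped

# Hypothetical binary outcomes for five treated reporters.
print(n_configurations([0, 0, 1, 1, 1]))  # (32, 12)
```

With a fixed number of support points, the grouped count grows polynomially in the number of treated reporters, while the labeled count doubles with every additional unit.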
In practice, the approach deploys heuristic lower bounding in a first stage (for potential rejection), and only continues to exact enumeration or IP computation if the decision boundary is close.
Simulations support the theoretical size and power properties. In the reported design, size is controlled (empirical rejection rate under the null of 0.002 at the nominal level), and power under a unit effect is 0.922. Runtime is manageable (median runtimes ranging from seconds to hours), with the main computational burden arising in marginal rejection cases.
Implications, Extensions, and Future Directions
This work rigorously establishes a design-based, randomization-inference protocol for AR-ATE under monotone attrition in RCTs. Strong claims include finite-sample level control under the sharp null, and asymptotic control of size under weak nulls in all principal strata regimes (including near-degenerate cases).
The implications are significant where attrition or missing data may confound classical (e.g., intention-to-treat or per-protocol) analyses. The method extends directly to settings with more nuanced principal stratification, but requires further development for settings with covariate-dependent monotonicity (as in "Generalized Lee Bounds" [semenova2025generalized]) or complex assignment mechanisms beyond complete randomization.
Algorithmically, future work should address further scalability (possibly through sharper relaxations or approximations), especially for larger-scale trials and/or under richer outcome supports. Theoretical extension to randomized assignment with stratification or re-randomization procedures is natural.
Conclusion
The paper delivers a rigorous and computationally explicit framework for randomization inference on the always-reporter ATE under monotone sample selection. It advances the literature by guaranteeing finite-sample level control under the sharp null and asymptotic validity under weak nulls, by addressing computational issues for general outcomes, and by providing tests with guaranteed size control in finite populations. It thereby offers a template for principled, missing-data-adjusted inference in modern experimental designs.