Regression Discontinuity Design (RDD)
- Regression Discontinuity Design (RDD) is a quasi-experimental method that infers causal effects by comparing outcomes near a known threshold in a continuous assignment variable.
- Recent extensions of RDD, such as the R3D framework, enable analysis of distribution-valued outcomes using advanced tools like Bayesian nonparametrics and Fréchet regression.
- An empirical application to political effects on income distributions reveals heterogeneous impacts across outcome quantiles, while simulation studies support the validity of the accompanying inference procedures.
Regression Discontinuity Design (RDD) is a quasi-experimental framework for estimating causal effects when treatment assignment is governed by a threshold in a continuous assignment (or running) variable. Canonical RDD infers the causal effect by comparing outcomes for units just above and below a known cutoff, which is presumed to render treatment and control groups locally exchangeable. Recent advances have generalized RDD to encompass complex assignment mechanisms, outcome types, interference between units, and high-dimensional or distribution-valued outcomes, using a diverse suite of methodological tools including Bayesian nonparametrics, hierarchical and semiparametric models, and functional regression.
1. Identification and Causal Estimands in RDD
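Standard RDD identification hinges on the continuity of potential outcome regression functions at the cutoff. Let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes for treated and control respectively, $X_i$ be the assignment variable, and $c$ the cutoff. In sharp RDD, the treatment indicator $T_i = \mathbb{1}\{X_i \ge c\}$ deterministically assigns treatment. The causal estimand, or local average treatment effect at the cutoff, is

$$\tau_{\mathrm{SRD}} = E[Y_i(1) - Y_i(0) \mid X_i = c] = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x].$$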
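As an illustration of the sharp estimand, the sketch below estimates the jump at the cutoff with one-sided local linear regressions. The triangular kernel, the fixed bandwidth `h`, the function names, and the simulated data (true jump of 2.0) are all assumptions made for this example; they are not taken from the source.

```python
import numpy as np

def local_linear_intercept(x, y, cutoff, h, side):
    """Intercept at the cutoff from a one-sided local linear fit (triangular kernel)."""
    d = x - cutoff
    mask = (d >= 0) if side == "right" else (d < 0)
    k = np.where(mask, np.maximum(1.0 - np.abs(d) / h, 0.0), 0.0)
    X = np.column_stack([np.ones_like(d), d])
    XtK = X.T * k  # kernel-weighted design, shape (2, n)
    beta = np.linalg.solve(XtK @ X, XtK @ y)
    return beta[0]

def sharp_rd(x, y, cutoff, h):
    """Right-limit minus left-limit of the regression function at the cutoff."""
    right = local_linear_intercept(x, y, cutoff, h, "right")
    left = local_linear_intercept(x, y, cutoff, h, "left")
    return right - left

# Hypothetical data: treatment switches on deterministically at x >= 0.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
t = (x >= 0).astype(float)
y = 1.0 + 0.5 * x + 2.0 * t + rng.normal(0, 0.3, n)

tau_hat = sharp_rd(x, y, cutoff=0.0, h=0.3)
print(f"estimated jump: {tau_hat:.3f} (simulated truth: 2.0)")
```

In practice the bandwidth would be chosen by a data-driven rule rather than fixed by hand.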
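Fuzzy RDD relaxes deterministic assignment; the treatment probability jumps at the cutoff $c$, but with possible noncompliance, and the causal estimand becomes a ratio of the discontinuities in the outcome $Y_i$ and the treatment $T_i$ along the assignment variable $X_i$:

$$\tau_{\mathrm{FRD}} = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[T_i \mid X_i = x] - \lim_{x \uparrow c} E[T_i \mid X_i = x]}.$$

The identification in these frameworks relies on assumptions about the assignment process (continuity or local randomization) and, in fuzzy designs, on monotonicity and a nonzero jump in treatment probability (instrument strength).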
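The ratio form of the fuzzy estimand can be sketched as a Wald-type estimator: the jump in the outcome divided by the jump in treatment take-up. The helper names, the triangular kernel, the bandwidth, and the simulated compliance rates (0.2 below the cutoff, 0.8 above) are hypothetical choices for this example only.

```python
import numpy as np

def jump_at_cutoff(x, v, cutoff, h):
    """Right-minus-left difference of local linear intercepts of v on x at the cutoff."""
    d = x - cutoff
    X = np.column_stack([np.ones_like(d), d])
    intercepts = []
    for mask in (d >= 0, d < 0):
        k = np.where(mask, np.maximum(1.0 - np.abs(d) / h, 0.0), 0.0)
        XtK = X.T * k
        intercepts.append(np.linalg.solve(XtK @ X, XtK @ v)[0])
    return intercepts[0] - intercepts[1]

def fuzzy_rd(x, y, t, cutoff, h):
    """Outcome discontinuity scaled by the treatment-probability discontinuity."""
    return jump_at_cutoff(x, y, cutoff, h) / jump_at_cutoff(x, t, cutoff, h)

# Hypothetical data with imperfect compliance and a constant effect of 2.0.
rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1, 1, n)
p_treat = np.where(x >= 0, 0.8, 0.2)
t = (rng.uniform(size=n) < p_treat).astype(float)
y = 1.0 + 0.5 * x + 2.0 * t + rng.normal(0, 0.3, n)

first_stage = jump_at_cutoff(x, t, 0.0, 0.3)
tau_fuzzy = fuzzy_rd(x, y, t, cutoff=0.0, h=0.3)
print(f"first-stage jump: {first_stage:.3f}, fuzzy RD estimate: {tau_fuzzy:.3f}")
```

A first-stage jump close to zero would make the ratio unstable, which is the instrument-strength concern mentioned above.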
2. RDD with Distribution-Valued Outcomes ("R3D")
Traditional RDD methods presume scalar or vector-valued outcomes for each unit. The "R3D" framework (Dijcke, 4 Apr 2025) extends this to the setting where each sample unit possesses a distribution-valued outcome, typical of hierarchical structures: e.g., treatment is assigned at the firm level by crossing a revenue cutoff, but outcomes are employee wage distributions.
The target estimand is the local average quantile treatment effect (LAQTE):

$$\tau^{\mathrm{R3D}}(q) = E\left[Q_{Y_i(1)}(q) - Q_{Y_i(0)}(q) \mid X_i = c\right], \qquad q \in (0, 1),$$

where $Q_{Y_i(1)}(q)$ and $Q_{Y_i(0)}(q)$ are the $q$th quantiles of the treated and control distributions, and $c$ is the normalized cutoff. This estimand generalizes the classical RDD jump to the space of quantile functions and by extension to the 2-Wasserstein metric space, allowing inference on distributional causal effects beyond the mean or variance.
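A small worked example may help fix ideas. Suppose, purely as an assumption for illustration, that the units at the cutoff have normal outcome distributions with the hypothetical parameters below. Averaging quantile functions across units and differencing gives a LAQTE curve that here equals 1.0 minus 0.2 times the standard normal quantile, so the effect is larger at low quantiles than at high ones.

```python
from statistics import NormalDist
import numpy as np

q_grid = np.linspace(0.05, 0.95, 19)

def normal_quantile_fn(mu, sigma):
    """Quantile function of N(mu, sigma^2) evaluated on q_grid."""
    dist = NormalDist(mu, sigma)
    return np.array([dist.inv_cdf(float(q)) for q in q_grid])

# Hypothetical (mean, sd) pairs for units at the cutoff under each treatment state.
treated_units = [(1.0, 0.8), (1.2, 1.0)]
control_units = [(0.0, 1.0), (0.2, 1.2)]

# Average quantile function across units (the Wasserstein barycenter in one dimension).
avg_q_treated = np.mean([normal_quantile_fn(m, s) for m, s in treated_units], axis=0)
avg_q_control = np.mean([normal_quantile_fn(m, s) for m, s in control_units], axis=0)

laqte = avg_q_treated - avg_q_control
for q, effect in zip(q_grid[[0, 9, 18]], laqte[[0, 9, 18]]):
    print(f"q = {q:.2f}: effect = {effect:.3f}")
```

The median effect equals the shift in average means, while the tails also reflect the change in average spread, which a mean-only RDD would not show.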
3. Estimation Methodologies for R3D
The core estimation problem is the local regression of random quantile functions on the assignment variable near the cutoff.
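- Local Polynomial Estimation for Quantile Functions: For each quantile $q$, perform local polynomial regression of the unit-level quantile functions $Q_{Y_i}(q)$ versus $X_i$, using kernel weights concentrated near the cutoff and one-sided polynomial bases on either side:

$$\hat{m}_{\pm}(q) = \sum_{i=1}^{n} s_{\pm,i}(h)\, Q_{Y_i}(q), \qquad \hat{\tau}(q) = \hat{m}_{+}(q) - \hat{m}_{-}(q).$$

Here, $s_{\pm,i}(h)$ are local polynomial regression weights, with bandwidth $h$, for the left/right neighborhoods.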
- Local Fréchet Regression (Wasserstein Barycenter Projection): To ensure the estimated conditional quantile function is valid (monotonic and continuous), the local polynomial estimator is projected onto the convex set of quantile functions:

$$\hat{m}_{\pm,\oplus} = \arg\min_{Q \in \mathcal{Q}} \int_0^1 \left(Q(q) - \hat{m}_{\pm}(q)\right)^2 dq,$$

where $\hat{m}_{\pm}$ denotes the one-sided local polynomial estimate and $\mathcal{Q}$ the set of quantile functions. In 2-Wasserstein space (the Wasserstein metric for probability distributions), this is equivalent to estimation of the conditional Wasserstein barycenter, leveraging the Fréchet mean structure of quantile functions.
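- Uniform Bahadur Representation and Asymptotics: Both estimators admit a uniform Bahadur expansion, facilitating the derivation of the asymptotic joint normality of the estimator process:

$$\sqrt{nh}\,\left(\hat{\tau}(\cdot) - \tau(\cdot)\right) \rightsquigarrow \mathbb{G}(\cdot),$$

where $n$ is the number of units, $h$ the bandwidth, and $\mathbb{G}$ a mean-zero Gaussian process indexed by the quantile level, with a multiplier bootstrap yielding uniform, bias-corrected confidence bands.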
- Automatic Bandwidth Selection: For optimal MSE, bandwidths are selected via a plug-in formula based on integrated bias and variance expansions, and Fréchet regression further permits a single, global choice by averaging over quantiles.
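The first two steps above (one-sided local fits of unit-level quantile functions, followed by a projection onto valid quantile functions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: it uses local linear fits with a triangular kernel and a hand-picked bandwidth, evaluates quantiles on a uniform grid, and performs the projection with the pool-adjacent-violators algorithm, which is the discretized L2 projection onto nondecreasing sequences. All function names and the simulated data are hypothetical.

```python
import numpy as np

def side_weights(x, cutoff, h, side):
    """Weights s_i such that sum_i s_i * v_i is the one-sided local linear intercept."""
    d = x - cutoff
    mask = (d >= 0) if side == "right" else (d < 0)
    k = np.where(mask, np.maximum(1.0 - np.abs(d) / h, 0.0), 0.0)
    X = np.column_stack([np.ones_like(d), d])
    XtK = X.T * k
    return np.linalg.solve(XtK @ X, XtK)[0]

def project_monotone(v):
    """L2 projection of v onto nondecreasing sequences (pool-adjacent-violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for val in v:
        blocks.append([float(val), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

# Hypothetical data: each unit i has its own outcome sample of size m.
rng = np.random.default_rng(2)
n, m = 2000, 50
x = rng.uniform(-1, 1, n)
treated = x >= 0
mu = 0.5 * x + 1.0 * treated + rng.normal(0, 0.2, n)  # location jumps by 1.0
sigma = np.where(treated, 0.7, 1.0)                   # spread shrinks at the cutoff
samples = mu[:, None] + sigma[:, None] * rng.normal(size=(n, m))

q_grid = np.linspace(0.1, 0.9, 9)
Q = np.quantile(samples, q_grid, axis=1).T  # unit-level quantile functions, shape (n, 9)

# Step 1: quantile-by-quantile local linear estimates on each side of the cutoff.
m_plus_raw = side_weights(x, 0.0, 0.4, "right") @ Q
m_minus_raw = side_weights(x, 0.0, 0.4, "left") @ Q

# Step 2: project each side onto the set of nondecreasing functions of q.
m_plus = project_monotone(m_plus_raw)
m_minus = project_monotone(m_minus_raw)

tau_hat = m_plus - m_minus
print(np.round(tau_hat, 3))
```

Local linear weights can be negative, so the raw estimate is not guaranteed to be monotone in $q$; when it already is, the projection leaves it unchanged.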
4. Simulation Validation
Monte Carlo experiments validate the theoretical properties of R3D estimators:
- Inconsistency of Scalar Quantile-RD in Distributional Settings: Applying standard quantile RD methods (which estimate discontinuities in the quantiles of a scalar outcome) in settings where each unit's outcome is itself a distribution yields substantially biased and inconsistent estimates.
- Bias and Variance of R3D Estimators: Both the direct local polynomial and the Fréchet projection approaches achieve consistently lower (often order-of-magnitude) bias and MSE compared to classical estimators; Fréchet regression, in particular, shows improved finite-sample performance due to global quantile function constraints.
- Coverage Properties: Uniform confidence bands constructed via the multiplier bootstrap recover nominal levels rapidly even in moderate samples, validating inferential accuracy in the high-dimensional function space.
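To make the multiplier bootstrap concrete, the sketch below builds a uniform (sup-t) band for a quantile-by-quantile local linear jump estimate. It is a simplified illustration under several assumptions that are not from the source: Gaussian multipliers, a hand-picked bandwidth, no bias correction, no Fréchet projection, and hypothetical simulated data and function names.

```python
import numpy as np

def side_fit(x, Q, cutoff, h, right):
    """One-sided local linear fit of every column of Q on x.

    Returns the intercept weights, the intercepts at the cutoff, and residuals."""
    d = x - cutoff
    mask = (d >= 0) if right else (d < 0)
    k = np.where(mask, np.maximum(1.0 - np.abs(d) / h, 0.0), 0.0)
    X = np.column_stack([np.ones_like(d), d])
    XtK = X.T * k
    A = np.linalg.solve(XtK @ X, XtK)  # maps outcomes to (intercept, slope)
    beta = A @ Q
    return A[0], beta[0], Q - X @ beta

# Hypothetical distribution-valued data (same structure as a hierarchical design).
rng = np.random.default_rng(3)
n, m = 2000, 50
x = rng.uniform(-1, 1, n)
treated = x >= 0
mu = 0.5 * x + 1.0 * treated + rng.normal(0, 0.2, n)
sigma = np.where(treated, 0.7, 1.0)
samples = mu[:, None] + sigma[:, None] * rng.normal(size=(n, m))
q_grid = np.linspace(0.1, 0.9, 9)
Q = np.quantile(samples, q_grid, axis=1).T

w_r, m_r, e_r = side_fit(x, Q, 0.0, 0.4, right=True)
w_l, m_l, e_l = side_fit(x, Q, 0.0, 0.4, right=False)
tau_hat = m_r - m_l

# Per-unit contributions to the estimation error at each quantile level.
psi = w_r[:, None] * e_r - w_l[:, None] * e_l
se = np.sqrt((psi ** 2).sum(axis=0))

# Multiplier bootstrap: perturb contributions with i.i.d. N(0, 1) draws
# and record the largest studentized deviation across quantile levels.
n_boot = 1000
xi = rng.normal(size=(n_boot, n))
sup_t = np.max(np.abs(xi @ psi) / se, axis=1)
crit = np.quantile(sup_t, 0.95)

lower, upper = tau_hat - crit * se, tau_hat + crit * se
print(f"uniform 95% critical value: {crit:.2f} (pointwise would be about 1.96)")
```

Because the critical value is taken from the supremum over all quantile levels, the band is wider than pointwise intervals and is meant to cover the whole effect curve simultaneously.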
5. Empirical Application: Political Effects on Income Distributions
The R3D methodology is applied to study the effects of U.S. gubernatorial party control on the within-state income distribution in a close-election (RDD) design.
- Design: The running variable is Democratic vote share, with the cutoff at 50%. For each state and year, the outcome is the full family income distribution.
- Findings: The R3D procedure reveals that Democratic governorship shifts the income distribution by reducing upper quantiles (notably, the top decile), with little impact or slight increases at the lower and median quantiles—a distributional pattern consistent with classical equality-efficiency tradeoff dynamics.
- Interpretation: The average treatment effect on means, as estimated by aggregate RDD, matches the average of the R3D quantile curve, but the full R3D analysis provides a granular view of the heterogeneous impact across the distribution.
6. Mathematical Formalism
Key mathematical ingredients of R3D estimators include:
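| Component | Expression | Description |
|---|---|---|
| Local Average Quantile Treatment Effect | $\tau^{\mathrm{R3D}}(q) = E[Q_{Y_i(1)}(q) - Q_{Y_i(0)}(q) \mid X_i = c]$ | Target distributional jump at cutoff |
| Weighted Local Polynomial Estimator | $\hat{m}_{\pm}(q) = \sum_{i=1}^{n} s_{\pm,i}(h)\, Q_{Y_i}(q)$ | Side-specific weighted regression |
| Local Fréchet Projection | $\hat{m}_{\pm,\oplus} = \arg\min_{Q \in \mathcal{Q}} \int_0^1 (Q(q) - \hat{m}_{\pm}(q))^2\, dq$ | Ensures valid quantile function |
| Wasserstein Distance | $d_{W_2}(\mu, \nu) = \left(\int_0^1 (Q_\mu(q) - Q_\nu(q))^2\, dq\right)^{1/2}$ | Distance in space of distributions |
| IMSE-optimal Bandwidth (for polynomial order $p$) | $h^{*} \propto n^{-1/(2p+3)}$, with the constant set by integrated squared bias and variance | Data-driven bandwidth |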
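For one-dimensional distributions, the 2-Wasserstein distance in the table reduces to an L2 distance between quantile functions, which makes it easy to compute. The sketch below uses hypothetical function names; for two samples of equal size, the empirical quantile functions are just the sorted values.

```python
import numpy as np

def wasserstein2_from_quantiles(q_a, q_b):
    """L2 distance between two quantile functions on a common uniform grid."""
    q_a, q_b = np.asarray(q_a, dtype=float), np.asarray(q_b, dtype=float)
    return float(np.sqrt(np.mean((q_a - q_b) ** 2)))

def wasserstein2_samples(a, b):
    """2-Wasserstein distance between two empirical distributions of equal size."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return wasserstein2_from_quantiles(np.sort(a), np.sort(b))

a = np.array([0.0, 1.0, 2.0])
b = a + 3.0  # pure location shift by 3
print(wasserstein2_samples(a, b))  # → 3.0
```

A pure location shift moves every quantile by the same amount, so the distance equals the size of the shift.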
7. Extensions and Implications
The R3D framework significantly advances RDD methodology by:
- Enabling causal inference when outcomes are themselves distributions rather than scalars, typical in hierarchical or multi-level designs.
- Providing estimators with rigorous theoretical properties (asymptotic normality, valid uniform confidence bands).
- Enabling function-valued (as opposed to mean or quantile-by-quantile) treatment effect inference, revealing rich heterogeneity in causal impacts across the outcome distribution.
- Validating methods through both simulation and applied settings, uncovering policy-relevant patterns (e.g., the equality–efficiency tradeoff in governmental income interventions).
This work provides a foundational approach for generalizing RDD analysis to distributional outcomes, with applications spanning labor, policy, and education economics where such structures frequently arise (Dijcke, 4 Apr 2025).