Fuzzy Regression Discontinuity with Covariates

Updated 8 February 2026

Fuzzy regression discontinuity designs with covariates handle imperfect treatment compliance at a cutoff while using pre-treatment variables to sharpen estimation.
The approach estimates the local average treatment effect near the cutoff using local polynomial regression or two-stage least squares, with treatment assignment serving as an instrument for treatment receipt.
Bandwidth selection and covariate-based weighting strategies govern the precision of the estimates and determine how heterogeneous effects are aggregated.

A fuzzy regression discontinuity (FRD) design with covariates generalizes the canonical RD setup to account for imperfect compliance and allows for adjustment using observed pre-treatment variables. In this framework, treatment assignment at a threshold in a running variable is used as an instrument for actual treatment receipt, and covariates are incorporated to improve estimation precision, account for heterogeneity, or recover causal parameters under broader conditions. The integration of covariate information expands both the identification strategy and estimation procedure, but requires careful attention to the role of these covariates in the design.

1. Definition and Conceptual Framework

A fuzzy RD design distinguishes between a deterministic assignment variable (instrument) and actual treatment take-up, accommodating imperfect compliance. With covariate adjustment, the model is specified for observed data $(Y_i, D_i, X_i, W_i)$, where $Y_i$ is the outcome, $D_i \in \{0, 1\}$ is the treatment, $X_i$ is the running variable with cutoff $c$, and $W_i$ is a vector of pre-treatment covariates. The treatment assignment indicator is $Z_i = \mathbf{1}\{X_i \ge c\}$. The causal estimand is the local average treatment effect (LATE) at the cutoff, identified as

$\tau_{\rm FRD}(c) = \frac{\lim_{x\downarrow c} E[Y_i|X_i=x, W_i] - \lim_{x\uparrow c} E[Y_i|X_i=x, W_i]}{\lim_{x\downarrow c} E[D_i|X_i=x, W_i] - \lim_{x\uparrow c} E[D_i|X_i=x, W_i]}$

Conditional expectations are typically estimated locally around the cutoff, with covariate adjustment entering additively or via cell-specific estimation (Cattaneo et al., 18 Jul 2025, Cattaneo et al., 2023).
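As an illustration of the ratio above, the following is a minimal sketch of a fuzzy RD Wald estimate built from one-sided local linear fits with a triangular kernel. It omits covariate adjustment for brevity. The function names, the fixed bandwidth `h=0.3`, and the simulated data (take-up jumping from 0.2 to 0.8, a constant effect of 2) are all assumptions made for this example, not part of the cited papers.

```python
import numpy as np

def local_linear_limit(x, y, c, h, side):
    """Kernel-weighted least-squares estimate of E[y | x -> c] from one side."""
    mask = (x >= c) if side == "right" else (x < c)
    xs, ys = x[mask] - c, y[mask]
    k = np.clip(1.0 - np.abs(xs / h), 0.0, None)   # triangular kernel weights
    keep = k > 0
    X = np.column_stack([np.ones(keep.sum()), xs[keep]])
    sw = np.sqrt(k[keep])
    beta, *_ = np.linalg.lstsq(X * sw[:, None], ys[keep] * sw, rcond=None)
    return beta[0]                                  # intercept = fitted value at x = c

def fuzzy_rd_wald(y, d, x, c, h):
    """Ratio of the outcome jump to the treatment jump at the cutoff."""
    jump_y = local_linear_limit(x, y, c, h, "right") - local_linear_limit(x, y, c, h, "left")
    jump_d = local_linear_limit(x, d, c, h, "right") - local_linear_limit(x, d, c, h, "left")
    return jump_y / jump_d, jump_y, jump_d

# Simulated data; every number below is hypothetical.
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(-1, 1, n)
z = (x >= 0).astype(float)
d = (rng.uniform(size=n) < 0.2 + 0.6 * z).astype(float)   # imperfect compliance
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0, 1, n)

tau_hat, jump_y, jump_d = fuzzy_rd_wald(y, d, x, c=0.0, h=0.3)
print(f"outcome jump={jump_y:.3f}, first-stage jump={jump_d:.3f}, Wald ratio={tau_hat:.3f}")
```

In applied work the bandwidth would be chosen by a data-driven rule and inference would use bias-corrected standard errors rather than this bare point estimate.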

2. Identification Assumptions

Identification of the FRD LATE with covariates hinges on several key conditions:

Continuity of Conditional Means: For each value of the covariates, the conditional potential outcome and treatment functions must be continuous at the cutoff (Cattaneo et al., 2023).
First-Stage Relevance: There must be a nonzero jump in the conditional expectation of treatment at the cutoff, i.e., $\Delta_D = \lim_{x\downarrow c} E[D_i|X_i=x, W_i] - \lim_{x\uparrow c} E[D_i|X_i=x, W_i] \neq 0$.
Covariate Continuity and Exclusion: Pre-treatment covariates should not exhibit discontinuities at the cutoff (no sorting), and their effect should be equal above and below the cutoff to prevent introduction of artificial jumps. This ensures additive inclusion does not change the estimand (Cattaneo et al., 18 Jul 2025).
Monotonicity and Exclusion Restrictions: Standard IV assumptions apply—no defiers (treatment monotonicity), and assignment affects outcomes only via actual treatment receipt.

Violation of covariate continuity, for example through sorting or self-selection, necessitates weighting or estimation strategies that address imbalances, such as the inverse probability weighted local linear methods described in Peng et al. (2019).
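The covariate continuity and first-stage relevance conditions can both be probed by estimating a jump at the cutoff, once with a covariate and once with the treatment as the dependent variable. The sketch below is one simple way to do this, assuming a uniform kernel, a fixed window `h`, separate slopes on each side, and HC0 robust standard errors. The helper `jump_at_cutoff` and the simulated covariates are hypothetical.

```python
import numpy as np

def jump_at_cutoff(v, x, c, h):
    """OLS estimate of the jump in v at cutoff c using observations with |x - c| <= h."""
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    z = (xc >= 0).astype(float)
    X = np.column_stack([np.ones(xc.size), z, xc, z * xc])   # separate slopes per side
    beta, *_ = np.linalg.lstsq(X, v[keep], rcond=None)
    resid = v[keep] - X @ beta
    # Heteroskedasticity-robust (HC0) sandwich covariance
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return beta[1], np.sqrt(cov[1, 1])

# Simulated covariates; all values are hypothetical.
rng = np.random.default_rng(1)
n = 40000
x = rng.uniform(-1, 1, n)
z = (x >= 0).astype(float)
w_balanced = 0.3 * x + rng.normal(0, 1, n)            # continuous at the cutoff
w_sorted = 0.3 * x + 0.5 * z + rng.normal(0, 1, n)    # jumps by 0.5 at the cutoff

jump_bal, se_bal = jump_at_cutoff(w_balanced, x, c=0.0, h=0.3)
jump_sort, se_sort = jump_at_cutoff(w_sorted, x, c=0.0, h=0.3)
print(f"balanced: jump={jump_bal:.3f} (se {se_bal:.3f})")
print(f"sorted:   jump={jump_sort:.3f} (se {se_sort:.3f})")
```

A covariate jump that is large relative to its standard error suggests that purely additive adjustment may change the estimand. Applying the same function to $D_i$ gives an estimate of the first-stage jump.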

3. Estimation Methodologies

Local Linear/Polynomial and 2SLS Estimation

The dominant estimation strategies for FRD with covariates are:

Local Polynomial Regressions: Separate local linear regressions for $Y_i$ and $D_i$ on either side of the cutoff, incorporating $W_i$ additively (Cattaneo et al., 2023). The estimated discontinuities are combined via the Wald ratio. Bandwidth selection is crucial for bias-variance tradeoff and typically employs MSE- or coverage-optimal rules.
Local Instrumental Variables (2SLS): A two-stage setup where the assignment indicator $Z_i$ serves as the local instrument for $D_i$, with both stages including $W_i$ additively (and optionally interactions for heterogeneity) (Cattaneo et al., 18 Jul 2025):

$\text{First stage: } D_i = \alpha + \pi Z_i + \rho (X_i - c) + \kappa' W_i + u_i$

$\text{Second stage: } Y_i = \beta + \tau \widehat{D}_i + \gamma (X_i - c) + \delta' W_i + \varepsilon_i$

Additive Covariate Adjustment in Kernels: For Bayesian or nonparametric approaches, kernels can be specified as sums or products over $x$ and $w$ components (e.g., squared-exponential for $x$, linear/polynomial for $w$) (Wu, 2021).
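The two-stage specification above can be sketched with plain least squares inside a window around the cutoff. This is a minimal illustration under stated assumptions: a uniform kernel, a fixed bandwidth, a common slope in $(X_i - c)$ on both sides as in the displayed equations, and simulated data with an unobserved confounder `u`. The function `local_2sls` is hypothetical, and the second-stage residuals from this manual procedure should not be used for standard errors.

```python
import numpy as np

def local_2sls(y, d, x, w, c, h):
    """Manual 2SLS within |x - c| <= h, with Z = 1{x >= c} as the instrument for d."""
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    z = (xc >= 0).astype(float)
    W = w[keep].reshape(xc.size, -1)
    exog = np.column_stack([np.ones(xc.size), xc, W])   # intercept, running variable, covariates
    # First stage: D on Z and the exogenous regressors
    first_X = np.column_stack([z, exog])
    first_coef, *_ = np.linalg.lstsq(first_X, d[keep], rcond=None)
    d_hat = first_X @ first_coef
    # Second stage: Y on fitted D and the same exogenous regressors
    second_X = np.column_stack([d_hat, exog])
    second_coef, *_ = np.linalg.lstsq(second_X, y[keep], rcond=None)
    return second_coef[0], first_coef[0]                # (tau, pi)

# Simulated data; all numbers are hypothetical.
rng = np.random.default_rng(3)
n = 20000
x = rng.uniform(-1, 1, n)
w = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)                                 # unobserved confounder
z = (x >= 0).astype(float)
d = ((-0.8 + 1.6 * z + 0.3 * w + u) > 0).astype(float)  # endogenous take-up
y = 1.0 + 0.5 * x + 2.0 * d + 1.5 * w + 0.8 * u + rng.normal(0, 1, n)

tau_hat, pi_hat = local_2sls(y, d, x, w, c=0.0, h=0.5)
print(f"first-stage jump pi={pi_hat:.3f}, 2SLS effect tau={tau_hat:.3f}")
```

Because `u` drives both take-up and the outcome, a direct regression of `y` on `d` would be biased here, which is why the assignment indicator is used as the instrument.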

Gaussian Process and Deep Learning Extensions

A nonparametric Bayesian alternative is offered by hierarchical Gaussian process models in which a neural network transforms the input $(x,w)$ into features, and a GP is then fit to these features. Covariates are included additively in the kernel or via feature expansion, yielding a Bayesian analog of covariate adjustment. These approaches afford flexible modeling of heterogeneity and global patterns without explicit specification (Wu, 2021).
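To make the additive-kernel idea concrete, the sketch below fits a zero-mean Gaussian process with a squared-exponential kernel in $x$ plus a linear kernel in $w$, separately on each side of the cutoff, and forms the ratio of the resulting jumps. It is a simplified stand-in and not the hierarchical model of Wu (2021): the neural-network feature map is omitted, and the hyperparameters (`length`, `sig_x`, `sig_w`, `noise`) are fixed by assumption rather than learned. All names and data are hypothetical.

```python
import numpy as np

def additive_kernel(A, B, length=0.3, sig_x=1.0, sig_w=1.0):
    """Squared-exponential kernel in x (column 0) plus linear kernel in w (other columns)."""
    dx = A[:, [0]] - B[:, [0]].T
    k_x = sig_x**2 * np.exp(-0.5 * (dx / length) ** 2)
    k_w = sig_w**2 * (A[:, 1:] @ B[:, 1:].T)
    return k_x + k_w

def gp_mean_at(A_train, Y_train, a_test, noise=0.5):
    """Posterior mean of a zero-mean GP at a_test; the columns of Y_train are separate targets."""
    K = additive_kernel(A_train, A_train) + noise**2 * np.eye(len(A_train))
    k_star = additive_kernel(a_test, A_train)
    return (k_star @ np.linalg.solve(K, Y_train)).ravel()

# Simulated data; all numbers are hypothetical.
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1, 1, n)
w = rng.normal(0, 1, n)
z = (x >= 0).astype(float)
d = (rng.uniform(size=n) < 0.2 + 0.6 * z).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + 0.7 * w + rng.normal(0, 0.5, n)

A = np.column_stack([x, w])
Y = np.column_stack([y, d])                 # fit outcome and treatment with one solve
a_cut = np.array([[0.0, w.mean()]])         # evaluate at the cutoff and the mean covariate
right, left = x >= 0, x < 0
jump_y, jump_d = gp_mean_at(A[right], Y[right], a_cut) - gp_mean_at(A[left], Y[left], a_cut)
tau_gp = jump_y / jump_d
print(f"GP jumps: outcome={jump_y:.3f}, treatment={jump_d:.3f}, ratio={tau_gp:.3f}")
```

A full Bayesian treatment would also propagate posterior uncertainty into the ratio; this sketch reports posterior means only.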

Covariate Categorization and Weighted Estimands

With discrete covariates, cell-specific jump estimation enables identification of a broader class of weighted average LATEs (WLATEs). The Compliance Weighted LATE (CWLATE) is designed to maximize statistical power by weighting cell-specific effects by the squared first-stage jump: $\tau_{CW} = \frac{E[\Delta D(W_i) \Delta Y(W_i)]}{E[\Delta D(W_i)^2]}$, where $\Delta D(w)$ and $\Delta Y(w)$ denote the jumps at the cutoff in the conditional means of $D_i$ and $Y_i$ within covariate cell $w$. Cells with stronger compliance exert greater influence, improving estimator efficiency, especially under heterogeneous compliance (Caetano et al., 1 Feb 2026).
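The squared-jump weighting can be read off the CWLATE formula directly. Writing $w$ for a covariate cell (the argument of $\Delta D(\cdot)$ and $\Delta Y(\cdot)$ above) and defining the cell-specific LATE as $\tau(w) = \Delta Y(w)/\Delta D(w)$, a short rearrangement gives the following. The symbols $\tau(w)$ and $\omega(w)$ are notation introduced here for illustration.

```latex
\begin{aligned}
\tau_{CW}
  &= \frac{E[\Delta D(w)\,\Delta Y(w)]}{E[\Delta D(w)^2]}
   = \frac{E[\Delta D(w)^2\,\tau(w)]}{E[\Delta D(w)^2]}
   = E[\omega(w)\,\tau(w)], \\
\omega(w) &= \frac{\Delta D(w)^2}{E[\Delta D(w)^2]} \;\ge\; 0,
\qquad E[\omega(w)] = 1 .
\end{aligned}
```

Cells with $\Delta D(w) = 0$, where $\tau(w)$ is undefined, receive zero weight. The same expression is the slope of a population regression through the origin of $\Delta Y(w)$ on $\Delta D(w)$ across cells.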

4. Efficiency, Inference, and Bandwidth Selection

Adjustment for covariates reduces variance by explaining additional outcome (and/or first-stage) variability. The main source of efficiency gain is the predictive power of the covariates for $Y_i$ and $D_i$ near the cutoff. Robust bias-corrected confidence intervals for the estimated LATE, centered at bias-corrected estimators and using robust variance formulas, are available in the literature and implemented in packages like rdrobust (Cattaneo et al., 2023, Cattaneo et al., 18 Jul 2025, Caetano et al., 1 Feb 2026).

Bandwidth selection for both local polynomial and CWLATE estimators demands careful optimization (typically MSE-minimizing), with separate pilot and estimation bandwidths for point estimation and bias-correction (Caetano et al., 1 Feb 2026).
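The bias-variance tradeoff behind MSE-minimizing bandwidths can be written generically. For a local polynomial estimator of order $p$ with bandwidth $h$ and sample size $n$, the leading terms of the mean squared error take the form below, where $B$ and $V$ stand for estimator-specific bias and variance constants. They are left unspecified here, and for the fuzzy ratio they would come from a linearization of the Wald estimator.

```latex
\begin{aligned}
\mathrm{MSE}(h) &\approx h^{2(p+1)} B^2 + \frac{V}{n h}, \\
\frac{\partial\,\mathrm{MSE}}{\partial h} = 0
  \;&\Longrightarrow\;
  2(p+1)\, h^{2p+1} B^2 = \frac{V}{n h^2}, \\
h_{\mathrm{MSE}} &= \left( \frac{V}{2(p+1)\, B^2\, n} \right)^{1/(2p+3)} .
\end{aligned}
```

For the local linear case ($p = 1$) this gives $h_{\mathrm{MSE}} \propto n^{-1/5}$. Because covariate adjustment changes $V$, and possibly $B$, the implied bandwidth generally differs from the unadjusted one.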

5. Heterogeneous Effects and Weighted LATEs

If treatment effect heterogeneity is suspected or of substantive interest, interactions of $W_i$ with treatment or instrument are incorporated in both stages, targeting conditional LATEs specific to cells or values of the covariates (Cattaneo et al., 18 Jul 2025). In discrete settings, the FRD design point-identifies a continuum of WLATEs indexed by nonnegative weight functions over covariate cells; CWLATE is a special case targeting high-compliance cells (Caetano et al., 1 Feb 2026).

The table below summarizes major estimands and their weighting structures:

Estimand	Weight Structure	Efficiency Targeted
Standard LATE	Marginal across all units	None
Covariate-Adjusted LATE	Average over covariate distribution	Pure precision improvement
WLATE	Arbitrary nonnegative weights on cells	Customizable
CWLATE	Weights $\propto (\text{first-stage jump})^2$	Maximum compliance
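The weighting structures in the table can be compared on a toy example with three covariate cells. The sketch below assumes the cell-specific jumps are known, whereas in practice they are estimated. The function `wlate` and all numbers are hypothetical. The "pooled" row uses weights proportional to the first-stage jump, which reproduces the unconditional Wald ratio when the covariate distribution is continuous at the cutoff.

```python
import numpy as np

def wlate(delta_y, delta_d, cell_prob, weights):
    """Weighted average of cell-specific LATEs tau(w) = delta_y(w) / delta_d(w)."""
    tau_cell = delta_y / delta_d
    omega = cell_prob * weights          # combine cell frequency with the chosen weights
    return np.sum(omega * tau_cell) / np.sum(omega)

# Hypothetical cell-level jumps at the cutoff
delta_d = np.array([0.1, 0.4, 0.8])      # first-stage jumps (compliance) by cell
delta_y = np.array([0.3, 0.8, 0.8])      # outcome jumps by cell -> tau = 3, 2, 1
cell_prob = np.array([0.5, 0.3, 0.2])    # share of units in each cell

equal_weight = wlate(delta_y, delta_d, cell_prob, np.ones(3))
pooled = wlate(delta_y, delta_d, cell_prob, delta_d)        # weights ∝ first-stage jump
cwlate = wlate(delta_y, delta_d, cell_prob, delta_d ** 2)   # weights ∝ squared jump

# CWLATE computed directly from its ratio-of-expectations form
cwlate_direct = np.sum(cell_prob * delta_d * delta_y) / np.sum(cell_prob * delta_d ** 2)

print(f"equal-weight={equal_weight:.4f}, pooled={pooled:.4f}, CWLATE={cwlate:.4f}")
```

In this example the low-compliance cell has the largest effect, so shifting weight toward high-compliance cells lowers the estimand. With heterogeneous effects, the choice of weights changes what is being estimated as well as how precisely.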

6. Self-Selection and Discontinuities in Covariates

If covariates are not balanced at the cutoff (e.g., due to self-selection or sorting), simple additive adjustments are insufficient. Instead, inverse-probability weighting or cell-specific estimation must be adopted to restore valid causal interpretation. The resulting local Wald estimand can still be interpreted as the complier ATE weighted in a specified manner over the covariate distribution (Peng et al., 2019). Diagnostics for covariate balance, density continuity, and placebo tests are essential components of empirical implementation in such cases.

7. Software, Diagnostics, and Practical Recommendations

An extensive ecosystem supports the implementation of fuzzy RD with covariates, including:

rdrobust (R/Stata/Python): Automates local polynomial estimation, bias-correction, robust inference, and supports covariate inclusion and fuzzy designs (Cattaneo et al., 2023).
pyrdd, rdlocrand: Tools for local randomization checks, window selection, and exact inference.

Best practice recommendations include restricting covariate inclusion to predetermined, continuous-at-cutoff variables, conducting explicit covariate balance checks, reporting both adjusted and unadjusted estimates, and re-selecting bandwidth after adjusting for covariates. Additive covariates can be included for efficiency; interactions or cell-specific modeling are used only for heterogeneity analysis or for correcting non-continuities (Cattaneo et al., 2023, Cattaneo et al., 18 Jul 2025, Caetano et al., 1 Feb 2026).

Covariate-adjusted FRD estimators, whether via local linear, 2SLS, Bayesian GP, or weighted local regression, are now standard components of the causal inference toolkit in applied settings, particularly when compliance is incomplete and covariate structure is complex or predictive. Simulation and empirical evidence reported in this literature indicates improved stability, reduced variance, and sharper inference in such designs (Wu, 2021, Caetano et al., 1 Feb 2026).


References

Cattaneo et al. (2025). Leveraging Covariates in Regression Discontinuity Designs.

Cattaneo et al. (2023). A Practical Introduction to Regression Discontinuity Designs: Extensions.

Peng et al. (2019). Regression Discontinuity Design under Self-selection.

Wu (2021). Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs.

Caetano et al. (2026). Identification and Estimation in Fuzzy Regression Discontinuity Designs with Covariates.
