Weighted Average Treatment Effect (WATE)

Updated 19 September 2025

WATE is defined as E[h(X){Y(1) - Y(0)}]/E[h(X)], generalizing ATE by weighting covariate profiles to target specific populations.
Estimation strategies include outcome regression, IPW, and AIPW, with the latter offering double robustness and efficiency gains.
WATE methods bolster both internal and external validity in observational studies, with proven benefits in comparative effectiveness research.

The weighted average treatment effect (WATE) is a class of causal effect estimands that quantifies the average causal effect of an intervention when applied across a target population that is specified by a pre-defined or data-driven weighting function. WATE generalizes the traditional average treatment effect (ATE) by introducing flexibility in the choice of the target population, facilitating more relevant and nuanced policy or scientific inference. The ability to formally target a specific population with arbitrary covariate distribution, potentially distinct from the observed sample, positions WATE as foundational in contemporary causal inference for observational studies, comparative effectiveness research, and the evaluation of interventions under covariate shift.

1. Formal Definition and Role of the Target Function

WATE is defined as

$\tau_h = \frac{E[h(X)\{Y(1) - Y(0)\}]}{E[h(X)]}$

where $h(X)$ is a nonnegative function specifying the emphasis or relevance of each covariate profile $X$ in the target population, and $Y(1), Y(0)$ are the potential outcomes under treatment and control, respectively. By choosing $h(X)$ appropriately, one can recover many standard estimands:

For ATE: $h(X) \equiv 1$.
For ATT: $h(X) = \pi(X)$, where $\pi(X) = P(A = 1 \mid X)$ is the propensity score.
For ATC: $h(X) = 1 - \pi(X)$.
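
To see why $h(X) = \pi(X)$ recovers the ATT, a short derivation may help. It assumes unconfoundedness, $A \perp \{Y(1), Y(0)\} \mid X$, and writes $\pi(X) = P(A = 1 \mid X)$; neither assumption is spelled out in the text above, so treat this as a sketch under those conditions.

$\begin{aligned} E[\pi(X)\{Y(1) - Y(0)\}] &= E\big[E[A \mid X]\, E[Y(1) - Y(0) \mid X]\big] \\ &= E\big[E[A\{Y(1) - Y(0)\} \mid X]\big] = E[A\{Y(1) - Y(0)\}], \\ E[\pi(X)] &= P(A = 1), \\ \tau_\pi &= \frac{E[A\{Y(1) - Y(0)\}]}{P(A = 1)} = E[Y(1) - Y(0) \mid A = 1]. \end{aligned}$

The same argument with $1 - A$ in place of $A$ gives the ATC for $h(X) = 1 - \pi(X)$.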

Estimators of $\tau_h$ target $E[h(X)\{Y(1) - Y(0)\}]/E[h(X)]$ directly, which supports both internal validity (removal of confounding) and external validity (representativeness of the population defined by $h$). When the sample distribution differs from the policy-relevant population, WATE estimators answer causal questions for a covariate distribution other than that of the empirical sample (Tao et al., 2018).

2. Estimation Strategies for WATE

Three primary classes of estimators for WATE are considered:

(a) Outcome Regression Estimator: $\hat{\tau}_h^R = \frac{\sum_{i=1}^n h(X_i, \hat{\gamma})\{\hat{m}_1(X_i, \hat{\beta}) - \hat{m}_0(X_i, \hat{\beta})\}}{\sum_{i=1}^n h(X_i, \hat{\gamma})}$ where $\hat{m}_a(X, \hat{\beta})$ is a fitted outcome model for arm $A=a$ . This approach relies on correct specification of the regression models.
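
As a minimal sketch of the outcome regression estimator, the following pure-Python example fits a separate least-squares line per arm on a single covariate and then averages the fitted contrasts with weights $h(X_i)$. The helper names (`fit_line`, `wate_outcome_regression`), the single-covariate linear model, and the toy data are illustrative assumptions, not part of Tao et al. (2018).

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def wate_outcome_regression(x, a, y, h):
    """Outcome-regression WATE: h-weighted mean of m1_hat(x) - m0_hat(x)."""
    fits = {}
    for arm in (0, 1):
        # Fit the outcome model using only the units observed in this arm.
        xs = [xi for xi, ai in zip(x, a) if ai == arm]
        ys = [yi for yi, ai in zip(y, a) if ai == arm]
        fits[arm] = fit_line(xs, ys)
    (b00, b01), (b10, b11) = fits[0], fits[1]
    weights = [h(xi) for xi in x]
    # Predict both potential-outcome means for every unit, then contrast them.
    contrasts = [(b10 + b11 * xi) - (b00 + b01 * xi) for xi in x]
    return sum(w * c for w, c in zip(weights, contrasts)) / sum(weights)


# Toy data (assumed): Y = 1 + 2x + A * (3 + x), so the effect at x is 3 + x.
x = [0, 1, 2, 3, 4, 5]
a = [0, 1, 0, 1, 0, 1]
y = [1 + 2 * xi + ai * (3 + xi) for xi, ai in zip(x, a)]

tau_ate = wate_outcome_regression(x, a, y, h=lambda xi: 1.0)      # h = 1
tau_h = wate_outcome_regression(x, a, y, h=lambda xi: float(xi))  # h(x) = x
```

Because the toy outcomes are exactly linear, both fitted lines are exact here; with real data the estimate is only as good as the outcome model.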

(b) Inverse Probability Weighting (IPW) Estimator:

With estimated propensity score $\hat{\pi}(X, \hat{\alpha})$, the weights are:

$\begin{aligned} w_1(X) &= h(X)/\pi(X), \qquad \text{for } A=1 \\ w_0(X) &= h(X)/(1 - \pi(X)), \qquad \text{for } A=0 \end{aligned}$

The WATE is then estimated by

$\hat{\tau}_h^I = \frac{\sum_{i: A_i=1} Y_i w_1(X_i)}{\sum_{i: A_i=1} w_1(X_i)} - \frac{\sum_{i: A_i=0} Y_i w_0(X_i)}{\sum_{i: A_i=0} w_0(X_i)}$

IPW is sensitive to misspecification of $\pi(X)$ and to practical violations of the positivity assumption, which typically show up as extreme weights when $\pi(X)$ is close to 0 or 1 in a region where $h(X)$ is not correspondingly small.
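
The normalized IPW estimator can be sketched in a few lines of plain Python. The function name `wate_ipw`, the convention of passing $h(X_i)$ and $\pi(X_i)$ as precomputed lists, and the four-unit toy data are assumptions made for illustration only.

```python
def wate_ipw(a, y, h, pi):
    """Normalized IPW estimate of the WATE.

    a  : 0/1 treatment indicators
    y  : observed outcomes
    h  : target-function values h(X_i)
    pi : propensity scores pi(X_i), assumed strictly between 0 and 1
    """
    num1 = den1 = num0 = den0 = 0.0
    for ai, yi, hi, pi_i in zip(a, y, h, pi):
        if ai == 1:
            w = hi / pi_i            # w_1(X) = h(X) / pi(X)
            num1 += w * yi
            den1 += w
        else:
            w = hi / (1.0 - pi_i)    # w_0(X) = h(X) / (1 - pi(X))
            num0 += w * yi
            den0 += w
    return num1 / den1 - num0 / den0


# Toy data (assumed values, for illustration only).
a = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
pi = [0.5, 0.8, 0.5, 0.2]

ipw_ate = wate_ipw(a, y, h=[1.0] * 4, pi=pi)  # h = 1 targets the ATE
ipw_att = wate_ipw(a, y, h=pi, pi=pi)         # h = pi targets the ATT
```

With $h = \pi$ the treated weights all equal 1 and the control weights become the odds $\pi/(1-\pi)$, which is the familiar ATT weighting scheme.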

(c) Augmented Inverse Probability Weighting (AIPW) Estimator:

When $h(X)$ is known or correctly specified,

$\begin{aligned} \hat{\tau}_h^A &= \frac{1}{\sum_{i} h(X_i, \hat{\gamma})} \sum_{i} h(X_i, \hat{\gamma}) \Bigg\{ \bigg[\frac{A_i Y_i}{\pi(X_i, \hat{\alpha})} - \frac{A_i - \pi(X_i, \hat{\alpha})}{\pi(X_i, \hat{\alpha})} m_1(X_i, \hat{\beta})\bigg] \\ &\qquad - \bigg[\frac{(1-A_i) Y_i}{1-\pi(X_i, \hat{\alpha})} + \frac{A_i - \pi(X_i, \hat{\alpha})}{1-\pi(X_i, \hat{\alpha})} m_0(X_i, \hat{\beta})\bigg] \Bigg\} \end{aligned}$

This estimator is doubly robust—it is consistent if either the propensity score model or the outcome regression model is correctly specified, and achieves improved efficiency when both models are correct (Tao et al., 2018).
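
A direct transcription of the AIPW formula into plain Python might look as follows. The function name `wate_aipw` and the toy inputs are assumptions; the fitted values $m_1$, $m_0$ and $\pi$ are taken as given rather than estimated here.

```python
def wate_aipw(a, y, h, pi, m1, m0):
    """AIPW estimate of the WATE for known target-function values h(X_i).

    pi, m1, m0 hold the fitted propensity scores and the fitted outcome
    means under treatment and control for each unit.
    """
    num = 0.0
    for ai, yi, hi, pi_i, m1i, m0i in zip(a, y, h, pi, m1, m0):
        # Augmented estimate of E[Y(1) | X_i].
        treated = ai * yi / pi_i - (ai - pi_i) / pi_i * m1i
        # Augmented estimate of E[Y(0) | X_i].
        control = (1 - ai) * yi / (1 - pi_i) + (ai - pi_i) / (1 - pi_i) * m0i
        num += hi * (treated - control)
    return num / sum(h)


# Toy data (assumed values): outcomes coincide with the fitted means.
a = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
pi = [0.5, 0.8, 0.5, 0.2]
m1 = [3.0, 5.0, 4.0, 6.0]
m0 = [1.0, 2.0, 1.0, 2.0]

aipw_ate = wate_aipw(a, y, [1.0] * 4, pi, m1, m0)
aipw_h = wate_aipw(a, y, [1.0, 2.0, 3.0, 4.0], pi, m1, m0)
# With a null outcome model the augmentation vanishes and only the
# (unnormalized) inverse-probability-weighted terms remain.
aipw_no_model = wate_aipw(a, y, [1.0] * 4, pi, [0.0] * 4, [0.0] * 4)
```

When the residuals $Y_i - m_{A_i}(X_i)$ are all zero, as in this toy data, the estimate reduces to the $h$-weighted mean of $m_1 - m_0$ regardless of the propensity values supplied.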

3. Target Functions Depending on the Propensity Score: DR Estimation for ATT/ATC

When $h(X)$ is a function of the (possibly misspecified) propensity score, for instance $h(X) = \pi(X)$ (ATT) or $h(X) = 1 - \pi(X)$ (ATC), the classical double robustness of the AIPW estimator does not hold in general, because errors in $\pi(X)$ enter both the weights and the definition of the target population. Tao et al. (2018) propose estimator forms tailored to these cases.

For instance, for ATT,

$\hat{\tau}_\pi^\mathrm{DR} = \frac{1}{\sum_i A_i} \sum_{i=1}^n \Big[ A_i Y_i - \big\{ \frac{\pi(X_i)(1-A_i)}{1-\pi(X_i)} Y_i + \frac{A_i - \pi(X_i)}{1-\pi(X_i)} m_0(X_i, \hat{\beta}) \big\} \Big]$

A parallel form is provided for ATC. Both are special cases of a general estimator for target functions that are linear in the propensity score, $h(X) = a + b\pi(X)$. Theorem 2 of Tao et al. (2018) shows that these estimators are consistent for $\tau_h$ if either the propensity score model or the outcome regression model is correctly specified, which restores double robustness for ATT and ATC.
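
The ATT form above can be transcribed into plain Python as a sketch. The function name `att_dr` and the toy data are assumptions; only the control-arm outcome model $m_0$ is needed, as in the displayed formula.

```python
def att_dr(a, y, pi, m0):
    """Doubly robust ATT estimate using fitted pi(X_i) and m0(X_i)."""
    n_treated = sum(a)
    total = 0.0
    for ai, yi, pi_i, m0i in zip(a, y, pi, m0):
        odds = pi_i / (1.0 - pi_i)
        # Odds-weighted control outcome plus the outcome-model augmentation.
        control_part = odds * (1 - ai) * yi + (ai - pi_i) / (1.0 - pi_i) * m0i
        total += ai * yi - control_part
    return total / n_treated


# Toy data (assumed values, for illustration only).
a = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
pi = [0.5, 0.8, 0.5, 0.2]
m0 = [1.0, 2.0, 1.0, 2.0]

att_exact_model = att_dr(a, y, pi, m0)         # control residuals are zero
att_no_model = att_dr(a, y, pi, [0.0] * 4)     # pure odds weighting
```

In the first call the control outcomes equal their fitted means, so the estimate collapses to the treated-group average of $Y_i - m_0(X_i)$; in the second it relies on the propensity scores alone.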

4. Theoretical Properties: Double Robustness and Efficiency

Key theoretical properties of the AIPW and doubly robust estimators developed in (Tao et al., 2018):

Double Robustness: For general $h(X)$ , if $h$ is known and either the outcome model or the propensity score is correctly specified, $\hat{\tau}_h^A$ is consistent. When $h(X)$ is affine in $\pi(X)$ (as for ATT/ATC), the DR estimators maintain consistency given correct specification of either nuisance model.
Efficiency: When both models are correct, AIPW and DR estimators have asymptotic variance no larger, and typically smaller, than IPW, because they incorporate outcome model information that stabilizes estimation under heteroskedasticity and rare covariate patterns.
Finite Sample Performance: In simulation studies, the AIPW and DR estimators exhibit smaller bias and root mean squared error than IPW methods across a broad range of scenarios, including cases where $h(\cdot)$ depends on an estimated propensity score and may therefore be misspecified (e.g., overlap weights).

These results emphasize the practical advantage of augmented estimators, both in terms of bias and efficiency, under partial or uncertain model specification (Tao et al., 2018).
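
The double robustness property for known $h$ can be made concrete with a one-line bias calculation for the treated-arm term. The notation $\pi^*(X)$ and $m_1^*(X)$ for the large-sample limits of the fitted models is introduced here for illustration, and the calculation assumes unconfoundedness and $Y = Y(A)$, with $m_1(X) = E[Y(1) \mid X]$:

$\begin{aligned} E\left[\frac{A Y}{\pi^*(X)} - \frac{A - \pi^*(X)}{\pi^*(X)} m_1^*(X) \,\Big|\, X\right] - m_1(X) &= \frac{\pi(X)}{\pi^*(X)} m_1(X) - \frac{\pi(X) - \pi^*(X)}{\pi^*(X)} m_1^*(X) - m_1(X) \\ &= \frac{\{\pi(X) - \pi^*(X)\}\{m_1(X) - m_1^*(X)\}}{\pi^*(X)}. \end{aligned}$

The bias is a product of the two model errors, so it vanishes when either $\pi^* = \pi$ or $m_1^* = m_1$. The control-arm term behaves analogously.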

5. Empirical Illustration in Comparative Effectiveness Research

A substantive application in (Tao et al., 2018) uses UK Clinical Practice Research Datalink (CPRD) data to compare GLP-1 RA and insulin therapy among type 2 diabetes patients, focusing on reduction in hemoglobin A1c after six months. Major implementation points:

Notable differences in age and BMI between treatment groups necessitate adjustment for observed confounding.
Logistic regression (for the propensity score) and linear regression with interactions (for the outcome) were fitted to estimate the required nuisance parameters.
Estimands including the ATE, ATT, and ATC were approximated using the above WATE estimators.
Doubly robust estimators produced treatment effect estimates with reduced absolute magnitude and wider confidence intervals compared to naive analyses; the conclusion was that—after robust covariate adjustment—glycemic control did not substantially differ, although secondary endpoints favored GLP-1 RA.

This underscores WATE methods' value for both internal validity (adjusting for observed confounders) and external validity (targeting a policy- or clinically-relevant population).
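
To illustrate the propensity score step only, the sketch below fits a one-covariate logistic regression by Newton-Raphson in plain Python. It is not the CPRD analysis: the data are synthetic, the single-covariate model is a simplifying assumption, and the names `fit_logistic` and `propensity` are hypothetical helpers.

```python
import math


def fit_logistic(x, a, iters=25):
    """Newton-Raphson fit of P(A = 1 | x) = expit(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector of the log-likelihood.
        g0 = sum(ai - pi_i for ai, pi_i in zip(a, p))
        g1 = sum(xi * (ai - pi_i) for xi, ai, pi_i in zip(x, a, p))
        # Entries of the 2x2 information matrix.
        w = [pi_i * (1.0 - pi_i) for pi_i in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity(x, b0, b1):
    """Fitted propensity scores for covariate values x."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


# Synthetic, non-separable data (assumed for illustration).
x = [0, 1, 2, 3, 4, 5, 6, 7]
a = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, a)
pi_hat = propensity(x, b0, b1)
```

The fitted scores `pi_hat` could then be passed, together with fitted outcome means, to any of the WATE estimators; with several covariates a statistics library would be the more practical choice.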

6. Simulation Evidence and Practical Considerations

Simulation studies in (Tao et al., 2018) systematically assess estimator performance under various patterns of model misspecification and propensity score distribution, including extreme values and loss of overlap. Key findings include:

AIPW and DR estimators show uniform bias reduction and lower RMSE compared to unaugmented methods.
Variance reduction is more pronounced as overlap decreases or as weighting functions $h(X)$ place mass on regions of covariate support with moderate to high density for both treatment groups.
When both nuisance models are misspecified or $h(\cdot)$ is outside the class considered, augmented estimators still outperform standard IPW.
Practical implementation requires robust functional forms for model components and diagnostic checking of overlap and adequacy of $h(\cdot)$ specification.
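
One simple overlap diagnostic is the effective sample size of the weights in each arm. The sketch below uses Kish's formula, $(\sum w)^2 / \sum w^2$, which is a common convention and an assumption here rather than a diagnostic prescribed by Tao et al. (2018); the function names and toy data are likewise illustrative.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def arm_weights(a, h, pi):
    """Split WATE weights h/pi (treated) and h/(1 - pi) (control) by arm."""
    w1 = [hi / pi_i for ai, hi, pi_i in zip(a, h, pi) if ai == 1]
    w0 = [hi / (1.0 - pi_i) for ai, hi, pi_i in zip(a, h, pi) if ai == 0]
    return w1, w0


# Toy data (assumed values): the second treated unit has pi close to 0.
a = [1, 1, 0, 0]
pi = [0.5, 0.01, 0.5, 0.2]
h_ate = [1.0] * 4

w1, w0 = arm_weights(a, h_ate, pi)
ess_treated = effective_sample_size(w1)  # dominated by the weight of 100
ess_control = effective_sample_size(w0)
```

An effective sample size far below the number of units in an arm signals that a few extreme weights dominate the estimate, which is a cue to revisit the propensity model or the choice of $h$.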

7. Summary and Impact

The integrated WATE estimation framework in Tao et al. (2018) advances the theory and practice of causal inference by introducing doubly robust estimators for arbitrary weighted populations of scientific, clinical, and policy interest. By connecting the functional form of the weights to existing estimands (ATE, ATT, ATC, overlap), the estimators offer both robustness to model misspecification and improved efficiency, supported by simulation evidence and a substantive application. The methodological contribution centers on the formalization of double robustness properties, the explicit construction of general-purpose estimators for any $h(X)$, and the analysis of efficiency tradeoffs among outcome regression, IPW, and augmented approaches.

The methods in this paper are now foundational and widely referenced in the context of generalizing or transporting causal effect estimates beyond the observed sample, as well as in situations with complex or policy-motivated target populations in observational research.

References

Tao et al. (2018). Robust Estimation of the Weighted Average Treatment Effect for A Target Population.