Average Treatment Effects on the Treated (ATT)

Updated 3 April 2026

ATT is defined as the mean difference between treated outcomes and their counterfactual untreated outcomes, key for policy and clinical evaluation.
Difference-in-differences removes selection bias under the parallel trends assumption, and convex-hull partial identification relaxes that assumption when strict parallel trends is doubtful.
Practical inference requires mapping empirical support and conducting sensitivity analyses to ensure robust and credible estimation of treatment effects.

The average treatment effect on the treated (ATT) is a central estimand in program evaluation, epidemiology, and policy analysis. It quantifies the mean difference between the observed outcome and the counterfactual untreated outcome among subjects who received the treatment. The ATT is particularly salient when the treated population is of policy or clinical interest, distinguishing it from the population average treatment effect (ATE). A rigorous understanding of the ATT encompasses its formal definition, identification under a variety of models, estimation under partial identification, efficiency considerations, and robust inference under model uncertainty.

1. Formal Definition and the Role of Selection Bias

In standard difference-in-differences (DID) and related frameworks, the ATT is defined as

$ATT \equiv \mathbb{E}[Y_1(1) - Y_1(0) \mid D=1],$

with $D$ indicating treatment status, $Y_t(d)$ denoting the potential outcome in period $t$ under treatment $d \in \{0,1\}$, and $Y_t$ the observed outcome. The observed outcomes in DID are encoded as $Y_0 = Y_0(0)$ and $Y_1 = D\cdot Y_1(1) + (1-D)\cdot Y_1(0)$.

The natural starting point is the difference in means (the OLS estimand), which can be decomposed as

$\theta_{OLS} \equiv \mathbb{E}[Y_1 \mid D=1] - \mathbb{E}[Y_1 \mid D=0] = ATT + SB_1,$

where the selection bias in period $t$ is

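$SB_t \equiv \mathbb{E}[Y_t(0) \mid D=1] - \mathbb{E}[Y_t(0) \mid D=0].$

The parallel trends assumption (PT) posits $SB_1 = SB_0$, enabling identification as $ATT = \theta_{OLS} - SB_0$. The pre-treatment difference in means, $\mathbb{E}[Y_0 \mid D=1] - \mathbb{E}[Y_0 \mid D=0]$, estimates $SB_0$ and is fundamental for justifying PT in DID applications (Ban et al., 2022).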
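The decomposition of the difference in means into the ATT plus a selection-bias term can be checked numerically. The following is a minimal simulation sketch, not taken from the cited work: the data-generating process, the variable names, and the true effect of 2.0 are all hypothetical, and parallel trends holds by construction because the untreated outcome shifts by the same constant for every unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (an assumption of this sketch):
# a unit effect alpha drives both selection into treatment and outcomes.
alpha = rng.normal(size=n)
d = (alpha + rng.normal(size=n) > 0).astype(int)   # treated units have higher alpha on average
y0 = alpha + rng.normal(size=n)                    # period 0: Y_0 = Y_0(0)
y1_untreated = 1.0 + alpha + rng.normal(size=n)    # Y_1(0): common trend of 1.0, so PT holds
true_att = 2.0
y1 = y1_untreated + true_att * d                   # observed Y_1

def diff_in_means(y, d):
    """Mean outcome among treated minus mean outcome among untreated."""
    return y[d == 1].mean() - y[d == 0].mean()

theta_ols = diff_in_means(y1, d)        # equals ATT + SB_1
sb0_hat = diff_in_means(y0, d)          # pre-period difference in means, estimates SB_0
sb1 = diff_in_means(y1_untreated, d)    # SB_1, observable only in a simulation
att_did = theta_ols - sb0_hat           # DID estimate under PT

print(f"theta_OLS = {theta_ols:.3f}, SB_0 = {sb0_hat:.3f}, SB_1 = {sb1:.3f}")
print(f"DID estimate of ATT = {att_did:.3f} (true value {true_att})")
```

In this setup the raw difference in means overstates the effect because of positive selection, while subtracting the pre-period difference recovers a value close to the true ATT.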

2. Partial Identification and Robustification against Violations of Parallel Trends

Empirical skepticism regarding the PT assumption motivates more general frameworks for partial identification. One approach is to define a baseline information set $\mathcal{I}$ (e.g., pre-treatment periods or covariate strata), introduce conditional pre-period selection biases $SB_0(i)$ for each $i \in \mathcal{I}$, and posit the bias-set stability (convex hull) assumption: $SB_1 \in \operatorname{conv}\{SB_0(i) : i \in \mathcal{I}\}.$ From this, the sharp identified set for the ATT is

$ATT \in \left[\theta_{OLS} - \sup_{i \in \mathcal{I}} SB_0(i),\ \theta_{OLS} - \inf_{i \in \mathcal{I}} SB_0(i)\right].$

The identified set collapses to the standard DID point estimate only when all the $SB_0(i)$ coincide, i.e., when PT holds exactly in the pre-period (Ban et al., 2022).
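For scalar biases, the convex hull of a finite collection of pre-period selection biases is simply the interval between the smallest and largest of them, so the bounds on the ATT follow by subtracting those extremes from the difference in means. The sketch below illustrates this arithmetic under that reading; the function name `att_identified_set` and the numbers are hypothetical, and sampling uncertainty is ignored.

```python
import numpy as np

def att_identified_set(theta_ols, pre_biases):
    """Bounds on the ATT when the post-period selection bias is assumed to
    lie in the convex hull (here: the min-max range) of pre-period biases."""
    biases = np.asarray(pre_biases, dtype=float)
    if biases.size == 0:
        raise ValueError("at least one pre-period selection bias is required")
    # ATT = theta_OLS - SB_1, so the largest bias gives the lower bound.
    return theta_ols - biases.max(), theta_ols - biases.min()

# Hypothetical inputs: a difference in means of 3.0 and three pre-period biases.
lower, upper = att_identified_set(3.0, [0.8, 1.0, 1.3])
print(f"ATT lies in [{lower:.2f}, {upper:.2f}]")

# When all pre-period biases coincide, the set collapses to a point.
point_lower, point_upper = att_identified_set(3.0, [1.0, 1.0, 1.0])
```

The width of the interval equals the spread of the pre-period biases, which makes the cost of relaxing parallel trends directly visible.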

Extensions to staggered adoption or multiple post-treatment periods follow directly by imposing the convex-hull assumption on post-period biases over more general treatment paths, which yields analogous interval identification for path- or period-specific ATT parameters.

A sufficient condition for the convex-hull assumption is that the untreated potential outcome $Y_t(0)$ follows an interactive fixed effects (IFE) model with specific factor-structure symmetries, ensuring that the convex hull of pre-period biases actually contains the post-period bias (Ban et al., 2022).

3. Estimation and Inference under Partial Support and Support Diagnostics

The identification of the ATT further demands sufficient covariate support: for each value of the covariates $X$ at which treated units are observed, there must exist untreated units for valid counterfactual inference. Define the support set

$\mathcal{S} \equiv \{x : n_1(x) > 0 \text{ and } n_0(x) > 0\},$

with $x$ denoting discrete strata in the covariate space and $n_d(x)$ the number of observed units in stratum $x$ with $D = d$. The ATT is point-identified only on $\mathcal{S}$: $ATT_{\mathcal{S}} \equiv \mathbb{E}[Y_1(1) - Y_1(0) \mid D=1, X \in \mathcal{S}].$ Let $\pi$ denote the proportion of treated units within the empirical support. If $\pi < 1$, then standard estimators extrapolate beyond the data, failing to quantify the true causal effect for all treated units. Sensitivity analysis frameworks introduce curvature constraints on the selection mechanism, parameterized by $\kappa$, to generate identified sets for the ATT as a monotonic function of the assumption strength and support diagnostics (Li, 2025).

Table: Diagnostic Statistics for ATT Identification under Partial Support

Statistic	Description
$\pi$	Proportion of treated units with support in the data
MAS-SI	Minimum assumption strength for sign identification
Fragility index	Minimal deviation required to overturn qualitative inference

A key implication is that the ATT is undefined outside the empirical support $\mathcal{S}$, so credible estimation requires either reporting only within this set or conducting a transparent sensitivity analysis with explicit curvature assumptions. Standard estimators implicitly extrapolate and may be epistemically fragile in regions of low overlap (Li, 2025).
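The first diagnostic in the table, the proportion of treated units with support in the data, can be computed directly from stratum labels and treatment indicators. The sketch below is one possible reading, assuming that a stratum counts as supported when it contains at least one treated and at least one untreated unit; the function name `support_diagnostics` and the toy data are hypothetical.

```python
from collections import Counter

def support_diagnostics(strata, treated):
    """Return the set of supported strata and the share of treated units
    that fall inside it. A stratum is treated as supported when it holds
    both treated and untreated units (an assumption of this sketch)."""
    n1 = Counter(s for s, d in zip(strata, treated) if d == 1)
    n0 = Counter(s for s, d in zip(strata, treated) if d == 0)
    n_treated = sum(n1.values())
    if n_treated == 0:
        raise ValueError("no treated units in the sample")
    support = {s for s in n1 if n0[s] > 0}
    share = sum(n1[s] for s in support) / n_treated
    return support, share

# Hypothetical sample: stratum "c" contains treated units only.
strata = ["a", "a", "b", "b", "c", "c", "c"]
treated = [1, 0, 1, 0, 1, 1, 1]
support, share = support_diagnostics(strata, treated)
print(sorted(support), share)
```

A share below one signals that some treated units have no untreated comparison in their stratum, so any estimate covering them relies on extrapolation.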

4. Inference Procedures and Confidence Sets

Construction of inferential statements about the ATT under the partial identification framework requires inversion of confidence intervals over all plausible values of the selection bias or sensitivity parameter (e.g., the curvature parameter $\kappa$). For robust DID, individual confidence intervals for each candidate value $\theta_{OLS} - SB_0(i)$, one per pre-period selection bias $SB_0(i)$, are computed, and the overall interval is obtained by taking the minimum of the lower ends and the maximum of the upper ends. By the Boole-Fréchet inequality, this interval contains the true ATT with at least the nominal coverage probability (Ban et al., 2022).
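The min-of-lower-ends, max-of-upper-ends construction can be sketched in a few lines. The example assumes, purely for illustration, that each candidate ATT estimate comes with a standard error and that a normal approximation gives its individual interval; the function name `union_confidence_interval` and the inputs are hypothetical, and the source does not specify how the individual intervals are formed.

```python
import numpy as np
from statistics import NormalDist

def union_confidence_interval(estimates, std_errors, level=0.95):
    """Combine per-candidate confidence intervals by taking the smallest
    lower end and the largest upper end. Individual intervals use a normal
    approximation, which is an assumption of this sketch."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    if est.shape != se.shape or est.size == 0:
        raise ValueError("estimates and std_errors must be non-empty and aligned")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    lower_ends = est - z * se
    upper_ends = est + z * se
    return lower_ends.min(), upper_ends.max()

# Hypothetical candidate ATT estimates, one per pre-period selection bias.
ci_lower, ci_upper = union_confidence_interval([1.7, 2.0, 2.2], [0.1, 0.1, 0.1])
print(f"combined interval: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Because the combined interval contains every individual interval, it is at least as wide as the widest of them, which is the price of the conservative coverage guarantee.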

More generally, sensitivity bounds can be plotted as a function of the curvature parameter $\kappa$ to visualize the trade-off between assumption strength and precision, exposing the fragility of inference to limited support or departures from ignorability. Reporting MAS-SI and the fragility index operationalizes the epistemic content of the estimate.

5. Model Structures, Sufficient Conditions, and Extensions

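In the context of DID, sufficient conditions for the bias-set stability assumption and the resulting partial identification interval are provided by structural models such as IFE. Suppose untreated outcomes satisfy

$Y_t(0) = \lambda' F_t + \varepsilon_t,$

where $\lambda$ is a vector of unit-specific factor loadings, $F_t$ a vector of common time-varying factors, and $\varepsilon_t$ an idiosyncratic error, with specified independence and symmetry structure on these components. Under such models, pre- and post-treatment selection biases span the same convex set, and the convex-hull approach is sharp (Ban et al., 2022).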
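A short derivation shows why parallel trends can fail under a factor structure. It assumes, for illustration only, the generic IFE form $Y_t(0) = \lambda' F_t + \varepsilon_t$ with nonrandom common factors $F_t$ and $\mathbb{E}[\varepsilon_t \mid D] = 0$; the symbols $\lambda$, $F_t$, $\varepsilon_t$ and the mean-independence condition are assumptions of this sketch and need not match the exact specification in Ban et al. (2022).

```latex
\begin{aligned}
SB_t &= \mathbb{E}[Y_t(0) \mid D=1] - \mathbb{E}[Y_t(0) \mid D=0] \\
     &= \big(\mathbb{E}[\lambda \mid D=1] - \mathbb{E}[\lambda \mid D=0]\big)' F_t, \\
SB_1 - SB_0 &= \big(\mathbb{E}[\lambda \mid D=1] - \mathbb{E}[\lambda \mid D=0]\big)' (F_1 - F_0).
\end{aligned}
```

Under these assumptions the selection bias is constant across periods only if the gap in mean loadings between treated and untreated units is orthogonal to the change in factors, which is why a restriction weaker than exact parallel trends is attractive in this setting.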

Further, this paradigm readily extends to multiple treatment periods, staggered adoption, and other generalized trend structures, via adjustment of the information sets and the associated observed biases.

6. Practical Implications and Applied Workflow

Robust ATT identification calls for the following empirical workflow:

Map the empirical support $\mathcal{S}$ and compute the supported treated share $\pi$ before any estimation.
When $\pi < 1$, restrict estimation to $\mathcal{S}$ or conduct a sensitivity analysis indexed by the curvature parameter $\kappa$.
Construct and interpret identified sets, report MAS-SI and the fragility index, and ensure inference is robust to support and modeling limitations.
Avoid black-box trimming or ad hoc extrapolation; instead, link all causal claims to the explicit support and the degree of unverifiable extrapolation required (Li, 2025).
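The first two steps of this workflow can be sketched as a single function that restricts estimation to supported strata and reports how many treated units that restriction covers. The example uses a simple exact-stratification estimator, which assumes no confounding within strata; that estimator, the function name `att_on_support`, and the toy data are choices made for this sketch rather than the procedure of the cited papers, and the sensitivity-analysis step is not shown.

```python
import numpy as np

def att_on_support(y, d, strata):
    """Stratified ATT estimate restricted to strata that contain both
    treated and untreated units, plus the share of treated units covered."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    strata = np.asarray(strata)
    weighted_sum, n_supported = 0.0, 0
    n_treated = int((d == 1).sum())
    for s in np.unique(strata):
        in_s = strata == s
        y_treated = y[in_s & (d == 1)]
        y_untreated = y[in_s & (d == 0)]
        if y_treated.size == 0 or y_untreated.size == 0:
            continue  # stratum lies outside the empirical support
        # Weight each stratum contrast by its number of treated units.
        weighted_sum += y_treated.size * (y_treated.mean() - y_untreated.mean())
        n_supported += y_treated.size
    if n_supported == 0:
        raise ValueError("no stratum contains both treated and untreated units")
    return weighted_sum / n_supported, n_supported / n_treated

# Hypothetical data: stratum "c" has treated units only and is excluded.
y = [3, 1, 5, 2, 9, 9]
d = [1, 0, 1, 0, 1, 1]
strata = ["a", "a", "b", "b", "c", "c"]
att_hat, covered_share = att_on_support(y, d, strata)
print(att_hat, covered_share)
```

Reporting the covered share alongside the estimate makes explicit which treated units the causal claim applies to and which would require extrapolation.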

By reframing PT and overlap not merely as regularity or nuisance conditions but as foundational to the very definition and identifiability of the ATT, these frameworks ensure that empirical claims are grounded in data structure and transparent about the strength and reach of their assumptions. This approach is central for credible causal analysis in both experimental and observational settings.


References

Ban et al. (2022). Robust Difference-in-differences Models.

Li (2025). Fragility in Average Treatment Effect on the Treated under Limited Covariate Support.
