
Average Treatment Effects on the Treated (ATT)

Updated 3 April 2026
  • ATT is defined as the mean difference between treated outcomes and their counterfactual untreated outcomes, key for policy and clinical evaluation.
  • Difference-in-differences adjusts for selection bias under a parallel trends assumption, while convex-hull partial identification relaxes strict parallel trends.
  • Practical inference requires mapping empirical support and conducting sensitivity analyses to ensure robust and credible estimation of treatment effects.

The average treatment effect on the treated (ATT) is a central estimand in program evaluation, epidemiology, and policy analysis. It quantifies the mean difference between the observed outcome and the counterfactual untreated outcome among subjects who received the treatment. The ATT is particularly salient when the treated population is of policy or clinical interest, distinguishing it from the population average treatment effect (ATE). A rigorous understanding of the ATT encompasses its formal definition, identification under a variety of models, estimation under partial identification, efficiency considerations, and robust inference under model uncertainty.

1. Formal Definition and the Role of Selection Bias

In standard difference-in-differences (DID) and related frameworks, the ATT is defined as

$$ATT \equiv \mathbb{E}[Y_1(1) - Y_1(0) \mid D=1],$$

with $D$ indicating treatment status, $Y_t(d)$ denoting the potential outcome in period $t$ under treatment $d \in \{0,1\}$, and $Y_t$ the observed outcome. The observed outcome in DID is encoded as $Y_0 = Y_0(0)$ and $Y_1 = D\cdot Y_1(1) + (1-D)\cdot Y_1(0)$.

The natural estimator is the difference-in-means (or OLS estimator), which can be decomposed as

$$\theta_{OLS} \equiv \mathbb{E}[Y_1 \mid D=1] - \mathbb{E}[Y_1 \mid D=0] = ATT + SB_1,$$

where the selection bias in period $t$ is

$$SB_t \equiv \mathbb{E}[Y_t(0) \mid D=1] - \mathbb{E}[Y_t(0) \mid D=0].$$
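The decomposition follows by adding and subtracting the treated group's mean untreated outcome, $\mathbb{E}[Y_1(0) \mid D=1]$, after substituting the observed-outcome encoding for $Y_1$. A short derivation using only the definitions above:

```latex
\begin{aligned}
\theta_{OLS} &= \mathbb{E}[Y_1 \mid D=1] - \mathbb{E}[Y_1 \mid D=0] \\
             &= \mathbb{E}[Y_1(1) \mid D=1] - \mathbb{E}[Y_1(0) \mid D=0] \\
             &= \underbrace{\mathbb{E}[Y_1(1) - Y_1(0) \mid D=1]}_{ATT}
              + \underbrace{\mathbb{E}[Y_1(0) \mid D=1] - \mathbb{E}[Y_1(0) \mid D=0]}_{SB_1}.
\end{aligned}
```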

The parallel trends assumption (PT) posits $SB_1 = SB_0$, enabling identification as $ATT = \theta_{OLS} - SB_0$. The pre-treatment difference in means estimates $SB_0$ and is fundamental for justifying PT in DID applications (Ban et al., 2022).
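As a minimal sketch of the two-period DID logic, the following Python snippet subtracts the pre-period difference in means from the post-period difference in means. The function name `did_att` and the toy data are illustrative assumptions, not part of the cited work, and the estimate is only meaningful if parallel trends holds.

```python
import numpy as np

def did_att(y0, y1, d):
    """Two-period DID estimate of the ATT, assuming parallel trends.

    y0, y1 : observed outcomes in the pre- and post-treatment periods
    d      : 0/1 treatment indicator
    Returns (att, theta_ols, sb0).
    """
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    d = np.asarray(d).astype(bool)
    theta_ols = y1[d].mean() - y1[~d].mean()  # post-period difference in means
    sb0 = y0[d].mean() - y0[~d].mean()        # pre-period difference in means
    return theta_ols - sb0, theta_ols, sb0

# Toy data, for illustration only.
y0 = [3.0, 5.0, 1.0, 3.0]
y1 = [6.0, 8.0, 2.0, 4.0]
d = [1, 1, 0, 0]
att, theta_ols, sb0 = did_att(y0, y1, d)
print(att, theta_ols, sb0)
```

In the toy data the post-period gap is 4 and the pre-period gap is 2, so the DID estimate attributes the remaining 2 to the treatment.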

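Empirical skepticism regarding the PT assumption motivates more general frameworks for partial identification. One approach is to define a baseline information set $\mathcal{I}_0$ (e.g., pre-treatment periods or covariate strata), introduce conditional pre-period selection biases $SB_0(i)$ for each $i \in \mathcal{I}_0$, and posit the bias-set stability (convex hull) assumption:

$$SB_1 \in \left[\inf_{i \in \mathcal{I}_0} SB_0(i),\ \sup_{i \in \mathcal{I}_0} SB_0(i)\right].$$

From this, the sharp identified set for the ATT is

$$ATT \in \left[\theta_{OLS} - \sup_{i \in \mathcal{I}_0} SB_0(i),\ \theta_{OLS} - \inf_{i \in \mathcal{I}_0} SB_0(i)\right].$$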

This methodology reverts to the standard DID point estimate only when the conditional pre-period biases $SB_0(i)$ coincide, i.e., when PT holds exactly in the pre-period (Ban et al., 2022).
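The interval logic can be sketched in a few lines of Python: if the post-period bias is assumed to lie between the smallest and largest pre-period bias, the ATT lies between the difference in means minus the largest bias and the difference in means minus the smallest. The function name `att_identified_set` and the numbers are hypothetical, and the pre-period biases are treated as known rather than estimated.

```python
import numpy as np

def att_identified_set(theta_ols, pre_biases):
    """ATT interval when the post-period bias lies in the convex hull
    (here, the [min, max] range) of the supplied pre-period biases."""
    sb = np.asarray(pre_biases, dtype=float)
    return float(theta_ols - sb.max()), float(theta_ols - sb.min())

# Three hypothetical pre-period biases give an interval.
lower, upper = att_identified_set(4.0, [1.5, 2.0, 2.5])
print(lower, upper)

# Identical pre-period biases collapse the interval to the DID point estimate.
point = att_identified_set(4.0, [2.0, 2.0])
print(point)
```

The width of the interval equals the spread of the pre-period biases, which makes the cost of relaxing parallel trends directly visible.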

Extensions to staggered adoption or multiple post-treatment periods are direct by imposing the convex-hull assumption on post-period biases over more general treatment paths, yielding analogous interval identification for path- or period-specific ATT parameters.

A sufficient condition for the convex-hull assumption is that $Y_t(0)$ follows an interactive fixed effects (IFE) model with specific factor-structure symmetries, ensuring that the convex hull of pre-period biases actually contains the post-period bias (Ban et al., 2022).

3. Estimation and Inference under Partial Support and Support Diagnostics

The identification of the ATT further demands sufficient covariate support: for each value of the covariates $X$ where treated units are observed, there must exist untreated units for valid counterfactual inference. Define the support set

$$\mathcal{S} \equiv \{x : n_1(x) > 0 \text{ and } n_0(x) > 0\},$$

with $x$ denoting discrete strata in the covariate space and $n_d(x)$ the number of observed units in stratum $x$ with $D=d$. The ATT is point-identified only on $\mathcal{S}$:

$$ATT_{\mathcal{S}} \equiv \mathbb{E}[Y_1(1) - Y_1(0) \mid D=1, X \in \mathcal{S}].$$

Let $\pi$ denote the proportion of treated units within the empirical support. If $\pi < 1$, then standard estimators extrapolate beyond the data, failing to quantify the true causal effect for all treated units. Sensitivity analysis frameworks introduce curvature constraints on the selection mechanism, parameterized by $\kappa$, to generate identified sets for the ATT as a monotonic function of the assumption strength and support diagnostics (Li, 10 Jun 2025).

Table: Diagnostic Statistics for ATT Identification under Partial Support

| Statistic | Description |
|---|---|
| $\pi$ | Proportion of treated units with support in the data |
| MAS-SI | Minimum assumption strength for sign identification |
| Fragility index | Minimal deviation required to overturn qualitative inference |

A key implication is that the ATT is undefined outside the support set $\mathcal{S}$, and credible estimation requires reporting only within this set or conducting transparent sensitivity analysis with explicit curvature assumptions. Standard estimators implicitly assume extrapolation and may be epistemically fragile in regions of low overlap (Li, 10 Jun 2025).
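The support diagnostics can be illustrated with a small Python sketch for discrete strata. It finds the strata containing both treated and untreated units, computes the share of treated units falling in those strata, and forms a treated-weighted average of within-stratum contrasts. The function name `support_diagnostics`, the single-outcome setup, and the toy data are assumptions made for illustration. The contrast is a causal estimate only if selection is ignorable within strata, and the sketch assumes at least one treated unit.

```python
import numpy as np

def support_diagnostics(x, d, y):
    """Return (supported strata, share of treated units on support,
    stratified treated-vs-untreated contrast on the support)."""
    x = np.asarray(x)
    d = np.asarray(d).astype(bool)
    y = np.asarray(y, dtype=float)

    # Strata observed in both the treated and the untreated group.
    support = sorted(set(x[d].tolist()) & set(x[~d].tolist()))
    on_support = np.isin(x, support)

    n_treated_on_support = (d & on_support).sum()
    share = n_treated_on_support / d.sum()
    if n_treated_on_support == 0:
        return support, float(share), float("nan")

    # Weight each stratum contrast by its share of supported treated units.
    att_on_support = 0.0
    for s in support:
        in_s = x == s
        weight = (d & in_s).sum() / n_treated_on_support
        contrast = y[d & in_s].mean() - y[~d & in_s].mean()
        att_on_support += weight * contrast
    return support, float(share), float(att_on_support)

# Toy data: stratum "c" has a treated unit but no untreated comparison.
x = ["a", "a", "a", "b", "b", "c"]
d = [1, 0, 0, 1, 0, 1]
y = [5.0, 3.0, 1.0, 4.0, 2.0, 9.0]
support, share, att_on_support = support_diagnostics(x, d, y)
print(support, share, att_on_support)
```

Here one of three treated units lies off the support, so the reported contrast speaks only for the remaining two thirds of the treated group.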

4. Inference Procedures and Confidence Sets

Construction of inferential statements about the ATT under the partial identification framework requires inversion of confidence intervals over all plausible values of the selection bias or sensitivity parameter (e.g., the curvature parameter $\kappa$). For robust DID, individual confidence intervals for the candidate values $\theta_{OLS} - SB_0(i)$, $i \in \mathcal{I}_0$, are computed, and the overall interval is obtained by taking the minimum of the lower endpoints and the maximum of the upper endpoints. By the Boole-Fréchet inequality, this interval contains the true ATT with at least the nominal coverage probability (Ban et al., 2022).
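The union step can be sketched as follows. The source does not specify how the individual intervals are built, so this example assumes normal-approximation intervals from user-supplied point estimates and standard errors. The function name `union_confidence_interval` and the inputs are hypothetical.

```python
from statistics import NormalDist

def union_confidence_interval(estimates, std_errors, alpha=0.05):
    """Smallest interval covering every individual (1 - alpha) interval.

    estimates  : one candidate ATT estimate per element of the information set
    std_errors : matching standard errors (assumed given)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal critical value
    lowers = [est - z * se for est, se in zip(estimates, std_errors)]
    uppers = [est + z * se for est, se in zip(estimates, std_errors)]
    return min(lowers), max(uppers)

# Hypothetical candidate estimates with a common standard error.
ci = union_confidence_interval([1.5, 2.0, 2.5], [0.2, 0.2, 0.2])
print(ci)
```

The resulting interval is conservative by construction: it is never narrower than the widest individual interval and widens as the candidate estimates spread out.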

More generally, sensitivity bounds can be plotted as a function of the curvature parameter $\kappa$ to visualize the trade-off between assumption strength and precision, exposing the fragility of inference to limited support or departures from ignorability. Reporting MAS-SI and the fragility index operationalizes the epistemic content of the estimate.

5. Model Structures, Sufficient Conditions, and Extensions

In the context of DID, sufficient conditions for the bias-set stability assumption and partial identification interval are provided by structural models such as IFE. Suppose untreated outcomes satisfy

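$$Y_t(0) = \lambda^\top F_t + \varepsilon_t,$$

with unit-specific factor loadings $\lambda$, common factors $F_t$, idiosyncratic errors $\varepsilon_t$, and specified independence and symmetry structure in the factor components $(\lambda, F_t)$. Under such models, pre- and post-treatment selection biases span the same convex set, and the convex-hull approach is sharp (Ban et al., 2022).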

Further, this paradigm readily extends to multiple treatment periods, staggered adoption, and other generalized trend structures, via adjustment of the information sets and the associated observed biases.

6. Practical Implications and Applied Workflow

Robust ATT identification calls for the following empirical workflow:

  • Map the empirical support $\mathcal{S}$ and compute the supported share of treated units, $\pi$, before any estimation.
  • When $\pi < 1$, restrict estimation to $\mathcal{S}$ or conduct a sensitivity analysis indexed by the curvature parameter $\kappa$.
  • Construct and interpret identified sets, report MAS-SI and the fragility index, and ensure inference is robust to support and modeling limitations.
  • Avoid black-box trimming or ad hoc extrapolation; instead, link all causal claims to the explicit support and the degree of unverifiable extrapolation required (Li, 10 Jun 2025).

By reframing PT and overlap not merely as regularity or nuisance conditions but as foundational to the very definition and identifiability of the ATT, these frameworks ensure that empirical claims are grounded in data structure and transparent about the strength and reach of their assumptions. This approach is central for credible causal analysis in both experimental and observational settings.
