Difference-in-Differences with Sample Selection

Published 14 Nov 2024 in econ.EM | (2411.09221v2)

Abstract: We consider identification of average treatment effects on the treated (ATT) within the difference-in-differences (DiD) framework in the presence of endogenous sample selection. First, we establish that the usual DiD estimand fails to recover meaningful treatment effects, even if selection and treatment assignment are independent. Next, we partially identify the ATT for individuals who are always observed post-treatment regardless of their treatment status, and derive bounds on this parameter under different sets of assumptions about the relationship between sample selection and treatment assignment. Extensions to the repeated cross-section and two-by-two comparisons in the staggered adoption case are explored. Furthermore, we provide identification results for the ATT of three additional empirically relevant latent groups by incorporating outcome mean dominance assumptions which have intuitive appeal in applications. Finally, two empirical illustrations demonstrate the approach's usefulness by revisiting (i) the effect of a job training program on earnings (Calonico & Smith, 2017) and (ii) the effect of a working-from-home policy on employee performance (Bloom, Liang, Roberts, & Ying, 2015).


Authors (4)

Summary

The paper addresses bias in Difference-in-Differences caused by endogenous sample selection, proposing methods for partial identification of treatment effects.
It introduces a novel partial identification strategy without monotonicity assumptions, utilizing parallel trends assumptions for sample selection.
The methodology incorporates monotonicity assumptions to derive tighter bounds and extends identification to diverse latent groups beyond the always-observed.

Difference-in-Differences with Sample Selection: A Methodological Exploration

The paper "Difference-in-Differences with Sample Selection" by Rathnayake et al. addresses a significant issue in empirical research: endogenous sample selection within the difference-in-differences (DiD) framework. The authors examine how non-random sample selection can severely bias causal inferences if it is not addressed. Rather than point-identifying a single effect, the study partially identifies treatment effects for several latent groups within the treated population.

In empirical studies, sample selection poses a challenge when the outcome of interest is observed only for a selected subset of the population, which can bias standard DiD estimates. The paper shows that the naive DiD estimand can fail to recover a meaningful treatment effect even when treatment assignment is independent of the selection mechanism. Through its theoretical and empirical analyses, the paper clarifies the limitations of DiD estimators under non-random selection and provides practical strategies for researchers facing similar issues.
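The selection problem can be illustrated with a small simulation. The sketch below is not taken from the paper: the data-generating process, the function name `simulate_naive_did`, and all parameter values are assumptions chosen for illustration. Treatment is randomly assigned and has a constant effect `tau`, so every subgroup's ATT equals `tau`, but treatment also makes units more likely to be observed after treatment. A DiD computed only on observed units then compares differently composed groups.

```python
import numpy as np

def simulate_naive_did(n=200_000, tau=2.0, seed=0):
    """Compare full-sample DiD with DiD on the post-period observed sample.

    Hypothetical data-generating process (an assumption, not the paper's design).
    """
    rng = np.random.default_rng(seed)
    d = rng.integers(0, 2, size=n)      # randomized treatment assignment
    v = rng.normal(size=n)              # unit-specific shock to the untreated trend
    eta = rng.normal(size=n)            # idiosyncratic selection noise
    y_pre = rng.normal(size=n)          # pre-period outcome
    y_post = y_pre + 1.0 + v + tau * d  # common trend of 1, constant effect tau

    # A unit is observed post-treatment if its latent index is positive.
    # Treatment raises the index, so treated units face a lower bar on v.
    s = (v + 1.0 * d + eta) > 0

    dy = y_post - y_pre
    full_did = dy[d == 1].mean() - dy[d == 0].mean()
    naive_did = dy[(d == 1) & s].mean() - dy[(d == 0) & s].mean()
    return full_did, naive_did

full_did, naive_did = simulate_naive_did()
print(f"full-sample DiD: {full_did:.3f}, selected-sample DiD: {naive_did:.3f}")
```

In this setup the full-sample contrast is close to `tau`, while the selected-sample contrast falls noticeably below it, because observed control units have systematically larger trend shocks than observed treated units.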

The authors derive bounds for the average treatment effect on the treated (ATT), specifically focusing on those individuals who remain observable post-treatment regardless of their treatment status—referred to as the "always-observed" group. This partial identification strategy builds on identifying these latent subpopulations under various assumptions concerning the relationship between sample selection and treatment assignment.
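One common way to formalize such latent groups is through potential selection indicators. The notation below is an assumption made for illustration and may differ from the paper's: $S_i(d)$ denotes whether unit $i$ would be observed in the post-treatment period under treatment status $d$, and $Y_{i,\text{post}}(d)$ is the corresponding potential outcome.

```latex
% S_i(d): hypothetical notation for unit i's post-period selection
% indicator under treatment status d (not necessarily the paper's symbols)
\begin{aligned}
\text{always observed:} \quad & S_i(1) = 1,\; S_i(0) = 1 \\
\text{observed only when treated:} \quad & S_i(1) = 1,\; S_i(0) = 0 \\
\text{observed only when untreated:} \quad & S_i(1) = 0,\; S_i(0) = 1 \\
\text{never observed:} \quad & S_i(1) = 0,\; S_i(0) = 0
\end{aligned}

% Target parameter for the always-observed group
\mathrm{ATT}_{AO} = E\big[\, Y_{i,\text{post}}(1) - Y_{i,\text{post}}(0)
  \,\big|\, D_i = 1,\; S_i(1) = 1,\; S_i(0) = 1 \,\big]
```

Because only one of $S_i(1)$ and $S_i(0)$ is ever realized for a given unit, group membership is not directly observed, which is why the parameter is bounded rather than point-identified.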

Three primary methodological contributions stand out:

Partial Identification without Monotonicity: Without assuming monotonicity in selection, the paper imposes parallel-trends-type assumptions on sample selection itself to derive bounds for the always-observed group. This approach allows post-treatment trends in sample selection to differ between treated and control groups.
Incorporating Monotonicity in Sample Selection: The analysis then adds monotonicity assumptions on sample selection. Assuming that treatment does not reduce the probability of being observed narrows the bounds and makes them more informative.
Identifying ATT for Diverse Latent Groups: Beyond the always-observed group, the methodology extends to other latent groups, such as units that are observable only when treated. Outcome mean dominance assumptions, which order mean outcomes across latent groups, yield bounds on these groups' ATTs.
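The logic behind bounds of this kind can be sketched with a generic trimming calculation familiar from the sample-selection literature. This is not the paper's estimator: the function name `trimmed_mean_bounds` is hypothetical, and the share of the latent group among observed units is treated as a known input, whereas in the DiD setting that share must itself be identified from the selection assumptions. If a fraction `share` of the observed outcomes belongs to the latent group of interest, that group's mean outcome can be no smaller than the mean of the lowest `share` of outcomes and no larger than the mean of the highest `share`.

```python
import numpy as np

def trimmed_mean_bounds(y, share):
    """Worst-case bounds on the mean of a latent subgroup.

    y     : observed outcomes (for example, treated units observed post-treatment)
    share : assumed fraction of these units that belong to the latent group
    Returns (lower, upper): means of the lowest and highest `share` fraction.
    Illustrative helper only, not the estimator used in the paper.
    """
    y = np.sort(np.asarray(y, dtype=float))
    if y.size == 0:
        raise ValueError("y must contain at least one observation")
    if not 0 < share <= 1:
        raise ValueError("share must lie in (0, 1]")
    k = max(1, int(np.floor(share * y.size)))  # number of units kept after trimming
    return y[:k].mean(), y[-k:].mean()

lower, upper = trimmed_mean_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], share=0.5)
print(lower, upper)
```

When `share` equals 1 the two bounds coincide at the sample mean, and they widen as the share shrinks. This mirrors why assumptions that pin down a larger always-observed share give tighter bounds.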

In practical terms, the paper's implications are significant. Researchers aiming to apply DiD methods in the presence of non-random sample selection can employ these new techniques to mitigate bias. The empirical illustrations—concerning a job training program and a work-from-home policy—demonstrate the approach's applicability across contexts with distinct selection biases.

Looking forward, the methodology developed here offers pathways for future research, particularly in expanding the range of assumptions that can be relaxed or modified when exploring partial identification with DiD methods. This work not only underscores the importance of considering selection bias in evaluation studies but also provides concrete steps for addressing it, thus enriching the methodological toolkit available to econometricians and applied researchers alike. As AI continues to evolve, deploying such rigorous approaches to causal inference in machine learning settings—where selection issues are prevalent—could also be an avenue worth exploring.
