Two-Way Fixed Effects Estimators
- Two-Way Fixed Effects estimators are panel data tools that leverage unit and time fixed effects to control for unobserved heterogeneity.
- They are widely applied in difference-in-differences studies but can produce biased estimates when treatment effects vary across units or time.
- Recent advancements offer robust alternatives to address issues like negative weights and forbidden comparisons, enhancing causal inference.
Two-Way Fixed Effects (TWFE) estimators are foundational tools in panel data econometrics, widely used to estimate causal treatment effects in the presence of unobserved heterogeneity. They use additive unit and time fixed effects to absorb unobserved heterogeneity along both dimensions, and they are especially prevalent in difference-in-differences (DiD) and event-study designs with variation in treatment timing (“staggered adoption”). Although TWFE is conceptually simple and computationally convenient, recent methodological work has shown that the canonical estimator can be substantially biased, and easily misinterpreted, when treatment effects are heterogeneous over time or across units. This has prompted the development of heterogeneity-robust alternative estimators and of diagnostic procedures for assessing when classical TWFE is appropriate.
1. Canonical Models and Scope
The canonical TWFE regression for panel data is given by
$$Y_{it} = \alpha_i + \lambda_t + \tau D_{it} + \varepsilon_{it},$$
where $Y_{it}$ is the outcome for unit $i$ in period $t$, $D_{it}$ is a binary treatment indicator (or, more generally, a dosage variable), $\alpha_i$ captures time-invariant unit heterogeneity, and $\lambda_t$ captures common shocks or period fixed effects. The parameter of interest, $\tau$, is interpreted as the average treatment effect (ATE) or average treatment effect on the treated (ATT) under strong identification assumptions.
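As a minimal sketch of how the canonical regression can be computed, the snippet below simulates a balanced panel with a homogeneous effect and estimates the treatment coefficient by two-way demeaning (the within transformation). The design (adoption periods 2 and 4, code 99 for never-treated units), the variable names, and the noise level are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, true_tau = 200, 6, 2.0

# Hypothetical staggered design: each unit adopts in period 2 or 4, or never (coded 99).
adopt = rng.choice([2, 4, 99], size=n_units)
t = np.arange(n_periods)
D = (t[None, :] >= adopt[:, None]).astype(float)      # units x periods treatment matrix

alpha = rng.normal(size=(n_units, 1))                  # unit fixed effects
lam = rng.normal(size=(1, n_periods))                  # period fixed effects
Y = alpha + lam + true_tau * D + rng.normal(scale=0.5, size=D.shape)


def two_way_demean(X):
    """Within transformation; this closed form is valid for balanced panels only."""
    return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()


# By Frisch-Waugh-Lovell, OLS with unit and period dummies equals OLS on demeaned data.
D_tilde, Y_tilde = two_way_demean(D), two_way_demean(Y)
tau_hat = (D_tilde * Y_tilde).sum() / (D_tilde ** 2).sum()
print(f"TWFE estimate: {tau_hat:.3f} (true effect: {true_tau})")
```

Because the simulated effect is the same for every unit and period, the estimate should land close to the true value; the heterogeneous case is where the problems discussed later arise.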
Extensions commonly include time-varying covariates, non-binary treatments, multiple or dynamic treatments, and instrumental variables in a two-way fixed effects IV (TWFEIV) setup (Miyaji, 2024). The TWFE approach is also adapted for specialized designs, such as event studies with absorbing treatments or designs lacking true control groups but featuring “quasi-stayers” (Chaisemartin et al., 2024).
2. Identification Assumptions and Estimand Structure
Correct causal interpretation of TWFE estimators relies on several key assumptions:
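- Parallel Trends (PT): In the absence of treatment, the average outcomes of treated and untreated units (or not-yet-treated units, in staggered adoption designs) would evolve in parallel over time. Formally, $E[Y_{it}(0) - Y_{i,t-1}(0) \mid G_i = g] = E[Y_{it}(0) - Y_{i,t-1}(0) \mid G_i = g']$ for all periods $t$ and all treatment cohorts $g$ and $g'$, where $Y_{it}(0)$ is the untreated potential outcome and $G_i$ is the adoption cohort of unit $i$ (Rüttenauer et al., 2024).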
- No Anticipation: Units do not adjust outcomes in advance of receiving treatment (Rüttenauer et al., 2024).
- No Time-Varying Omitted Confounders: Treatment assignment is as good as random, conditional on fixed effects.
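Under these assumptions and treatment-effect homogeneity, TWFE estimates a variance-weighted ATT. In the sharp two-period, two-group case, the TWFE regression reduces to the classical DiD estimator (Lal, 7 Mar 2025):
$$\hat{\tau}^{\mathrm{DiD}} = \left(\bar{Y}_{\mathrm{treated,\,post}} - \bar{Y}_{\mathrm{treated,\,pre}}\right) - \left(\bar{Y}_{\mathrm{control,\,post}} - \bar{Y}_{\mathrm{control,\,pre}}\right),$$
where each $\bar{Y}$ is the sample mean outcome of the indicated group in the indicated period.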
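The two-group, two-period equivalence can be checked numerically. The sketch below uses simulated data with hypothetical group sizes and an arbitrary effect of 1.5; it computes the difference of mean changes directly and compares it with the TWFE coefficient obtained by two-way demeaning.

```python
import numpy as np

rng = np.random.default_rng(1)
n_treated, n_control = 30, 50

# Rows are units, columns are (pre, post); treated units come first.
treated = np.r_[np.ones(n_treated), np.zeros(n_control)].astype(bool)
Y = rng.normal(size=(n_treated + n_control, 2))
Y[treated, 1] += 1.5                      # hypothetical effect in the post period

# Classical DiD: change for the treated group minus change for the control group.
did = (Y[treated, 1].mean() - Y[treated, 0].mean()) - (
    Y[~treated, 1].mean() - Y[~treated, 0].mean()
)


def two_way_demean(X):
    """Within transformation for a balanced panel."""
    return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()


# TWFE coefficient: regress demeaned outcomes on the demeaned treatment indicator.
D = np.zeros_like(Y)
D[treated, 1] = 1.0
twfe = (two_way_demean(D) * two_way_demean(Y)).sum() / (two_way_demean(D) ** 2).sum()

print(f"DiD: {did:.6f}   TWFE: {twfe:.6f}")
```

The two numbers agree up to floating-point error, regardless of the group sizes, because the panel is balanced and there is a single adoption date.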
However, with staggered treatment timing or time-varying effects, the TWFE estimand is a potentially non-convex, data-driven weighted average of group-time average treatment effects, with weights that may be negative (Chaisemartin et al., 2018, Chaisemartin et al., 2021).
3. Heterogeneity-Induced Pathologies and Diagnostics
When treatment effects are heterogeneous across units or time, the TWFE estimator generically produces a weighted sum,
$$E\left[\hat{\tau}^{\mathrm{TWFE}}\right] = \sum_{(g,t):\, D_{gt}=1} w_{gt}\, \tau_{gt},$$
where $\tau_{gt}$ is the group-time specific effect and the weights $w_{gt}$ may be negative (Chaisemartin et al., 2018, Chaisemartin et al., 2021). Negative weights occur when already-treated units serve as controls for later-treated cohorts, leading to the following pathologies:
- Contamination Bias: The TWFE average can place negative weight on some true effects, potentially flipping the sign of $\hat{\tau}^{\mathrm{TWFE}}$ even if all $\tau_{gt} > 0$.
- “Forbidden Comparisons”: TWFE implicitly leverages comparisons between treated and already-treated units, which are not valid DiD contrasts (Lal, 7 Mar 2025, Rüttenauer et al., 2024).
- Dynamic Setting Failures: In event-study regressions with lags/leads, TWFE coefficients on specific leads/lags do not isolate the causal effect at those horizons, but rather conflate them with contaminating effects from other periods (Sun et al., 2018, Chaisemartin et al., 2020).
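The origin of the negative weights can be sketched with a short derivation. It assumes, for illustration, a balanced panel in which untreated outcomes are additive, $Y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}$ with $E[\varepsilon_{it} \mid D] = 0$, and cell-level effects $\tau_{it}$ treated as fixed. The symbols $\tilde{D}_{it}$ and $w_{it}$ are notation introduced here, not taken from the cited papers.

$$
\begin{aligned}
\tilde{D}_{it} &= D_{it} - \bar{D}_{i\cdot} - \bar{D}_{\cdot t} + \bar{D}_{\cdot\cdot}, \\
\hat{\tau}^{\mathrm{TWFE}} &= \frac{\sum_{i,t} \tilde{D}_{it}\, Y_{it}}{\sum_{i,t} \tilde{D}_{it}^{2}}
\quad \text{(Frisch–Waugh–Lovell)}, \\
E\left[\hat{\tau}^{\mathrm{TWFE}} \mid D\right] &= \frac{\sum_{i,t} \tilde{D}_{it}\, D_{it}\, \tau_{it}}{\sum_{i,t} \tilde{D}_{it}\, D_{it}}
= \sum_{(i,t):\, D_{it}=1} w_{it}\, \tau_{it},
\qquad
w_{it} = \frac{\tilde{D}_{it}}{\sum_{(j,s):\, D_{js}=1} \tilde{D}_{js}}.
\end{aligned}
$$

The last line uses two facts: $\tilde{D}_{it}$ sums to zero within every unit and every period, so $\alpha_i$ and $\lambda_t$ drop out, and $\sum_{i,t} \tilde{D}_{it}^{2} = \sum_{i,t} \tilde{D}_{it} D_{it}$. The weights sum to one, and a treated cell receives negative weight whenever $\tilde{D}_{it} < 0$, that is, when $\bar{D}_{i\cdot} + \bar{D}_{\cdot t} > 1 + \bar{D}_{\cdot\cdot}$. This happens for units treated in most periods, observed in periods when most units are treated, which is exactly the case of early adopters late in the panel.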
Diagnostic tools for these pathologies include:
- Weight Diagnostics: Computation of the TWFE weights $w_{gt}$ attached to each group-time effect, to identify negative or large-magnitude weights (Jakiela, 2021).
- Homogeneity and Robustness Checks: Tests based on residualized outcomes and treatment residuals, subsample stability (“jackknife”), and sensitivity to dropping particular groups or time periods (Jakiela, 2021).
- Novel Wald-Type Tests: Wald tests for homogeneity of dynamic effects or cohort effects in event-study models, implemented in packages such as `pyfixest` (Lal, 7 Mar 2025).
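A weight diagnostic of the kind listed above can be sketched in a few lines. The example assumes a hypothetical balanced design with an early cohort (adopting in period 1) and a late cohort (adopting in period 4) and no never-treated units; the weights are the residualized treatment values of treated cells, normalized to sum to one. It is an illustration of the idea, not the implementation used in the cited papers or in any package.

```python
import numpy as np

# Hypothetical balanced design: 6 periods, 10 early adopters (period 1),
# 10 late adopters (period 4), and no never-treated units.
n_periods = 6
adopt = np.repeat([1, 4], 10)
D = (np.arange(n_periods)[None, :] >= adopt[:, None]).astype(float)


def two_way_demean(X):
    """Within transformation for a balanced panel."""
    return X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()


# Implied TWFE weight on each treated cell: residualized treatment, normalized to sum to 1.
D_tilde = two_way_demean(D)
weights = np.where(D == 1, D_tilde, 0.0) / D_tilde[D == 1].sum()
negative = weights < 0

print("share of treated cells with negative weight:", negative.sum() / (D == 1).sum())
print("sum of negative weights:", weights[negative].sum())

# Noise-free outcomes in which every effect is positive, but larger in the
# negatively weighted cells (early adopters in late periods).
tau = np.where(negative, 4.0, 1.0)
Y = tau * D
tau_hat = (D_tilde * two_way_demean(Y)).sum() / (D_tilde ** 2).sum()
print("TWFE coefficient:", tau_hat, " weighted sum of effects:", (weights * tau).sum())
```

In this design the negatively weighted cells are the early cohort's observations in periods 4 and 5, when the late cohort switches on and the early cohort acts as its control. The TWFE coefficient comes out negative even though every cell-level effect is at least 1.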
4. Robust Estimation Strategies and Extensions
In response to the limitations of standard TWFE, multiple robust alternatives have been proposed:
- Interaction-Weighted (IW) Estimators: Cohort-event-time fully saturated models, aggregating cohort-specific event-time effects with convex weights to avoid contamination bias (Sun et al., 2018).
- Switchers-Based DIDs: Restricting attention to switching units and comparing only to stable (non-switching) units in their respective periods, ensuring non-negative weights and insulation from “forbidden” comparisons (Chaisemartin et al., 2018).
- Imputation and Augmented Methods: Regression adjustment or double-robust (AIPW) estimators that model both the assignment mechanism and the untreated potential outcome process, with consistency under correct specification of either component (Arkhangelsky et al., 2021, Caetano et al., 2024).
- Fused/Extended TWFE Methods: Machine-learning–guided fusion of cohort-specific effects to balance bias-variance in event-time heterogeneity, with proven selection consistency and oracle properties (Faletto, 2023).
- Two-Way Grouped FE Estimators: Group units and time-periods according to latent “types” and estimate group-time specific effects, accommodating arbitrary time-variation in unobserved heterogeneity and providing a practical non-iterative estimator (Pigini et al., 2023, Freeman et al., 2021).
For high-dimensional, sparse, or bipartite networks (e.g., worker–firm matched data), ridge-regularized TWFE restores stability and interpretable variance structure (He et al., 7 Jan 2026).
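The imputation idea among the alternatives above can be sketched as follows: fit unit and period effects on untreated observations only, predict the untreated outcome for treated cells, and average the gaps. The simulated design (cohorts adopting in periods 2 and 4, code 99 for never-treated units, effects growing with exposure) is a hypothetical illustration, and the plain least-squares fit stands in for whatever estimation routine a real application would use.

```python
import numpy as np

rng = np.random.default_rng(2)
n_periods = 6
adopt = np.repeat([2, 4, 99], 40)                 # 99 marks never-treated units (assumed coding)
n_units = adopt.size
unit = np.repeat(np.arange(n_units), n_periods)   # long-format unit index
time = np.tile(np.arange(n_periods), n_units)     # long-format period index
D = (time >= adopt[unit]).astype(float)

# Heterogeneous effects that grow with time since adoption.
tau = np.where(D == 1, 1.0 + 0.5 * (time - adopt[unit]), 0.0)
Y = (rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[time]
     + tau + rng.normal(scale=0.1, size=unit.size))

# Step 1: fit unit and period dummies on untreated observations only.
X = np.zeros((unit.size, n_units + n_periods))
X[np.arange(unit.size), unit] = 1.0
X[np.arange(unit.size), n_units + time] = 1.0
untreated = D == 0
coef, *_ = np.linalg.lstsq(X[untreated], Y[untreated], rcond=None)

# Step 2: impute untreated potential outcomes for treated cells and average the gaps.
Y0_hat = X @ coef
att_hat = (Y - Y0_hat)[~untreated].mean()
att_true = tau[~untreated].mean()
print(f"imputation ATT: {att_hat:.3f}   true ATT: {att_true:.3f}")
```

The dummy matrix is rank deficient (unit and period effects are identified only up to a constant), but the fitted sum of a unit effect and a period effect is still unique here because never-treated units are observed in every period and every treated unit has untreated pre-periods. No treated observation is ever used as a control, so the forbidden comparisons do not arise.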
5. Design-Robustness, Inference, and Practical Recommendations
Design-robust TWFE estimators augment standard TWFE with inverse-propensity weights derived from a model of the assignment mechanism (Arkhangelsky et al., 2021). When either the assignment model or the outcome model is correctly specified, these estimators consistently recover a pre-specified average treatment effect, delivering double robustness and efficiency gains, particularly under staggered adoption or complex assignment patterns.
Key practical steps:
- Diagnostic First, Estimator Second: Always assess weight structure and conduct pre-trend tests or placebo checks before interpreting TWFE coefficients as causal effects (Chiu et al., 2023, Jakiela, 2021).
- Switch to Robust DIDs or Imputation-Based Approaches When Diagnostics Fail: In the presence of detected heterogeneity, or when event-study weights indicate substantial contamination/negativity, prefer robust estimators (Callaway–Sant’Anna, Sun–Abraham, de Chaisemartin–D’Haultfœuille, Borusyak–Jaravel–Spiess) (Chaisemartin et al., 2021, Rüttenauer et al., 2024).
- Event-Time Specification for Dynamics: Use cohort-dummy × event-time fully interactive specifications when interested in dynamic treatment effects, avoiding contaminated dynamic coefficients (Rüttenauer et al., 2024, Sun et al., 2018).
- Variance–Bias Tradeoff: Heterogeneity-robust estimators can exhibit substantially higher sampling variance, so when diagnostics suggest homogeneity, conventional TWFE retains a power advantage (Lal, 7 Mar 2025).
- Instrumental Variables Designs: In staggered DID-IV setups, the analogous contamination and weight pathologies occur unless both the reduced-form and first-stage effects are stable across groups and periods (Miyaji, 2024).
- Sparse or Unbalanced Designs: Use ridge-regularization or grouped FE to restore identification and control finite-sample bias/variance in extremely high-dimensional or sparse panels (He et al., 7 Jan 2026, Freeman et al., 2021).
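To make the "switch to a robust estimator" step concrete, the sketch below computes group-time ATTs in the spirit of the Callaway–Sant’Anna approach, in its simplest form: no covariates, unconditional parallel trends, and not-yet-treated units as controls. The function name `att_gt`, the cohort coding, and the simulated effects are hypothetical; packaged implementations additionally handle covariates, inference, and aggregation.

```python
import numpy as np

rng = np.random.default_rng(3)
n_periods = 6
adopt = np.repeat([2, 4, 99], 50)                  # 99 marks never-treated units (assumed coding)
t = np.arange(n_periods)
D = (t[None, :] >= adopt[:, None]).astype(float)

# Effects grow with exposure: 1.0 at adoption, +0.5 per additional period.
exposure = np.clip(t[None, :] - adopt[:, None], 0, None)
tau = D * (1.0 + 0.5 * exposure)
Y = (rng.normal(size=(adopt.size, 1)) + rng.normal(size=(1, n_periods))
     + tau + rng.normal(scale=0.1, size=D.shape))


def att_gt(Y, adopt, g, period):
    """DiD for cohort g in `period`, against units not yet treated by then.

    The base period is g - 1, the last period before cohort g adopts.
    """
    change = Y[:, period] - Y[:, g - 1]
    controls = adopt > period
    return change[adopt == g].mean() - change[controls].mean()


estimates = {(g, p): att_gt(Y, adopt, g, p) for g in (2, 4) for p in range(g, n_periods)}
for (g, p), est in estimates.items():
    print(f"cohort {g}, period {p}: estimate {est:.3f}, truth {1.0 + 0.5 * (p - g):.1f}")
```

Each estimate uses only clean comparisons, so the cohort- and period-specific effects are recovered separately and can then be averaged with non-negative weights chosen by the analyst.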
6. Advanced Panel Structures and Generalized Heterogeneity
Modern research has extended TWFE and related estimators to accommodate:
- Unknown Functional Forms of Heterogeneity: Modeling the unobserved heterogeneity as a nonparametric or smooth bivariate function of latent unit and time effects, consistently estimated via a growing number of factors or grouped FE techniques (Freeman et al., 2021, Pigini et al., 2023).
- Variance-Weighted Estimands: Under general latent-factor models, the estimands are variance-weighted averages of unit-time–specific treatment effects, with weights proportional to the conditional variance of treatment after partialing out unobserved heterogeneity (Juodis et al., 20 Apr 2026).
- Multiple Treatments: In multi-treatment TWFE, coefficients are contaminated linear combinations of own- and other-treatment effects unless all effects are homogeneous, and omitting a treatment can paradoxically reduce bias (Chaisemartin et al., 2020).
- No-Stayer/All-Treated Designs: In two-period designs with no untreated controls in the post-period, TWFE consistency requires strong mean-independence assumptions; otherwise, local-linear DIDs, partially identified bounds, or parametric heterogeneity models are warranted (Chaisemartin et al., 2024).
7. Summary Table: TWFE and Main Robust Alternatives
| Estimator | Target estimand | Handles heterogeneity | Negative weights? | Comparison units required | Applicability |
|---|---|---|---|---|---|
| Canonical TWFE | Weighted ATT (may be non-convex) | No | Yes | No | Simple designs, homogeneity |
| Sun–Abraham IW | Convex average of cohort-specific effects | Yes | No | Not-yet-/never-treated | Event-studies, staggered adoption |
| Callaway–Sant’Anna | Convex average of group-time ATTs | Yes | No | Not-yet-/never-treated | Staggered, multi-period panels |
| Switchers DID | Non-negative weighted avg of switching DIDs | Yes | No | Control group per period | General DiD, non-binary, reversals |
| Design-Robust TWFE | Pre-specified average under assignment model | Yes (doubly-robust) | Yes | No | Known/estimable assignment |
| Fused-ETWFE | Adaptive average w/ fused event-time effects | Yes (sparse) | No | No | Dynamic, local homogeneity |
References
- (Pigini et al., 2023) Specification testing with grouped fixed effects
- (Arkhangelsky et al., 2021) Design-Robust Two-Way-Fixed-Effects Regression For Panel Data
- (Sun et al., 2018) Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects
- (Lal, 7 Mar 2025) When can we get away with using the two-way fixed effects regression?
- (Juodis et al., 20 Apr 2026) Factor-Augmented Panel Regressions and Variance-Weighted Treatment Effects
- (Caetano et al., 2024) Difference-in-Differences when Parallel Trends Holds Conditional on Covariates
- (Jakiela, 2021) Simple Diagnostics for Two-Way Fixed Effects
- (Chaisemartin et al., 2018) Two-way fixed effects estimators with heterogeneous treatment effects
- (Chaisemartin et al., 2021) Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey
- (Chaisemartin et al., 2024) Two-way Fixed Effects and Differences-in-Differences Estimators in Heterogeneous Adoption Designs
- (Faletto, 2023) Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions
- (Freeman et al., 2021) Linear Panel Regressions with Two-Way Unobserved Heterogeneity
- (Chaisemartin et al., 2020) Two-way Fixed Effects and Differences-in-Differences Estimators with Several Treatments
- (Chiu et al., 2023) Causal Panel Analysis under Parallel Trends
For full technical details and implementation references, consult the cited papers and their supplemental materials.