Synthetic Difference-in-Differences (SDiD)

Updated 22 September 2025

Synthetic Difference-in-Differences is a causal inference estimator that combines DiD and Synthetic Control techniques, allowing for flexible weights and an intercept to address baseline differences.
It employs a regularized least squares approach with elastic net penalties (L1 and L2) to optimize weight selection and control overfitting in small sample settings.
The method enhances treatment effect estimation by both matching pre-treatment trajectories and modeling permanent additive differences, making it suitable for policy evaluation and labor market studies.

Synthetic Difference-in-Differences (SDiD) is a class of estimators that unite the foundational principles of Difference-in-Differences (DiD) and Synthetic Control (SC) methodologies for causal inference in panel data. SDiD aims to mitigate bias from violations of the parallel trends assumption by introducing flexible weighting and/or regularization, as well as intercept terms, thereby enhancing the robustness of treatment effect estimation in settings where standard approaches may be inadequate.

1. Theoretical Motivation and Estimator Formulation

The SDiD framework originates from the recognition of two key limitations in classical approaches:

DiD estimators are efficient and interpretable under the parallel trends assumption but do not account for potential imbalance or heterogeneity between treated and control units.
SC methods construct a weighted combination of control units to approximate the pre-treatment trajectory of the treated unit but traditionally require nonnegative weights summing to one and no intercept term, which precludes modeling permanent additive differences.

The general SDiD estimator relaxes these constraints. Let $Y_{0,T}(0)$ denote the counterfactual (untreated) outcome for the treated unit in post-treatment period $T$. Classic SC imputation is

$\hat{Y}_{0,T}(0) = \sum_{i=1}^N \omega_i Y_{i,T}$

subject to $\omega_i \geq 0$ and $\sum_{i=1}^N \omega_i = 1$, and with no intercept. By contrast, DiD uses fixed (often equal) weights but allows for an intercept to adjust for base-level differences: $\hat{Y}_{0,T}(0) = \mu + \sum_{i=1}^N \omega_i Y_{i,T}$, with $\mu$ unrestricted and typically $\omega_i = 1/N$.

SDiD generalizes both by allowing arbitrary weights $\omega_i$ (which may be negative and need not sum to one) and an intercept $\mu$:

$\hat{Y}_{0,T}(0) = \mu + \sum_{i=1}^N \omega_i Y_{i,T}$

These parameters are chosen to minimize a regularized objective over the pre-treatment periods $t = 1, \dots, T_0$, where $Y_{0,t}$ is the treated unit's outcome and $Y_{i,t}$ is the outcome of control $i$:

$Q(\mu, \omega | Y; \lambda, \alpha) = \sum_{t=1}^{T_0} \left( Y_{0,t} - \mu - \sum_{i=1}^N \omega_i Y_{i,t} \right)^2 + \lambda \cdot \left( \frac{1-\alpha}{2}\|\omega\|_2^2 + \alpha\|\omega\|_1 \right)$

Here, the first term enforces closeness to the treated unit's pre-treatment outcomes, while the second term is an elastic net (L1 and L2) penalty that selects and stabilizes the weights and controls overfitting.
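As a minimal sketch, the objective can be evaluated directly on pre-treatment data. The function name `sdid_objective`, the list-based data layout, and the toy numbers are assumptions made for illustration; they do not come from the source.

```python
def sdid_objective(y_treated, y_controls, mu, omega, lam, alpha):
    """Elastic-net objective Q(mu, omega) on pre-treatment data.

    y_treated:  list of length T0 (treated unit, pre-treatment periods)
    y_controls: list of N lists, each of length T0 (control units)
    """
    sq_err = 0.0
    for t, y0 in enumerate(y_treated):
        synthetic = mu + sum(w * yc[t] for w, yc in zip(omega, y_controls))
        sq_err += (y0 - synthetic) ** 2
    l2 = sum(w * w for w in omega)      # squared L2 norm of the weights
    l1 = sum(abs(w) for w in omega)     # L1 norm of the weights
    return sq_err + lam * ((1 - alpha) / 2 * l2 + alpha * l1)


# Toy panel (made up): the treated unit sits exactly 3 above the control average.
y_controls = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
y_treated = [5.0, 6.0, 7.0]

# DiD-style choice: equal weights plus an intercept equal to the level gap.
q_did = sdid_objective(y_treated, y_controls, mu=3.0, omega=[0.5, 0.5],
                       lam=0.0, alpha=0.5)   # 0.0: perfect pre-treatment fit
# Same parameters with the penalty switched on: only the penalty contributes.
q_pen = sdid_objective(y_treated, y_controls, mu=3.0, omega=[0.5, 0.5],
                       lam=2.0, alpha=0.5)   # 1.25
```

With $\lambda = 0$ the DiD-style parameters already achieve a zero objective on this panel; the penalty only matters once several weight vectors compete.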

2. Methodological Implementation and Optimization

The estimation of $(\mu, \omega)$ is formulated as a regularized least squares problem. The elastic net penalty combines sparsity (L1) and shrinkage (L2):

L2 penalty ($\|\omega\|_2^2$): Controls dispersion in the weights, limiting variance inflation.
L1 penalty ($\|\omega\|_1$): Encourages sparse selection, assigning nonzero weight to a limited set of controls.

The tuning parameters $\lambda$ (overall regularization) and $\alpha$ (mixing between L1 and L2) are determined by cross-validation or other model selection criteria.

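This optimization addresses the identification problem that arises when the number of pre-treatment periods is modest relative to the number of controls, in which case many different weight vectors fit the pre-treatment data equally well. The penalty selects among them, enabling operation in “small $N$” or “small $T$” settings. Allowing negative weights and weights that do not sum to one expands the flexibility of the synthetic comparison, so that it can potentially track a treated unit whose trajectory is atypical relative to the controls.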
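One way to solve this regularized least squares problem is cyclic coordinate descent with soft-thresholding, a standard technique for elastic net problems. The sketch below is an illustration under that assumption, not the source's implementation; the names `soft_threshold` and `fit_sdid_weights` and the toy panel are hypothetical. Because the intercept is unpenalized, it is profiled out by centering each series before the weights are updated.

```python
def soft_threshold(x, k):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    if x > k:
        return x - k
    if x < -k:
        return x + k
    return 0.0


def fit_sdid_weights(y_treated, y_controls, lam, alpha, n_iter=2000):
    """Coordinate descent for (mu, omega) on pre-treatment data (a sketch)."""
    t0, n = len(y_treated), len(y_controls)
    # The intercept is unpenalized, so it can be profiled out by centering.
    y_bar = sum(y_treated) / t0
    x_bar = [sum(x) / t0 for x in y_controls]
    x_c = [[v - m for v in x] for x, m in zip(y_controls, x_bar)]

    omega = [0.0] * n
    resid = [y - y_bar for y in y_treated]  # centered residual at omega = 0
    for _ in range(n_iter):
        for j in range(n):
            denom = sum(v * v for v in x_c[j]) + lam * (1 - alpha) / 2
            if denom == 0.0:
                continue  # constant control and no L2 penalty: leave weight as is
            # Inner product of control j with the residual that excludes j.
            rho = sum(xj * (r + omega[j] * xj) for xj, r in zip(x_c[j], resid))
            new_w = soft_threshold(rho, lam * alpha / 2) / denom
            delta = new_w - omega[j]
            if delta != 0.0:
                resid = [r - delta * xj for r, xj in zip(resid, x_c[j])]
                omega[j] = new_w
    mu = y_bar - sum(w * m for w, m in zip(omega, x_bar))
    return mu, omega


# Toy panel (made up) with more controls (N = 3) than pre-treatment periods
# (T0 = 2): without a penalty the weights would not be uniquely determined.
y_controls = [[1.0, 2.0], [2.0, 2.5], [0.5, 2.5]]
y_treated = [4.0, 5.5]
mu, omega = fit_sdid_weights(y_treated, y_controls, lam=0.5, alpha=0.5)
```

Each coordinate update is the exact minimizer of the one-dimensional penalized problem, so the objective never increases from one update to the next.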

Table: Comparison of Weighting and Intercept Constraints

Method	Intercept ($\mu$)	Weight constraints ($\omega$)
Synthetic Control	No	Nonnegative, sum to one
DiD	Yes	Fixed (e.g., equal across controls)
SDiD (proposed)	Yes	Arbitrary (need not sum to one, may be negative)

3. Interpretational Implications and Theoretical Properties

The inclusion of a free intercept allows SDiD to model situations where the treated unit is persistently above or below the control group, a property analogous to group-level fixed effects in DiD. By constructing a synthetic control with unrestricted weights, SDiD improves fit to the pre-treatment trajectory and absorbs permanent additive differences through $\mu$.
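A short derivation makes the role of the intercept concrete. It assumes, as in the elastic net objective, that the penalty involves only $\omega$ and not $\mu$, and that $t = 1, \dots, T_0$ indexes the pre-treatment periods. For fixed weights, setting the derivative of the squared-error term with respect to $\mu$ to zero gives

$\hat{\mu}(\omega) = \frac{1}{T_0} \sum_{t=1}^{T_0} \left( Y_{0,t} - \sum_{i=1}^N \omega_i Y_{i,t} \right)$

so the fitted intercept is the average pre-treatment gap between the treated unit and its weighted comparison. With $\omega_i = 1/N$ this reduces to the difference in pre-treatment means between the treated unit and the controls, which is the level adjustment made by DiD.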

The estimator balances two objectives:

Matching pre-treatment outcomes: Ensuring that the synthetic control closely tracks the pre-intervention path of the treated unit.
Adjusting for level/structural differences: As in DiD, permanent differences are explicit in the model via the intercept.

When exactly matching pre-treatment trajectories is infeasible or would make the problem ill-posed (e.g., due to collinearity or a limited number of pre-treatment periods $T_0$), regularization stabilizes estimation.

4. Robustness, Practical Considerations, and Empirical Applications

SDiD is robust to violations of classical parallel trends if the control units can be weighted to reproduce the treated unit’s pre-treatment trajectory (conditional on regularization strength and data support).
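A small numerical sketch illustrates this point. The panel is made up, the weights are supplied by hand rather than estimated, and `impute_and_estimate` is a hypothetical helper; the only assumption is that the intercept is set to the average pre-treatment gap for the given weights.

```python
def impute_and_estimate(y_treated, y_controls, omega, n_pre):
    """Return (mu, post-period effect estimates) for given weights."""
    synth = [sum(w * yc[t] for w, yc in zip(omega, y_controls))
             for t in range(len(y_treated))]
    # Intercept: average pre-treatment gap between treated and weighted controls.
    mu = sum(y_treated[t] - synth[t] for t in range(n_pre)) / n_pre
    effects = [y_treated[t] - (mu + synth[t])
               for t in range(n_pre, len(y_treated))]
    return mu, effects


# Three pre-treatment periods and one post-treatment period.
# Control 1 is flat, control 2 trends upward, and the treated unit trends
# twice as fast as control 2, so parallel trends fails against the average.
y_controls = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
true_effect = 1.0
y_treated = [2.0, 4.0, 6.0, 8.0 + true_effect]

# Equal weights (DiD-style): the trend mismatch leaks into the estimate.
_, did_effects = impute_and_estimate(y_treated, y_controls, [0.5, 0.5], n_pre=3)
# Weights that reproduce the treated pre-treatment path exactly.
_, weighted_effects = impute_and_estimate(y_treated, y_controls, [0.0, 2.0], n_pre=3)
# did_effects == [4.0], weighted_effects == [1.0]
```

Note that the second weight vector sums to two, which classic SC would not allow.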

Relative to DiD:

Does not require all control units to carry equal weight; captures unit-level heterogeneity.

Relative to SC:

Introduces intercepts to model persistent level differences, avoiding confounding of permanent “gaps” with treatment effects.
Admits negative and non-convex weights, which may be critical when the treated unit’s outcome is outside the convex hull of controls.
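The convex-hull point can be checked on a toy panel. All numbers below are made up for illustration, and the grid scan over convex weights is a crude stand-in for a proper constrained optimizer.

```python
# Two controls moving in parallel; the treated unit runs well above both.
y_controls = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
y_treated = [6.0, 7.0, 8.0]


def sse(mu, omega):
    """Sum of squared pre-treatment gaps for a given intercept and weights."""
    return sum(
        (y0 - mu - sum(w * yc[t] for w, yc in zip(omega, y_controls))) ** 2
        for t, y0 in enumerate(y_treated)
    )


# Classic SC: convex weights (w, 1 - w), no intercept. Every convex
# combination stays between the two controls, so the gap cannot close.
grid = [k / 100 for k in range(101)]
best_sc = min(sse(0.0, [w, 1.0 - w]) for w in grid)   # 48.0, attained at w = 0

# Intercept plus equal weights: mu is the average pre-treatment gap.
omega_equal = [0.5, 0.5]
mu = sum(
    y0 - sum(w * yc[t] for w, yc in zip(omega_equal, y_controls))
    for t, y0 in enumerate(y_treated)
) / len(y_treated)                                    # 4.5
sse_intercept = sse(mu, omega_equal)                  # 0.0

# No intercept, but a negative weight allowed: an exact fit also exists.
sse_negative = sse(0.0, [-4.0, 5.0])                  # 0.0
```

Both relaxations close the gap here, but the second relies on a large negative weight, which is the kind of extrapolation that regularization is meant to discourage.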

Applications are especially relevant for:

Policy interventions with a small number of units (e.g., a single state or country) or limited pre-treatment periods.
Contexts where the treated unit is distinct from controls.

Empirical examples include evaluations of California's anti-smoking policy, the effect of German reunification on West Germany, and labor market shocks like the Mariel Boatlift.

5. Regularization and Identification, Limitations, and Tuning

Regularization resolves identification challenges when the optimization problem is underdetermined or multicollinear. However, excessive regularization (large $\lambda$) can bias weights toward zero and degrade the pre-treatment fit of the synthetic control, while too little regularization can amplify variance through unstable weights.
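The shrinkage toward zero is easiest to see in the special case of a single control with both series centered, where the elastic net solution has a closed form. This is a sketch under that simplifying assumption; `single_control_weight` and the data are hypothetical.

```python
def single_control_weight(x, y, lam, alpha):
    """Elastic-net weight for one control; x and y are already centered."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    k = lam * alpha / 2  # soft-threshold level contributed by the L1 term
    shrunk = max(abs(sxy) - k, 0.0) * (1 if sxy > 0 else -1)
    return shrunk / (sxx + lam * (1 - alpha) / 2)


x = [-1.0, 0.0, 1.0]   # centered control outcomes
y = [-2.0, 0.0, 2.0]   # centered treated outcomes (unpenalized weight is 2)
weights = {lam: single_control_weight(x, y, lam, alpha=0.5)
           for lam in (0.0, 4.0, 8.0, 16.0)}
# weights == {0.0: 2.0, 4.0: 1.0, 8.0: 0.5, 16.0: 0.0}
```

As $\lambda$ grows the weight moves from the least squares value to exactly zero, at which point the control is dropped from the comparison.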

Potential limitations include:

The “extrapolative” risk of negative weights (especially when support between treated and controls does not overlap).
Interpretability may decrease as weights depart far from the simplex.

Tuning of $(\lambda, \alpha)$ is critical and can be performed using cross-validation on pre-treatment fit or alternative information criteria.
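One possible tuning scheme, sketched here as an assumption rather than as the source's procedure, holds out the last few pre-treatment periods, fits on the earlier ones, and picks the grid point with the smallest hold-out error. The helper names, the grid, and the data are hypothetical, and `fit` is a compact coordinate-descent elastic net written only to keep the example self-contained.

```python
def fit(y, xs, lam, alpha, n_iter=500):
    """Compact coordinate-descent elastic net with an unpenalized intercept."""
    t0 = len(y)
    y_bar = sum(y) / t0
    x_bar = [sum(x) / t0 for x in xs]
    xc = [[v - m for v in x] for x, m in zip(xs, x_bar)]
    omega = [0.0] * len(xs)
    resid = [v - y_bar for v in y]
    k = lam * alpha / 2
    for _ in range(n_iter):
        for j, xj in enumerate(xc):
            denom = sum(v * v for v in xj) + lam * (1 - alpha) / 2
            if denom == 0.0:
                continue
            rho = sum(v * (r + omega[j] * v) for v, r in zip(xj, resid))
            new_w = max(abs(rho) - k, 0.0) * (1 if rho > 0 else -1) / denom
            resid = [r - (new_w - omega[j]) * v for r, v in zip(resid, xj)]
            omega[j] = new_w
    mu = y_bar - sum(w * m for w, m in zip(omega, x_bar))
    return mu, omega


def holdout_error(y, xs, lam, alpha, n_val):
    """Fit on early pre-treatment periods, score on the last n_val of them."""
    split = len(y) - n_val
    mu, omega = fit(y[:split], [x[:split] for x in xs], lam, alpha)
    return sum(
        (y[t] - mu - sum(w * x[t] for w, x in zip(omega, xs))) ** 2
        for t in range(split, len(y))
    )


def select_tuning(y, xs, lams, alphas, n_val=2):
    """Grid search for the (lam, alpha) pair with the smallest hold-out error."""
    return min(
        ((lam, alpha) for lam in lams for alpha in alphas),
        key=lambda p: holdout_error(y, xs, p[0], p[1], n_val),
    )


# Noise-free toy panel: the treated unit equals 2 + control in every period,
# so the unpenalized fit predicts the held-out periods exactly.
xs = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
y = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
best = select_tuning(y, xs, lams=[0.0, 1.0, 10.0], alphas=[0.0, 0.5, 1.0])
```

On this noise-free panel the search settles on $\lambda = 0$. With noisy outcomes or many controls relative to $T_0$, a positive $\lambda$ may give the smaller hold-out error.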

6. Concluding Remarks and Future Directions

SDiD combines the strengths of matching on pre-treatment outcomes (capturing complex treated-control dynamics) with accommodation of permanent differences (as in DiD). The use of elastic net regularization facilitates parsimonious weight selection and mitigates overfitting. SDiD extends the scope of both DiD and SC to more varied empirical settings, offering a single unified estimator with broad applicability in causal panel data analysis (Doudchenko and Imbens, 2016).

Future research includes integration of additional covariates, extensions to staggered treatment designs, and incorporation of modern regularization or machine learning-based adjustments to further enhance estimator stability and interpretability.


References (1)

Doudchenko, N., and Imbens, G. W. (2016). Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis.
