Non-Separable Two-Way Fixed Effects (NSTW)

Updated 28 July 2025
  • Non-Separable Two-Way Fixed Effects (NSTW) models are advanced econometric frameworks that accommodate nonlinear and non-additive latent unit and time effects to address unobserved heterogeneity in panel and network data.
  • Estimation methods include approximate factor techniques using PCA-based decompositions and grouped fixed effects that discretize latent heterogeneity, balancing approximation bias and estimation variance.
  • NSTW models provide robust identification and inference in complex settings such as matched employer‐employee and teacher-student data by leveraging network connectivity and diagnostic tools.

Non-Separable Two-Way Fixed Effects (NSTW) models generalize the classical additive and interactive fixed effects frameworks in panel and network data analysis, enabling researchers to accommodate more general forms of unobserved heterogeneity. This conceptual expansion reflects both recent econometric theory and empirical practice. NSTW models subsume standard two-way fixed effects (TWFE) and interactive fixed effects (IFE) as special, more restrictive cases, admitting much richer (potentially nonlinear and non-additive) relationships between latent unit and time effects. The primary motivation stems from documented failures of separability and homogeneity in applied work, especially in matched employer-employee datasets, student-teacher learning production studies, and multi-dimensional policy evaluation contexts.

1. Model Definition and Network Representation

Canonical two-way fixed effects models (TWFE) are written as

$$y = B_1 \mu + B_2 \eta + X\beta + u$$

where $\mu$ and $\eta$ are group- or dimension-specific fixed effects (e.g., teacher/firm and student/worker), $B_1$ and $B_2$ are incidence matrices, and $X$ denotes observed covariates. This model is separable: it models unit and time (or other) effects as additive linear terms.

NSTW models, in contrast, allow the unobserved effect to enter as a general, possibly nonlinear and non-additive, function of latent unit and time-specific components,

$$c_{it} = h(z_i, f_t)$$

where $h$ is an unknown smooth function, $z_i$ are (possibly multidimensional) latent unit characteristics, and $f_t$ are latent time-varying factors (Ditzen et al., 25 Jul 2025). Notably, if $h(z_i, f_t)$ is bilinear (e.g., $z_i'f_t$), the model reduces to the standard IFE case.

For bipartite or matched data, as in worker-firm or teacher-student networks, the two-way model can be reframed as a network: each observation represents an "edge" connecting two vertices (e.g., worker $i$ and firm $j$); the corresponding regression is

$$y = B\alpha + X\beta + u, \quad \alpha_i = \mu_i,~ \alpha_j = -\eta_j$$

with $B$ the concatenation of $B_1$ and $-B_2$, embedding the model within a network context where estimation, identification, and inference rely on graph-theoretic properties (Jochmans et al., 2016).
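The network reparametrization can be checked numerically. The following is a minimal sketch on a made-up toy graph (the `worker` and `firm` index arrays and all variable names are assumptions for illustration, not data from the cited papers). It builds the incidence matrices, confirms that $B\alpha$ reproduces $B_1\mu + B_2\eta$, and shows the rank deficiency that makes a normalization necessary.

```python
import numpy as np

# Hypothetical toy matched data: each observation (edge) links worker[e] to firm[e].
worker = np.array([0, 0, 1, 1, 2, 2])
firm = np.array([0, 1, 0, 1, 1, 2])
n_workers, n_firms = 3, 3
m = len(worker)

# Incidence matrices: row e has a single 1 in the column of its worker (B1) or firm (B2).
B1 = np.zeros((m, n_workers))
B1[np.arange(m), worker] = 1.0
B2 = np.zeros((m, n_firms))
B2[np.arange(m), firm] = 1.0

# Network form: B = [B1, -B2], alpha = (mu, -eta).
B = np.hstack([B1, -B2])

rng = np.random.default_rng(0)
mu = rng.normal(size=n_workers)
eta = rng.normal(size=n_firms)
alpha = np.concatenate([mu, -eta])

# The two parametrizations give identical fitted fixed-effect contributions.
same = np.allclose(B @ alpha, B1 @ mu + B2 @ eta)

# B annihilates the constant vector, so its rank is (number of vertices - 1)
# when the graph is connected: alpha is identified only up to a common shift.
rank_B = np.linalg.matrix_rank(B)
print(same, rank_B)
```

In this toy graph all six vertices are connected, so `B` has rank five rather than six, which is the lack of identification that a normalization constraint resolves.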

2. Identification: Separability, Network Structure, and Harmonic Connectivity

NSTW models are identified under less restrictive conditions than TWFE or IFE, leveraging both within-network connectivity and smoothness of $h$. In the network representation, identification depends on the connectedness of the underlying bipartite graph. Specifically:

  • Only differences such as $\alpha_i - \alpha_j$ (equivalently, the sums $\mu_i + \eta_j$) are identified, implying the necessity of normalization constraints (e.g., $d'\alpha = 0$, with $d$ the vector of (weighted) vertex degrees).
  • The variance of each estimated component, $\operatorname{var}(\hat\alpha_i)$, is governed by the local degree $d_i$, the global spectral connectivity $\lambda_2$ (the second-smallest eigenvalue of the normalized Laplacian), and the harmonic mean $h_i$ of the degrees of vertex $i$'s neighbors:

$$\operatorname{var}(\hat\alpha_i) \le \frac{\sigma^2}{d_i}\left(1 + \frac{1}{\lambda_2 h_i}\right) - \frac{2\sigma^2}{m}$$

with equality in special cases (Jochmans et al., 2016). Strong identification requires $\lambda_2 h_i \to \infty$ (as $n \to \infty$), reflecting high local and global connectivity.
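The connectivity quantities in this bound are easy to compute for small graphs. The sketch below is illustrative only. It assumes an unweighted graph given as an edge list, takes $h_i$ to be the harmonic mean of the neighbors' degrees, and evaluates only the leading term $\sigma^2/d_i \cdot (1 + 1/(\lambda_2 h_i))$, leaving out the $-2\sigma^2/m$ correction. The function name and the two toy graphs are hypothetical.

```python
import numpy as np

def connectivity_summary(edges, n_vertices, sigma2=1.0):
    """Degrees, lambda_2 of the normalized Laplacian, harmonic neighbor
    means, and the leading term of the variance bound (illustrative)."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] += 1.0
        A[j, i] += 1.0
    d = A.sum(axis=1)                       # vertex degrees
    L = np.diag(d) - A                      # graph Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ L @ D_inv_sqrt         # normalized Laplacian
    lam2 = np.linalg.eigvalsh(S)[1]         # second-smallest eigenvalue
    h = d / (A @ (1.0 / d))                 # harmonic mean of neighbor degrees
    leading = sigma2 / d * (1.0 + 1.0 / (lam2 * h))
    return d, lam2, h, leading

# Vertices 0-2 are "workers", 3-5 are "firms" (toy labels).
dense_edges = [(i, 3 + j) for i in range(3) for j in range(3)]   # complete bipartite
sparse_edges = [(0, 3), (1, 3), (1, 4), (2, 4), (2, 5)]          # a single path

d_dense, lam2_dense, h_dense, lead_dense = connectivity_summary(dense_edges, 6)
d_sparse, lam2_sparse, h_sparse, lead_sparse = connectivity_summary(sparse_edges, 6)
print(lam2_dense, lam2_sparse)
print(lead_dense.max(), lead_sparse.min())
```

The densely connected graph has a much larger $\lambda_2$ than the path, and the leading bound term is correspondingly smaller at every vertex, in line with the role of local and global connectivity described above.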

For functional NSTW models,

$$y_{it} = x_{it}'\beta + h(z_i, f_t) + \epsilon_{it}$$

injectivity of first-moment statistics is necessary: variation in observed averages or clustering proxies must span the latent types (Ditzen et al., 25 Jul 2025).
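A simple constructed case, not drawn from the cited papers, shows why injectivity matters. Suppose $h$ is bilinear with a mean-zero factor, and write $m(z_i)$ for the population unit-level average of $h$ (a symbol introduced here only for illustration):

```latex
h(z_i, f_t) = z_i f_t, \quad \mathbb{E}[f_t] = 0
\;\Longrightarrow\;
m(z_i) := \mathbb{E}_f\left[h(z_i, f_t)\right] = z_i \, \mathbb{E}[f_t] = 0
\quad \text{for every } z_i .
```

Here the unit averages are the same for all latent types, so the map $z_i \mapsto m(z_i)$ is not injective and clustering on those averages cannot separate units with different $z_i$.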

3. Estimation Methods for NSTW

Due to the infinite-dimensional and nonparametric nature of $h(\cdot,\cdot)$, estimation approaches typically employ one of two broad strategies:

A. Approximate Factor Estimation (ILS, PCA-based)

Drawing on the singular value decomposition of $h$, one expands:

$$h(z_i, f_t) = \sum_{r=1}^\infty \sigma_r \phi_r(z_i) \psi_r(f_t)$$

and then approximates with $R$ components, fitting

$$(\hat\beta, \hat\lambda, \hat f) = \arg\min_{\beta, \lambda, f} \sum_{it} \left[y_{it} - x_{it}'\beta - \sum_{r=1}^R \lambda_{ir} f_{tr}\right]^2$$

As $R \to \infty$ appropriately with $N, T$, this method can recover the NSTW structure up to an approximation error declining with $R$, at the cost of increased estimation variance (Freeman et al., 2021). The optimal rate of $R$ trades off approximation and estimation error, often yielding $\sqrt{\min(N,T)}$-consistency.
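A small simulation can make the low-rank approximation idea concrete. The sketch below is illustrative and rests on several assumptions not in the source: a single regressor, a made-up smooth function $h(z, f) = 3(e^{zf} - 1)$, a fixed $R = 3$, and a simple alternating scheme (truncated SVD of the residual matrix, then a least-squares update for $\beta$) standing in for the least-squares problem above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, R, beta_true = 200, 150, 3, 1.0

# Assumed nonseparable effect: smooth and nonlinear in (z_i, f_t).
z = rng.uniform(-1.0, 1.0, size=N)
f = rng.uniform(-1.0, 1.0, size=T)
H = 3.0 * (np.exp(np.outer(z, f)) - 1.0)

# The regressor is correlated with the latent effect, so pooled OLS is biased.
X = H + rng.normal(size=(N, T))
Y = beta_true * X + H + rng.normal(size=(N, T))

# Singular values of H decay quickly: few components carry most of h.
sv = np.linalg.svd(H, compute_uv=False)
decay = sv[R] / sv[0]

def low_rank(W, rank):
    """Best rank-`rank` approximation of W via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Pooled OLS ignoring the latent effect (starting value).
beta_ols = np.sum(X * Y) / np.sum(X * X)

# Alternate between the factor step and the beta step until beta settles.
beta_ils = beta_ols
for _ in range(200):
    L_hat = low_rank(Y - beta_ils * X, R)                 # factor step
    beta_new = np.sum(X * (Y - L_hat)) / np.sum(X * X)    # regression step
    converged = abs(beta_new - beta_ils) < 1e-10
    beta_ils = beta_new
    if converged:
        break

print(decay, beta_ols, beta_ils)
```

In this design the pooled estimate is far from the true coefficient, while the factor-augmented estimate should land close to it. Raising `R` reduces the approximation error further but adds noise to the fitted low-rank part.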

B. Grouped Fixed Effects (Clustering-based Discretization)

Alternatively, researchers discretize the latent heterogeneity by clustering units and times into $G$ and $C$ clusters, respectively, then estimate

$$y_{it} = x_{it}'\beta + c_{g_i, l_t} + \epsilon_{it}$$

where $g_i$ and $l_t$ denote the groupings. The two-step grouped fixed effects (TSGF) estimator involves a first-stage clustering (e.g., k-means or hierarchical) then fixed effects regression on group pairs (Ditzen et al., 25 Jul 2025, Freeman et al., 2021). Split-sample implementations may avoid overfitting bias. As $G, C \to \infty$ with sample size, the nonseparable function $h$ can be approximated arbitrarily closely, but an empirical bias-variance tradeoff emerges based on the smoothness of $h$ and the number of groups.
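The two steps can be sketched in a few lines. This is a simplified illustration, not the cited authors' implementation. It assumes a single regressor, clusters units and periods on their averages of $y$ and $x$ with a hand-written k-means (deterministic quantile start), and absorbs the group-pair effects by demeaning within each $(g_i, l_t)$ cell. The data-generating function and all names are made up.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic, quantile-based start."""
    order = np.argsort(points[:, 0])
    start = order[np.linspace(0, len(points) - 1, k).astype(int)]
    centers = points[start].copy()
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for g in range(k):
            if np.any(labels == g):
                centers[g] = points[labels == g].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
N, T, G, C, beta_true = 120, 120, 10, 10, 1.0
z = rng.uniform(0.0, 1.0, size=N)
f = rng.uniform(0.0, 1.0, size=T)
H = 4.0 * (np.exp(np.outer(z, f)) - 1.3)       # assumed nonseparable effect
X = H + rng.normal(size=(N, T))
Y = beta_true * X + H + rng.normal(size=(N, T))

# Step 1: cluster units on unit-level averages and periods on period-level averages.
g = kmeans_labels(np.column_stack([Y.mean(axis=1), X.mean(axis=1)]), G)
l = kmeans_labels(np.column_stack([Y.mean(axis=0), X.mean(axis=0)]), C)

# Step 2: regression with one fixed effect per (unit group, time group) cell,
# computed by subtracting cell means from y and x.
cell = (g[:, None] * C + l[None, :]).ravel()

def demean_by_cell(a):
    flat = a.ravel()
    sums = np.bincount(cell, weights=flat, minlength=G * C)
    counts = np.bincount(cell, minlength=G * C)
    return flat - (sums / np.maximum(counts, 1))[cell]

xd, yd = demean_by_cell(X), demean_by_cell(Y)
beta_tsgf = xd @ yd / (xd @ xd)
beta_ols = np.sum(X * Y) / np.sum(X * X)       # pooled benchmark, no effects
print(beta_ols, beta_tsgf)
```

In this design the averages are monotone in the latent $z_i$ and $f_t$, so the clusters track the latent types and the grouped estimate should sit much closer to the true coefficient than the pooled one. A residual discretization bias remains and shrinks as `G` and `C` grow.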

Relevant error expansions for these procedures are:

$$\hat\beta = \beta + H^{-1}\left(N^{-1}\sum_{i=1}^N s_i\right) + O_p\left(\frac{1}{T} + \frac{1}{N} + \frac{GC}{NT}\right) + O_p\left(G^{-2/K} + C^{-2/K}\right) + o_p\left((NT)^{-1/2}\right)$$

where $H$ is the Hessian, $s_i$ are influence components, and $K$ is the dimension of the latent features.
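As a rough worked example, take $G = C$ and compare only the two remainder terms in this expansion that depend on the number of groups, treating all constants as one. This is a back-of-the-envelope balancing under those assumptions, not a result from the cited papers.

```latex
\begin{aligned}
\frac{G^2}{NT} \asymp G^{-2/K}
\;&\Longrightarrow\; G^{2 + 2/K} \asymp NT
\;\Longrightarrow\; G \asymp (NT)^{K/(2K+2)}, \\
\text{remainder at this } G:\quad & G^{-2/K} \asymp (NT)^{-1/(K+1)}, \\
K = 1,\; N = T = 100:\quad & G \asymp (10^4)^{1/4} = 10,
\qquad \frac{G^2}{NT} = 10^{-2},
\qquad G^{-2} = 10^{-2}.
\end{aligned}
```

Under this heuristic, a higher latent dimension $K$ calls for more groups and leaves a larger balanced remainder, consistent with the bias-variance tradeoff described for the grouped estimator.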

C. Minimal Bridge Function (Moment Equations for Counterfactual Identification)

For identifying average causal effects in NSTW/factor models, bridge functions $h(Y_{\text{pre}}, X; \theta^*)$ are constructed to remove latent confounding via balancing moments:

$$\mathbb{E}[Y_0(0) - h(Y_{\text{pre}}, X; \theta^*) \mid U, A=0, X] = 0$$

The minimal bridge function is chosen via regularized GMM to minimize $\| \theta \|_2$ under the identifying restrictions, yielding root-$N$ consistency for average effects even with fixed $T$ (Imbens et al., 2021).

4. Inference and Asymptotic Properties

Achieving valid inference for $\beta$ in NSTW models requires accounting for both the latent structure and possible bias from group discretization or weak factor approximation. When clustering/grouping adequately captures the heterogeneity, the TSGF or TSGF-M estimator is asymptotically normal and $\sqrt{NT}$-consistent.

Diagnostic tools, namely Pesaran-type cross-sectional dependence tests (CD, CDw, CD*) and factor number selection criteria (Eigenvalue Ratio (ER), Growth Rate (GR), GOS), aid in verifying that the estimator has adequately absorbed common latent components (Ditzen et al., 25 Jul 2025). In cluster-based approaches, approximation error rates depend on the group counts and the smoothness of $h$.
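Two of these diagnostics are simple enough to sketch directly. The code below is an illustrative implementation for a balanced $N \times T$ residual matrix: the basic CD statistic $\sqrt{2T/(N(N-1))}\sum_{i<j}\hat\rho_{ij}$, and an eigenvalue-ratio rule that picks the $k$ maximizing the ratio of adjacent eigenvalues. It does not cover the CDw or CD* variants, the GR or GOS criteria, or the zero-factor case. The function names, `k_max`, and the simulated residuals are assumptions.

```python
import numpy as np

def pesaran_cd(resid):
    """Basic CD statistic from pairwise correlations of unit residual series."""
    N, T = resid.shape
    corr = np.corrcoef(resid)                  # N x N correlations across units
    upper = corr[np.triu_indices(N, k=1)]      # pairs with i < j
    return np.sqrt(2.0 * T / (N * (N - 1))) * upper.sum()

def eigenvalue_ratio(resid, k_max=5):
    """Number of factors chosen by the largest ratio of adjacent eigenvalues."""
    N, T = resid.shape
    eig = np.linalg.eigvalsh(resid @ resid.T / (N * T))[::-1]   # descending
    ratios = eig[:k_max] / eig[1:k_max + 1]
    return int(np.argmax(ratios)) + 1          # 1-indexed factor count

rng = np.random.default_rng(2)
N, T = 50, 100

# Residuals with no remaining common component.
resid_clean = rng.normal(size=(N, T))

# Residuals with one unabsorbed common factor (loadings centered at 1).
loadings = rng.normal(1.0, 0.5, size=N)
factor = rng.normal(size=T)
resid_factor = np.outer(loadings, factor) + rng.normal(size=(N, T))

cd_clean = pesaran_cd(resid_clean)
cd_factor = pesaran_cd(resid_factor)
k_factor = eigenvalue_ratio(resid_factor)
print(cd_clean, cd_factor, k_factor)
```

A CD value far outside the range of a standard normal draw signals cross-sectional dependence left in the residuals, and the eigenvalue-ratio rule gives a rough count of how many common components remain.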

Variance formulas for estimators of fixed effect components must account for network connectivity. For network/graph-based NSTW models, the variance of individual fixed effects is dominated by local degree and global connectivity (e.g., the eigenvalue $\lambda_2$ of the normalized Laplacian and the harmonic neighbor mean $h_i$) (Jochmans et al., 2016).

5. Empirical Illustration and Comparative Findings

The practical impact of NSTW modeling is evident in empirical studies:

  • In teacher value-added models, sparse bipartite graphs (few teachers per student, little mixing) yield ill-identified effects: the true variance is underestimated by standard approximations by a factor of 2.5, and confidence intervals for teacher effects are greatly over-optimistic when using TWFE-type variance estimates (Jochmans et al., 2016).
  • Occupational wage decompositions with dense worker-occupation networks exhibit strong global connectivity, ensuring that traditional fixed-effects methods and their variance estimates are nearly accurate.
  • In the growth–inflation and Feldstein–Horioka puzzles, residual cross-section dependence is eliminated only when high-dimensional IFE or TSGF-M estimators are used; conventional FE estimators leave persistent structure in residuals (Ditzen et al., 25 Jul 2025).
  • Simulation studies confirm that increasing the number of factors or clusters initially improves both the bias and the variance of $\hat\beta$; however, over-parameterization can raise the variance when the clustering is too granular relative to the sample size (Freeman et al., 2021).

6. Diagnostics, Specification Testing, and Model Assessment

Validity of the NSTW approach must be empirically verified. Specification tests compare restricted models (TWFE/IFE) to grouped fixed effects estimators:

  • A bootstrap generalized Hausman test contrasts the restricted estimator with a more flexible GFE estimator, with critical values adjusted to account for approximation biases and the incidental parameter problem, yielding asymptotically correct size and good power (Pigini et al., 2023).
  • Standard residual cross-section dependence diagnostics (CD, CDw, CD*) and eigenvalue-based methods for factor number estimation are central to practice, ensuring that latent heterogeneity is adequately captured.

Model selection, which balances overfitting against bias, is guided by these diagnostics, with the cluster counts $(G, C)$ or the number of factors chosen to minimize finite-sample mean squared error given the presumed smoothness of $h$.

7. Implications for Network Data, Dense vs. Sparse Regimes, and Generalizations

NSTW methods clarify that the precision of fixed effect estimates in network data is controlled not simply by the total sample size but by both local and global connectivity:

  • In dense networks with high degrees and global mixing ($\lambda_2$ bounded away from zero), parametric rates of convergence (e.g., standard errors of order $1/\sqrt{d_i}$ for the fixed effects) and normality are achieved.
  • Sparse networks with weak connectivity face enlarged variances, and the effective sample size for estimating an effect is essentially its local degree.
  • In practical NSTW estimation, attention must be paid to the network’s structural properties to avoid underestimating uncertainty in fixed effect (e.g., teacher, firm) estimands.

When the NSTW framework is neglected in favor of restrictive TWFE or IFE models, bias and invalid inference are likely, particularly in the presence of nonlinear or interactive omitted variable structures, weak connectivity, or substantial latent group-time interaction heterogeneity. Proper modeling and diagnosis, using grouped fixed effects, high-rank factor approximations, and network analysis, enable robust identification and inference in modern panel and network datasets.