
Causal Forests with Fixed Effects

Updated 26 February 2026
  • Causal forests are nonparametric ensemble methods that estimate heterogeneous treatment effects by capturing local variation in the conditional average treatment effect.
  • The fixed-effects extension (CFFE) uses node-level residualization and cluster-aware subsampling to robustly handle confounding in panel data.
  • Empirical studies show that CFFE reduces bias and mean-squared error compared to standard causal forests, yielding more reliable point estimates.

Causal forests are nonparametric ensemble methods for estimating heterogeneous treatment effects, particularly the conditional average treatment effect (CATE) τ(x) = E[Y(1) − Y(0) | X = x], under unconfoundedness and overlap. In standard cross-sectional data, causal forests employ recursive partitioning and aggregation to capture local structure in treatment effect heterogeneity, with theoretical guarantees for consistency and valid inference. However, when applied to panel data with unit and time fixed effects, standard causal forests are susceptible to spurious heterogeneity induced by nonparametric confounding through the unit effects α_i and time effects γ_t. Causal Forests with Fixed Effects (CFFE) introduce a principled extension that achieves robust estimation of heterogeneous treatment effects in such panel data by integrating node-level fixed-effect residualization, cluster-aware subsampling, and an adapted split criterion (Aytug, 15 Jan 2026).

1. Panel Data Model and Identification

CFFE targets the panel-data setting

Y_it = α_i + γ_t + τ(X_it) D_it + ε_it,    E[ε_it | X_it, D_it, α_i, γ_t] = 0,

with units i = 1, …, N, time periods t = 1, …, T, binary treatment D_it ∈ {0, 1}, and covariates X_it ∈ R^p. The estimand is the CATE τ(x) = E[Y_it(1) − Y_it(0) | X_it = x].
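As a concrete illustration, this data-generating process can be simulated directly. The sketch below is hypothetical: the shapes, noise scale, and the choice τ(x) = x₁ (which mirrors the heterogeneous simulation scenario later in the article) are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Hypothetical simulation of the panel model above; tau(x) = x_1 is an
# illustrative choice matching the heterogeneous simulation scenario.
rng = np.random.default_rng(0)
N, T, p = 200, 10, 3                       # units, time periods, covariates

alpha = rng.normal(size=N)                 # unit fixed effects alpha_i
gamma = rng.normal(size=T)                 # time fixed effects gamma_t
X = rng.normal(size=(N, T, p))             # covariates X_it
D = rng.binomial(1, 0.5, size=(N, T))      # binary treatment D_it
tau = X[:, :, 0]                           # CATE tau(X_it) = first covariate
eps = rng.normal(scale=0.5, size=(N, T))   # idiosyncratic noise

# Y_it = alpha_i + gamma_t + tau(X_it) * D_it + eps_it
Y = alpha[:, None] + gamma[None, :] + tau * D + eps
```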

Panel fixed effects α_i and γ_t act as perfectly collinear predictors at the global level. Omitting them from the forest, or residualizing globally before tree growth, induces spurious heterogeneous effect estimates and confounds recovery of the true CATE structure.

2. Node-Level Residualization

The methodological core of CFFE is node-level fixed-effect removal during tree growth. At each node 𝒩, fixed effects are removed by iterative demeaning using only the data inside 𝒩:

Ỹ_it^(𝒩) = Y_it − α̂_i^(𝒩) − γ̂_t^(𝒩)
D̃_it^(𝒩) = D_it − δ̂_i^(𝒩) − η̂_t^(𝒩)

where {α̂_i^(𝒩), γ̂_t^(𝒩)} and {δ̂_i^(𝒩), η̂_t^(𝒩)} are estimated by alternately demeaning over units and times within 𝒩, typically converging in 3–5 iterations.

This local residualization is fundamentally different from global demeaning, which risks introducing artifacts due to non-nested clusters as the tree recursively splits the data.
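The alternating demeaning step can be sketched as follows. This is an illustrative implementation of two-way within-node demeaning, not the `causalfe` internals; the same routine would be applied to both the outcome and the treatment inside each node.

```python
import numpy as np

def node_demean(y, unit, time, n_iter=5):
    """Two-way demean y within a node by alternately removing unit and time means.

    y, unit, time are 1-D arrays over the (i, t) observations that fall in the
    node; a handful of sweeps typically suffices, matching the 3-5 iterations
    reported for CFFE. Illustrative sketch, not the causalfe internals.
    """
    r = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for g in (unit, time):
            # subtract the group mean for each cluster present in the node
            for val in np.unique(g):
                mask = g == val
                r[mask] -= r[mask].mean()
    return r
```

On a balanced node with purely additive unit and time effects, the residual is driven to zero; in general the sweeps converge to the two-way within transformation restricted to the node's observations.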

3. Splitting Criterion and Tree Growth

Splitting in CFFE is adapted to the fixed-effect-residualized node statistics. Consider a parent node 𝒫 of size n and a candidate split S generating left/right children ℒ and ℛ. The local treatment effect in each child is estimated as

τ̂_ℒ = ( Σ_{(i,t)∈ℒ} D̃_it Ỹ_it ) / ( Σ_{(i,t)∈ℒ} D̃_it² ),

with an analogous formula for τ̂_ℛ. The split impurity reduction is

Δ(S) = (n_ℒ n_ℛ / n²) (τ̂_ℒ − τ̂_ℛ)²

The best split maximizes Δ(S)\Delta(S). All tree growth and leaf estimation use cluster-aware subsampling—sampling units (not individual observations) to preserve panel structure, and applying an "honest" sample split (structure-estimation disjointness) to reduce estimation bias.
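The child-effect estimate and impurity reduction above can be computed as follows. This is an illustrative sketch that takes the residualized arrays D̃, Ỹ as given; the function names are assumptions, not the package API.

```python
import numpy as np

def local_tau(D_res, Y_res):
    """Child-node effect estimate: cross-moment over residual treatment variation."""
    denom = np.sum(D_res ** 2)
    return np.sum(D_res * Y_res) / denom if denom > 0 else 0.0

def split_gain(D_res, Y_res, left_mask):
    """Impurity reduction Delta(S) = (n_L * n_R / n^2) * (tau_L - tau_R)^2."""
    n = len(D_res)
    n_left = int(left_mask.sum())
    n_right = n - n_left
    tau_left = local_tau(D_res[left_mask], Y_res[left_mask])
    tau_right = local_tau(D_res[~left_mask], Y_res[~left_mask])
    return (n_left * n_right / n ** 2) * (tau_left - tau_right) ** 2
```

The best split is the candidate maximizing this gain, which rewards splits whose children disagree most about the local effect, weighted toward balanced splits.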

4. Algorithm and Software Implementation

The CFFE algorithm, as implemented in the Python package causalfe, proceeds as follows:

from causalfe import CFFEForest

# Fit a CFFE forest on panel data: covariates X, outcome Y, treatment D,
# plus the unit and time indices that define the fixed effects
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20, seed=42)
forest.fit(X, Y, D, unit, time)
tau_hat = forest.predict(X)                                      # point estimates of tau(x)
tau_hat, ci_lo, ci_hi = forest.predict_interval(X, alpha=0.05)   # 95% intervals
Confidence intervals rely on variance estimation by half-sample disagreement but are often anti-conservative; for conservative inference, bootstrap or analytic cluster-robust estimators are advisable.
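A unit-level (cluster) bootstrap along these lines is one conservative alternative. The helper below is a hypothetical sketch: `tau_at_x0` stands in for a refit-and-predict routine supplied by the user, and none of these names belong to the `causalfe` API.

```python
import numpy as np

def cluster_bootstrap_ci(tau_at_x0, unit_ids, B=200, alpha=0.05, seed=0):
    """Percentile interval for a CATE estimate via a unit-level bootstrap.

    tau_at_x0 takes an array of row indices (a bootstrap resample drawn by
    whole units) and returns the refit estimate tau_hat(x0). Resampling
    whole units, not observations, respects the panel's cluster structure.
    """
    rng = np.random.default_rng(seed)
    units = np.unique(unit_ids)
    draws = []
    for _ in range(B):
        sampled = rng.choice(units, size=len(units), replace=True)
        # collect every row belonging to each resampled unit
        idx = np.concatenate([np.flatnonzero(unit_ids == u) for u in sampled])
        draws.append(tau_at_x0(idx))
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```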

Key differences from standard causal forests:

  • Node-level, not global, residualization of fixed effects
  • Cluster (unit)-aware subsampling
  • Honest estimation with cluster-aware sample splitting
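The cluster-aware honest split in the list above can be sketched as follows; this is illustrative, not the package's internal implementation.

```python
import numpy as np

def honest_cluster_split(unit_ids, frac=0.5, seed=0):
    """Partition rows into disjoint structure/estimation samples by whole units.

    Splitting by unit rather than by observation keeps each unit's time
    series intact, so tree structure is learned and leaf effects are
    estimated on non-overlapping sets of units.
    """
    rng = np.random.default_rng(seed)
    units = np.unique(unit_ids)
    rng.shuffle(units)
    k = int(frac * len(units))
    struct_mask = np.isin(unit_ids, units[:k])   # rows used to grow the tree
    return struct_mask, ~struct_mask             # remainder estimates leaf effects
```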

5. Simulation Studies and Empirical Behavior

Multiple simulation settings evaluate CFFE relative to standard causal forests:

Scenario | Standard CF (RMSE / ρ) | CFFE (RMSE / ρ) | Key comparison
--- | --- | --- | ---
Heterogeneous DiD, no confounding | — | 0.378 / 0.934 | CFFE recovers the CATE accurately
Fixed-effect confounding | 0.506 / 0.965 | 0.405 / 0.910 | CFFE: lower bias
Placebo (τ = 0) | — | bias −0.06, RMSE 0.25 | near-unbiased
Homogeneous (τ = 2) | — | bias 1.79, RMSE 0.34 | mild bias toward a generic effect
Heterogeneous (τ = x₁) | — | ρ 0.90, RMSE 0.54 | heterogeneity structure recovered

Here, CFFE reduces MSE and bias over standard causal forests in the presence of fixed-effect confounding, without sacrificing heterogeneity ranking (correlation between τ^\hat\tau and τ\tau remains high). In Monte Carlo scenarios, nominal 95% confidence interval coverage was 42–56%, indicating interval estimation is anti-conservative without further adjustment (Aytug, 15 Jan 2026).

6. Computing and Practical Considerations

CFFE computational complexity is O(N_units × T × n_trees × splits), with only a small per-node overhead for residualization (3–5 demeaning iterations). Tree-level parallelization is straightforward. Robust performance relies on:

  • Sufficient tree depth (typically 3–6) without overfitting
  • Minimum leaf sizes (≥ 20) to keep within-node demeaning stable
  • Honest estimation (honest=True) for valid inference
  • Careful handling of variance estimation

The method presumes a panel model with parallel trends, i.e., absence of time-varying confounders beyond the fixed effects; violations will induce bias in τ(x). A balanced or nearly balanced panel is required; mild imbalance is tolerated.

7. Position within Broader Causal Forest Literature

Standard causal forests (e.g., Wager & Athey (Wager et al., 2015)) can consistently estimate τ(x) under unconfoundedness and overlap in cross-sectional settings. Recent generalizations (e.g., Causal Survival Forests (Cui et al., 2020), Longitudinal Bayesian Causal Forests (McJames et al., 2024), Difference-in-Differences BCF (Souto et al., 14 May 2025)) adapt the approach to time-varying or panel data, but most either rely on global residualization or fail to account for fixed effects at the node level. Node-level fixed-effect removal, as in CFFE, is uniquely matched to mitigate spurious CATE heterogeneity from structural panel confounding, advancing nonparametric identification in high-dimensional panel data.

In summary, CFFE defines the state-of-the-art methodology for heterogeneous effect estimation in panel data with fixed effects by combining local residualization, clusterwise resampling, and causal forest ensemble estimation, with robust empirical and computational properties (Aytug, 15 Jan 2026).
