Causal Forests with Fixed Effects
- Causal forests are nonparametric ensemble methods that estimate heterogeneous treatment effects by capturing local variation in the effect across covariates.
- The fixed-effects extension (CFFE) uses node-level residualization and cluster-aware subsampling to robustly handle confounding in panel data.
- Simulation studies show that CFFE reduces bias and mean-squared error relative to standard causal forests under fixed-effect confounding, although its nominal confidence intervals under-cover without further adjustment.
Causal forests are nonparametric ensemble methods for estimating heterogeneous treatment effects, particularly the conditional average treatment effect (CATE) $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$, under unconfoundedness and overlap. In standard cross-sectional data, causal forests employ recursive partitioning and aggregation to capture local structure in treatment effect heterogeneity, with theoretical guarantees for consistency and valid inference. However, when applied to panel data with unit and time fixed effects, standard causal forests are susceptible to spurious heterogeneity induced by confounding through $\alpha_i$ (unit effects) and $\gamma_t$ (time effects). Causal Forests with Fixed Effects (CFFE) extend the method to estimate heterogeneous treatment effects robustly in such panels by integrating node-level fixed-effect residualization, cluster-aware subsampling, and an adapted split criterion (Aytug, 15 Jan 2026).
1. Panel Data Model and Identification
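CFFE targets the panel-data setting $Y_{it} = \alpha_i + \gamma_t + \tau(X_{it}) D_{it} + \varepsilon_{it}$, with $N$ units ($i = 1, \dots, N$) and $T$ time periods ($t = 1, \dots, T$), binary treatment $D_{it} \in \{0, 1\}$, and covariates $X_{it}$. The estimand is the CATE $\tau(x) = \mathbb{E}[Y_{it}(1) - Y_{it}(0) \mid X_{it} = x]$.
The fixed effects $\alpha_i$ and $\gamma_t$ act as perfectly collinear predictors at the global level. Omitting them from the forest, or removing them by global residualization before tree growth, induces spurious heterogeneous effect estimates and confounds recovery of the true CATE structure.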
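To make the setting concrete, the following minimal sketch simulates a long-format panel with additive unit and time effects and shows how a naive treated-versus-control comparison is distorted when treatment uptake depends on the unit effect. The function name `simulate_panel`, the distributions, and the form of the true effect are all assumptions chosen for illustration; they are not taken from the CFFE paper.

```python
import numpy as np

def simulate_panel(n_units=200, n_periods=6, seed=0):
    """Simulate a long-format panel (hypothetical DGP, for illustration only).

    Y_it = alpha_i + gamma_t + tau(X_it) * D_it + eps_it
    Treatment probability rises with alpha_i, so the unit effect confounds D.
    """
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_periods)   # unit id for each row
    time = np.tile(np.arange(n_periods), n_units)     # period id for each row
    alpha = rng.normal(0.0, 1.0, n_units)             # unit fixed effects
    gamma = rng.normal(0.0, 1.0, n_periods)           # time fixed effects
    X = rng.uniform(-1.0, 1.0, size=(unit.size, 2))   # covariates
    tau = 1.0 + X[:, 0]                               # true CATE (assumed form)
    p = 1.0 / (1.0 + np.exp(-alpha[unit]))            # propensity depends on alpha_i
    D = rng.binomial(1, p)
    eps = rng.normal(0.0, 0.5, unit.size)
    Y = alpha[unit] + gamma[time] + tau * D + eps
    return X, Y, D, unit, time, tau

X, Y, D, unit, time, tau = simulate_panel()
naive = Y[D == 1].mean() - Y[D == 0].mean()
print(f"naive difference in means: {naive:.2f}; true average effect: {tau.mean():.2f}")
```

Because units with larger $\alpha_i$ are treated more often in this design, the naive contrast absorbs part of the unit effect and overstates the average effect.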
2. Node-Level Residualization
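The methodological core of CFFE is node-level fixed-effect removal during tree growth. At each node $S$, fixed effects are removed by iterative demeaning using only the data inside $S$:

$$\tilde{Y}_{it} = Y_{it} - \hat{\alpha}_i^{(S)} - \hat{\gamma}_t^{(S)}, \qquad (i, t) \in S,$$

and analogously $\tilde{D}_{it}$ for the treatment, where $\hat{\alpha}_i^{(S)}$ and $\hat{\gamma}_t^{(S)}$ are estimated by alternately demeaning over units and times within $S$, typically converging in 3–5 iterations.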
This local residualization is fundamentally different from global demeaning, which risks introducing artifacts due to non-nested clusters as the tree recursively splits the data.
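A minimal sketch of the alternating demeaning step, applied only to the rows that fall inside one node, might look as follows. The helper name `demean_two_way` and the fixed iteration count are assumptions for illustration; the causalfe package may implement this step differently.

```python
import numpy as np

def demean_two_way(v, unit, time, n_iter=5):
    """Alternately subtract unit means and time means of v (rows of one node)."""
    v = np.asarray(v, dtype=float).copy()
    _, u = np.unique(unit, return_inverse=True)   # relabel ids as 0..k-1 within the node
    _, t = np.unique(time, return_inverse=True)
    for _ in range(n_iter):
        v -= (np.bincount(u, weights=v) / np.bincount(u))[u]   # remove unit means
        v -= (np.bincount(t, weights=v) / np.bincount(t))[t]   # remove time means
    return v

# Toy panel whose outcome consists of fixed effects only (no noise, no treatment).
rng = np.random.default_rng(0)
unit = np.repeat(np.arange(30), 5)
time = np.tile(np.arange(5), 30)
Y = rng.normal(size=30)[unit] + rng.normal(size=5)[time]

# Balanced panel: the residual vanishes up to floating-point error.
resid_full = demean_two_way(Y, unit, time)

# Rows falling in a hypothetical node form an unbalanced sub-panel, so the
# means are recomputed from the node's rows rather than reused from the full data.
node = rng.uniform(size=unit.size) < 0.6
resid_node = demean_two_way(Y[node], unit[node], time[node])
```

On a balanced panel one pass is exact. Inside a node the sub-panel is generally unbalanced, which is why several alternating passes are needed and why globally computed means would not match the node's own unit and time averages.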
3. Splitting Criterion and Tree Growth
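Splitting in CFFE is adapted to the fixed-effect-residualized node statistics. Consider a parent node $P$ of size $n_P$ and a candidate split $(j, s)$ on covariate $j$ at threshold $s$, generating left and right children $L$ and $R$ of sizes $n_L$ and $n_R$. The local treatment effect in each child is estimated from the outcome and treatment residualized within that child, $\tilde{Y}_{it}$ and $\tilde{D}_{it}$, as

$$\hat{\tau}_L = \frac{\sum_{(i,t) \in L} \tilde{D}_{it} \tilde{Y}_{it}}{\sum_{(i,t) \in L} \tilde{D}_{it}^2},$$

with an analogous formula for $\hat{\tau}_R$. The split impurity reduction is

$$\Delta(j, s) = \frac{n_L n_R}{n_P^2} \left(\hat{\tau}_L - \hat{\tau}_R\right)^2.$$

The best split maximizes $\Delta(j, s)$. All tree growth and leaf estimation use cluster-aware subsampling, which samples units (not individual observations) to preserve panel structure, and apply an "honest" sample split (disjoint samples for tree structure and leaf estimation) to reduce estimation bias.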
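The split search can be sketched as below. Two assumptions are made here and should be checked against the paper: the child effect is taken to be the slope of the residualized outcome on the residualized treatment, and the gain is the size-weighted squared gap between the two child effects, a common causal-tree criterion. The names `_demean`, `node_effect`, and `split_gain` are hypothetical.

```python
import numpy as np

def _demean(v, unit, time, n_iter=5):
    """Alternating unit/time demeaning restricted to the rows passed in."""
    v = np.asarray(v, dtype=float).copy()
    _, u = np.unique(unit, return_inverse=True)
    _, t = np.unique(time, return_inverse=True)
    for _ in range(n_iter):
        v -= (np.bincount(u, weights=v) / np.bincount(u))[u]
        v -= (np.bincount(t, weights=v) / np.bincount(t))[t]
    return v

def node_effect(Y, D, unit, time):
    """Slope of residualized Y on residualized D within one node (assumed form)."""
    y, d = _demean(Y, unit, time), _demean(D, unit, time)
    denom = d @ d
    return (d @ y) / denom if denom > 1e-12 else np.nan

def split_gain(x, s, Y, D, unit, time):
    """Size-weighted squared gap between child effects for the split x <= s."""
    left = x <= s
    n, n_l = x.size, int(left.sum())
    if n_l == 0 or n_l == n:
        return 0.0
    tau_l = node_effect(Y[left], D[left], unit[left], time[left])
    tau_r = node_effect(Y[~left], D[~left], unit[~left], time[~left])
    if np.isnan(tau_l) or np.isnan(tau_r):
        return 0.0
    return n_l * (n - n_l) / n**2 * (tau_l - tau_r) ** 2

# Toy check: the true effect jumps from 0 to 2 at x = 0.
rng = np.random.default_rng(1)
n_units, n_periods = 100, 6
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
x = rng.uniform(-1, 1, n_units)[unit]              # unit-level covariate
D = rng.binomial(1, 0.5, unit.size)
tau = np.where(x > 0, 2.0, 0.0)
Y = (rng.normal(size=n_units)[unit] + rng.normal(size=n_periods)[time]
     + tau * D + rng.normal(0, 0.5, unit.size))
gains = {s: split_gain(x, s, Y, D, unit, time) for s in (-0.5, 0.0, 0.5)}
best = max(gains, key=gains.get)
```

Residualizing inside each candidate child, rather than once at the root, is what distinguishes this search from a forest grown on globally demeaned data.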
4. Algorithm and Software Implementation
CFFE is implemented in the Python package causalfe. A typical call sequence is:

```python
from causalfe import CFFEForest

# X: covariates, Y: outcome, D: binary treatment, unit/time: panel identifiers
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20, seed=42)
forest.fit(X, Y, D, unit, time)

# CATE point estimates, then estimates with 95% interval bounds (alpha=0.05)
tau_hat = forest.predict(X)
tau_hat, ci_lo, ci_hi = forest.predict_interval(X, alpha=0.05)
```
Key differences from standard causal forests:
- Node-level, not global, residualization of fixed effects
- Cluster (unit)-aware subsampling
- Honest estimation with cluster-aware sample splitting
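The cluster-aware subsampling and honest splitting listed above can be sketched as follows. This is an illustrative helper, not the causalfe implementation: the name `cluster_honest_split`, the `frac` parameter, and the even split between the two halves are assumptions.

```python
import numpy as np

def cluster_honest_split(unit, frac=0.5, seed=None):
    """Subsample units and split them into disjoint structure/estimation halves.

    Returns two boolean row masks; every row of a drawn unit lands in exactly
    one half. Assumes the panel contains at least two distinct units.
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(unit)
    n_draw = min(ids.size, max(2, int(round(frac * ids.size))))
    drawn = rng.choice(ids, size=n_draw, replace=False)   # sample units, not rows
    half = n_draw // 2
    split_rows = np.isin(unit, drawn[:half])   # rows used to choose splits
    est_rows = np.isin(unit, drawn[half:])     # rows used to estimate leaf effects
    return split_rows, est_rows

# Ten units observed for three periods each; draw 80% of the units for one tree.
unit = np.repeat(np.arange(10), 3)
split_rows, est_rows = cluster_honest_split(unit, frac=0.8, seed=0)
```

Sampling whole units keeps each unit's time series intact, so within-node unit means remain well defined, and keeping the two halves disjoint at the unit level prevents the same unit from influencing both the tree structure and the leaf estimates.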
5. Simulation Studies and Empirical Behavior
Multiple simulation settings evaluate CFFE relative to standard causal forests:
| Scenario | Standard CF (RMSE / $\rho$) | CFFE (RMSE / $\rho$) | Key Comparison |
|---|---|---|---|
| Heterogeneous DiD, no confounding | n/a | 0.378 / 0.934 | CFFE: accurate CATE |
| Fixed-effect confounding | 0.506 / 0.965 | 0.405 / 0.910 | CFFE: lower bias |
| Placebo ($\tau = 0$) | n/a | 0.25 / n/a | Near-unbiased (mean estimate –0.06) |
| Homogeneous (constant $\tau$) | n/a | 0.34 / n/a | Mild bias (generic); mean estimate 1.79 |
| Heterogeneous ($\tau(x)$ varies) | n/a | 0.54 / 0.90 | Good structure |
Here, CFFE reduces RMSE and bias relative to standard causal forests in the presence of fixed-effect confounding, at a modest cost in heterogeneity ranking: the correlation $\rho$ between $\hat{\tau}(x)$ and $\tau(x)$ falls from 0.965 to 0.910 but remains high. In Monte Carlo scenarios, nominal 95% confidence intervals achieved coverage of only 42–56%, indicating that interval estimation is anti-conservative without further adjustment (Aytug, 15 Jan 2026).
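The summary metrics discussed above (RMSE, bias, the correlation between estimated and true effects, and interval coverage) can be computed in a simulation along these lines. The helper `evaluate_cate` is hypothetical, and the toy inputs are made up solely to exercise the function; they are not results from the paper.

```python
import numpy as np

def evaluate_cate(tau_true, tau_hat, ci_lo, ci_hi):
    """Return RMSE, bias, correlation, and empirical interval coverage."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_hat = np.asarray(tau_hat, dtype=float)
    ci_lo, ci_hi = np.asarray(ci_lo, dtype=float), np.asarray(ci_hi, dtype=float)
    err = tau_hat - tau_true
    # Correlation is undefined when either vector is constant (e.g. a placebo design).
    if np.std(tau_true) > 0 and np.std(tau_hat) > 0:
        rho = float(np.corrcoef(tau_true, tau_hat)[0, 1])
    else:
        rho = float("nan")
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),
        "rho": rho,
        "coverage": float(np.mean((ci_lo <= tau_true) & (tau_true <= ci_hi))),
    }

# Toy inputs: estimates shifted up by 0.5, with two narrow and two wide intervals.
tau_true = np.array([0.0, 1.0, 2.0, 3.0])
tau_hat = tau_true + 0.5
width = np.array([0.4, 0.4, 0.6, 0.6])
metrics = evaluate_cate(tau_true, tau_hat, tau_hat - width, tau_hat + width)
```

Reporting coverage alongside RMSE and $\rho$ matters because an estimator can rank effects well while its intervals are still too narrow.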
6. Computing and Practical Considerations
CFFE adds only a small per-node overhead for residualization (3–5 demeaning iterations) to the cost of growing a standard causal forest. Tree-level parallelization is straightforward. Robust performance relies on:
- Sufficient tree depth (typically 3–6) without overfitting
- Adequate minimum leaf sizes, ensuring stability
- Honest estimation with `honest=True` for valid inference
- Careful handling of variance estimation
The method presumes a panel model with parallel trends, i.e., no time-varying confounders beyond the fixed effects; violations will induce bias in $\hat{\tau}(x)$. A balanced or nearly balanced panel is required; mild imbalance is tolerated.
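Before fitting, the degree of balance can be checked with a short diagnostic such as the one below. The helper `panel_balance` is a hypothetical utility, not part of causalfe, and the source does not specify what share of missing cells counts as mild imbalance.

```python
import numpy as np

def panel_balance(unit, time):
    """Share of the full unit-by-period grid that is observed (1.0 = balanced)."""
    _, u = np.unique(unit, return_inverse=True)
    _, t = np.unique(time, return_inverse=True)
    observed = np.zeros((u.max() + 1, t.max() + 1), dtype=bool)
    observed[u, t] = True          # duplicate rows collapse to a single cell
    return float(observed.mean())

# Three units over four periods, first complete and then with three rows removed.
unit = np.repeat(np.arange(3), 4)
time = np.tile(np.arange(4), 3)
full_share = panel_balance(unit, time)

keep = np.ones(unit.size, dtype=bool)
keep[[0, 5, 10]] = False
partial_share = panel_balance(unit[keep], time[keep])
```

A share well below 1.0 suggests that within-node demeaning will operate on sparse unit-by-period cells, so results should be interpreted with extra caution.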
7. Position within Broader Causal Forest Literature
Standard causal forests (e.g., Wager & Athey (Wager et al., 2015)) can consistently estimate $\tau(x)$ under unconfoundedness and overlap in cross-sectional settings. Recent generalizations (e.g., Causal Survival Forests (Cui et al., 2020), Longitudinal Bayesian Causal Forests (McJames et al., 2024), Difference-in-Differences BCF (Souto et al., 14 May 2025)) adapt the approach to time-varying or panel data, but most either rely on global residualization or fail to account for fixed effects at the node level. Node-level fixed-effect removal, as in CFFE, is designed specifically to mitigate spurious CATE heterogeneity arising from structural panel confounding, advancing nonparametric identification in high-dimensional panel data.
In summary, CFFE defines the state-of-the-art methodology for heterogeneous effect estimation in panel data with fixed effects by combining local residualization, clusterwise resampling, and causal forest ensemble estimation, with robust empirical and computational properties (Aytug, 15 Jan 2026).