Causal-Policy Forest for End-to-End Policy Learning
(2512.22846v1)
Published 28 Dec 2025 in econ.EM, cs.LG, math.ST, stat.ME, and stat.ML
Abstract: This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome corresponding to the assigned treatment is observed. The goal of policy learning is to train a policy from the observed data, where a policy is a function that recommends an optimal treatment for each individual, to maximize the policy value. In this study, we first show that maximizing the policy value is equivalent to minimizing the mean squared error for the conditional average treatment effect (CATE) under ${-1, 1}$ restricted regression models. Based on this finding, we modify the causal forest, an end-to-end CATE estimation algorithm, for policy learning. We refer to our algorithm as the causal-policy forest. Our algorithm has three advantages. First, it is a simple modification of an existing, widely used CATE estimation method, therefore, it helps bridge the gap between policy learning and CATE estimation in practice. Second, while existing studies typically estimate nuisance parameters for policy learning as a separate task, our algorithm trains the policy in a more end-to-end manner. Third, as in standard decision trees and random forests, we train the models efficiently, avoiding computational intractability.
The paper introduces a novel causal-policy forest that directly bridges CATE estimation and policy learning by reducing the objective to binary regression.
It adapts standard causal forests with restricted MSE splitting and honest estimation to yield efficient, scalable treatment recommendation rules.
Empirical results show near-oracle performance with significantly reduced regret compared to plug-in and pseudo-outcome methods.
Causal-Policy Forest: A Direct Approach to End-to-End Policy Learning
Introduction
The paper "Causal-Policy Forest for End-to-End Policy Learning" (2512.22846) presents a unified framework for policy learning in the context of causal inference. Policy learning aims to estimate individualized treatment assignment policies by maximizing expected social welfare, given observational data consisting of covariates, treatments, and outcomes. The approach exploits the correspondence between maximizing welfare (policy value) and minimizing mean squared error (MSE) for the conditional average treatment effect (CATE) when predictors are restricted to binary values. The proposed solution, the causal-policy forest, directly targets the policy objective by adapting causal forests—traditionally used for CATE estimation—to operate end-to-end for policy learning.
Theoretical Foundation and Equivalence
The core contribution is the rigorous demonstration that, under binary treatments and policy classes with deterministic, binary-valued policies, the empirical welfare maximization (EWM) objective is mathematically equivalent to least squares regression of the CATE with predictors in {−1,1}. Specifically, this equivalence holds:
Let $\tau_0(x)$ denote the true CATE, and let $g(x) \in \{-1, 1\}$ be a policy-induced predictor, where $g(x) = 1$ corresponds to recommending treatment and $g(x) = -1$ to no treatment.
The optimal policy $\pi^*$ within a class $\Pi$ is the solution to $\arg\max_{\pi \in \Pi} W(\pi)$, where $W(\pi)$ is the expected welfare (policy value).
Simultaneously, the solution to $\arg\min_{g \in \mathcal{G}_\Pi} \mathbb{E}\big[(\tau_0(X) - g(X))^2\big]$, with $\mathcal{G}_\Pi = \{2\pi - 1 : \pi \in \Pi\}$, is the predictor $g^* = 2\pi^* - 1$, i.e., the optimal policy recommendation.
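The equivalence follows from a short algebraic identity. As a sketch, assume welfare is measured as the policy value relative to treating no one, $W(\pi) = \mathbb{E}[\pi(X)\,\tau_0(X)]$. Since $g(X)^2 = 1$ for every $g \in \mathcal{G}_\Pi$,

$$\mathbb{E}\big[(\tau_0(X) - g(X))^2\big] = \mathbb{E}[\tau_0(X)^2] - 2\,\mathbb{E}[\tau_0(X)\,g(X)] + 1,$$

so minimizing the restricted MSE over $\mathcal{G}_\Pi$ amounts to maximizing $\mathbb{E}[\tau_0(X)\,g(X)]$. Substituting $g = 2\pi - 1$ gives

$$\mathbb{E}[\tau_0(X)\,g(X)] = 2\,\mathbb{E}[\pi(X)\,\tau_0(X)] - \mathbb{E}[\tau_0(X)] = 2\,W(\pi) - \mathbb{E}[\tau_0(X)],$$

where the last term does not depend on $\pi$, so the two optimization problems share the same solution.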
This establishes that CATE estimation and policy learning can be seamlessly bridged within a unified statistical framework. Thus, tree-based CATE estimation machinery, when subject to the appropriate output constraints, can yield optimal individualized policies.
Algorithmic Structure: Causal-Policy Forest
The proposed causal-policy forest modifies the standard causal forest algorithm in several critical ways:
Split Criteria: Rather than splitting to minimize the real-valued MSE of $\tau(x)$, splits are chosen to minimize the restricted MSE between the leafwise CATE and binary-valued policy scores. This makes the decision boundary, i.e., the sign of the estimated CATE, the central object of the fit.
Honest Estimation: Tree construction follows an honest forest design, partitioning the sample into a split subsample (for growing the tree) and an estimation subsample (for estimating leaf statistics), to prevent overfitting the binary decision to the training sample.
Output Rule: Each leaf's policy score is the sign of its estimated CATE; formally, $\hat{g}(x) = \operatorname{sign}(\hat{\tau}(x))$.
Computational Advantages: By aggregating over trees and using recursive partitioning with simple binary rules, the method is efficient and scalable, retaining favorable computational properties of standard random forests.
Through this approach, the causal-policy forest maintains a modular structure: it can easily incorporate subsampling, random feature selection, honest estimation, and piecewise constant leafwise predictors, while yielding directly usable treatment recommendation policies.
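As a rough illustration of the restricted-MSE splitting and honest estimation described above, the following is a minimal Python sketch of a single tree. It is not the authors' implementation: the function names (restricted_sse, best_split, grow_tree, recommend) are hypothetical, the tree is deliberately simplified (exhaustive axis-aligned splits, fixed depth), and it assumes per-observation CATE proxies gamma are supplied as input.

```python
import numpy as np

def restricted_sse(gamma):
    """Restricted leaf fit: with the leaf prediction constrained to {-1, +1},
    the squared error is minimized by the sign of the leaf mean."""
    g = 1.0 if gamma.sum() >= 0 else -1.0
    return float(np.sum((gamma - g) ** 2)), g

def best_split(X, gamma, min_leaf=25):
    """Greedy axis-aligned split minimizing the sum of restricted SSEs in the children."""
    n = X.shape[0]
    parent_sse, _ = restricted_sse(gamma)
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, gs = X[order, j], gamma[order]
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i] == xs[i - 1]:
                continue  # cannot split between identical feature values
            sse = restricted_sse(gs[:i])[0] + restricted_sse(gs[i:])[0]
            if best is None or sse < best[0]:
                best = (sse, j, (xs[i - 1] + xs[i]) / 2.0)
    if best is None or best[0] >= parent_sse:
        return None  # no split improves the restricted objective
    return best[1], best[2]

def grow_tree(X_sp, g_sp, X_est, g_est, depth=0, max_depth=3, min_leaf=25):
    """Honest tree: splits are chosen on the splitting subsample (X_sp, g_sp),
    leaf signs are estimated on the held-out estimation subsample (X_est, g_est)."""
    split = best_split(X_sp, g_sp, min_leaf) if depth < max_depth else None
    if split is None or len(g_est) < min_leaf:
        sign = restricted_sse(g_est)[1] if len(g_est) > 0 else 1.0
        return {"leaf": sign}
    j, thr = split
    ls, le = X_sp[:, j] <= thr, X_est[:, j] <= thr
    return {"feature": j, "threshold": thr,
            "left": grow_tree(X_sp[ls], g_sp[ls], X_est[le], g_est[le],
                              depth + 1, max_depth, min_leaf),
            "right": grow_tree(X_sp[~ls], g_sp[~ls], X_est[~le], g_est[~le],
                               depth + 1, max_depth, min_leaf)}

def recommend(tree, x):
    """Return +1 (treat) or -1 (do not treat) for one covariate vector x."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

A forest would then aggregate many such trees, each grown on a subsample with a random subset of features, and recommend treatment by majority vote over the leaf signs. Here gamma stands in for per-observation CATE proxies (e.g., IPW or doubly robust pseudo-outcomes), a simplification relative to the paper's more end-to-end handling of this step.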
Empirical Results
A synthetic simulation study highlights the method's empirical properties:
The data generating process involves $n = 10000$ samples, $p = 10$ covariates, treatment assignment confounded through the propensity score, and CATE heterogeneity across $X$.
The causal-policy forest is benchmarked against (1) the oracle policy using the true CATE, (2) a policy tree with doubly robust (DR) pseudo-outcomes, and (3) a plug-in thresholded X-learner with gradient boosting regression.
Numerical results demonstrate that the causal-policy forest achieves a policy value of $0.1730$ (regret $0.0103$), approaching the oracle value of $0.1833$, and outperforming the policy tree (value $0.1247$, regret $0.0586$) and X-learner (value $0.0834$, regret $0.0999$).
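Regret here is the welfare gap to the oracle rule,

$$\mathrm{regret}(\hat{\pi}) = W(\pi^*) - W(\hat{\pi}),$$

which is consistent with the reported figures: for the causal-policy forest, $0.1833 - 0.1730 = 0.0103$.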
The main claim supported numerically is that explicitly targeting the policy learning objective with a binary CATE reduction substantially reduces regret and achieves value near the oracle, outperforming plug-in and existing pseudo-outcome-based approaches.
Theoretical and Practical Implications
Unified Framework: The reduction of policy learning to restricted CATE regression unifies two previously distinct tasks, providing a strong foundation for future methodological developments in individualized treatment effect estimation and prescription.
End-to-End Optimization: By integrating estimation and policy selection, the approach avoids two-stage bias and inefficiency inherent in plug-in and empirical risk minimization strategies that rely on nuisance parameter estimation.
Computational Tractability: The method is efficient, avoiding combinatorial optimization over policy classes and inheriting the favorable statistical and computational properties of tree-based ensemble methods.
Potential Directions and Impact
Generalization to Multiple Treatments: The formulation naturally extends to multi-action policy learning.
Robustness and Generalization: The approach could be further enhanced by incorporating robustness to unmeasured confounding or extending to complex outcome spaces.
Interpretability: The piecewise constant structure, coupled with clear decision boundaries, enhances interpretability, which is crucial for high-stakes policy deployment scenarios.
Connection to Riesz Representers: The method inherently integrates aspects of Riesz regression through tree-based partitioning, suggesting directions for further theoretical analysis.
Conclusion
The causal-policy forest provides a statistically motivated, computationally efficient, and practically scalable solution for end-to-end individualized policy learning in causal inference. By leveraging the equivalence between policy welfare maximization and binary CATE regression, it achieves strong empirical performance, clarifies the relationship between CATE estimation and policy learning, and sets a modular foundation for future advances in both methodology and theory.