Empirical Welfare Maximization (EWM)
- Empirical Welfare Maximization (EWM) is a statistical policy learning paradigm that selects treatment rules by maximizing an estimate of social welfare over a restricted class of policies.
- It integrates methods from experimental design, machine learning, and econometrics, and estimates welfare with doubly robust, inverse-probability-weighted (IPW), or plug-in estimators.
- Its performance is supported by uniform deviation bounds and regret guarantees from empirical process theory, with extensions to constrained and time-series settings.
Empirical Welfare Maximization (EWM) is a statistical policy learning paradigm designed to select among treatment assignment rules with the explicit objective of maximizing population social welfare, usually defined as mean potential outcome under a given policy. EWM formalizes policy learning as a combinatorial optimization problem over a structured class of rules, grounded in principles of empirical process theory and robust causal inference. EWM unifies aspects of experimental design, machine learning classification, and econometric treatment choice; it has become central in the literature on optimal treatment assignment, welfare targeting, and data-driven resource allocation.
1. Formalization of the EWM Principle
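Let $\{(X_i, D_i, Y_i)\}_{i=1}^n$ be an observed i.i.d. or sequential sample, where $X_i \in \mathcal{X}$ are individual covariates, $D_i \in \{0, 1\}$ is treatment, and $Y_i$ is the realized outcome, corresponding to potential outcomes $(Y_i(1), Y_i(0))$. A policy $\pi : \mathcal{X} \to \{0, 1\}$, or equivalently a measurable "decision set" $G \subseteq \mathcal{X}$ with $\pi(x) = \mathbf{1}\{x \in G\}$, prescribes treatment assignment based on $X$. The population social welfare under $\pi$ is
$$W(\pi) = \mathbb{E}\big[\,Y(1)\,\pi(X) + Y(0)\,(1 - \pi(X))\,\big],$$
where the expectation is over the joint distribution of $(X, Y(1), Y(0))$. Under unconfoundedness, $W(\pi)$ is identified from observables and can be estimated with doubly robust or inverse-probability-weighted (IPW) estimators.
EWM proceeds by constructing an empirical analogue $\widehat{W}_n(\pi)$ (via plug-in, IPW, or doubly robust scoring) and selecting
$$\hat{\pi} \in \arg\max_{\pi \in \Pi} \widehat{W}_n(\pi),$$
where $\Pi$ is a user-defined policy class, often of finite VC-dimension (Kato, 30 Oct 2025, Cerulli, 2020, Sun, 2021).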
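The identification step behind the IPW form can be written out as a short worked equation. This is a sketch under two assumptions that the text names only in passing: unconfoundedness, $(Y(1), Y(0)) \perp D \mid X$, and overlap, $0 < e(X) < 1$, where $e(x) = \Pr(D = 1 \mid X = x)$ denotes the propensity score (notation introduced here for illustration).

$$
\begin{aligned}
\mathbb{E}\!\left[\frac{D\,Y}{e(X)}\,\pi(X)\right]
&= \mathbb{E}\!\left[\frac{\pi(X)}{e(X)}\,\mathbb{E}\big[D\,Y(1) \mid X\big]\right]
 = \mathbb{E}\!\left[\frac{\pi(X)}{e(X)}\,e(X)\,\mathbb{E}\big[Y(1) \mid X\big]\right]
 = \mathbb{E}\big[Y(1)\,\pi(X)\big], \\
W(\pi) &= \mathbb{E}\!\left[\frac{D\,Y}{e(X)}\,\pi(X) + \frac{(1 - D)\,Y}{1 - e(X)}\,\big(1 - \pi(X)\big)\right].
\end{aligned}
$$

The second line applies the same argument to the control arm; replacing the expectation with a sample mean gives an IPW version of the empirical welfare criterion.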
2. Estimation and Computational Methodology
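Central to EWM is the empirical welfare criterion. For randomized or unconfounded settings, the canonical doubly robust estimator is
$$\widehat{W}_n(\pi) = \frac{1}{n}\sum_{i=1}^{n}\left[\pi(X_i)\left(\hat{\mu}_1(X_i) + \frac{D_i\,\big(Y_i - \hat{\mu}_1(X_i)\big)}{\hat{e}(X_i)}\right) + \big(1 - \pi(X_i)\big)\left(\hat{\mu}_0(X_i) + \frac{(1 - D_i)\,\big(Y_i - \hat{\mu}_0(X_i)\big)}{1 - \hat{e}(X_i)}\right)\right],$$
where $\hat{e}(x)$ and $\hat{\mu}_d(x)$ are the estimated propensity score and outcome regression, respectively (Kato, 30 Oct 2025). Other variants include IPW or plug-in estimators depending on the sampling design and identification strategy (Sun, 2021, Cerulli, 2020).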
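A minimal NumPy sketch of a doubly robust (AIPW) welfare criterion may help make the estimator concrete. The function names `dr_scores` and `dr_welfare` are hypothetical, and the toy data use the true propensity score and outcome regressions as stand-ins for fitted nuisance estimates; in practice these would be estimated, typically with cross-fitting.

```python
import numpy as np

def dr_scores(y, d, e_hat, mu1_hat, mu0_hat):
    """AIPW scores for the treated and control potential outcomes."""
    gamma1 = mu1_hat + d * (y - mu1_hat) / e_hat
    gamma0 = mu0_hat + (1 - d) * (y - mu0_hat) / (1 - e_hat)
    return gamma1, gamma0

def dr_welfare(policy, gamma1, gamma0):
    """Empirical welfare of a 0/1 policy vector given AIPW scores."""
    policy = np.asarray(policy, dtype=float)
    return float(np.mean(policy * gamma1 + (1 - policy) * gamma0))

# Toy data (assumption): randomized treatment, treatment effect tau(x) = x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
e = np.full(n, 0.5)                      # known propensity score
d = rng.binomial(1, e)
y = x * d + rng.normal(0, 1, n)

# Stand-ins for fitted nuisances: here the true regressions mu1(x) = x, mu0(x) = 0.
g1, g0 = dr_scores(y, d, e, x, np.zeros(n))

w_treat_pos = dr_welfare(x > 0, g1, g0)      # treat only units with x > 0
w_treat_all = dr_welfare(np.ones(n), g1, g0)
w_treat_none = dr_welfare(np.zeros(n), g1, g0)
```

In this design the rule "treat if x > 0" has true welfare 0.25, while treating everyone or no one has welfare 0, so the estimated criterion should rank the three policies accordingly up to sampling noise.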
The EWM optimization is combinatorial: it maximizes sample means of linear functionals of the form $\hat{\Gamma}_i\,\mathbf{1}\{X_i \in G\}$ over $G \in \mathcal{G}$, where $\hat{\Gamma}_i$ is an estimated individual score and $\mathcal{G}$ is the feasible class of decision sets (e.g., threshold sets, decision trees). This is NP-hard for general classes but tractable for low-complexity or one-dimensional threshold rules via grid search (Cerulli, 2020, Crippa, 2024). Recent work establishes an exact equivalence between EWM and least-squares prediction over pseudo-outcomes in a suitable function class, enabling convex relaxation and scalable, regularized policy fitting (Kato, 30 Oct 2025).
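For one-dimensional threshold rules the search is small enough to enumerate. The sketch below assumes rules of the form "treat if x >= c" and a vector of per-unit scores (for example, differences of doubly robust scores); `best_threshold` is a hypothetical helper, not a function from any of the cited papers.

```python
import numpy as np

def best_threshold(x, scores):
    """Exhaustive search over cutoffs c for rules pi_c(x) = 1{x >= c}.

    Maximizes mean(scores * 1{x >= c}), the estimated welfare gain
    relative to treating no one. The cutoff np.inf encodes "treat no one".
    """
    x = np.asarray(x, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Only cutoffs at observed x values can change the treated set.
    candidates = np.append(np.unique(x), np.inf)
    gains = np.array([np.mean(scores * (x >= c)) for c in candidates])
    best = int(np.argmax(gains))
    return candidates[best], gains[best]

cutoff, gain = best_threshold([1, 2, 3, 4], [-1, -1, 2, 1])
```

With these four units the search treats the two with positive scores (cutoff 3), and it falls back to treating no one when every score is negative.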
3. Statistical Properties and Regret Guarantees
EWM's statistical validity relies on uniform deviation bounds for empirical processes indexed by the policy class. For $\Pi$ of VC-dimension $v$, the expected regret satisfies
$$\mathbb{E}\big[W(\pi^*) - W(\hat{\pi})\big] = O\!\left(\sqrt{v/n}\right),$$
where $\pi^*$ is the welfare-maximizing (oracle) policy in $\Pi$ (Kato, 30 Oct 2025, Cerulli, 2020). For threshold rules in regular nonparametric models, regret sharpens to $O(n^{-2/3})$ under additional smoothness and margin conditions, reflecting the cube-root asymptotics of the estimated threshold (Crippa, 2024). In dynamic or time-series settings with mixing or martingale conditions, regret bounds of similar $1/\sqrt{T}$ form, with $T$ the number of time periods, hold under appropriate invariance and exogeneity properties (Kitagawa et al., 2022).
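The link between uniform deviation bounds and regret follows from a standard decomposition. The sketch below assumes that $\hat{\pi}$ exactly maximizes an empirical criterion $\widehat{W}_n$ over the policy class $\Pi$ and that $\pi^*$ maximizes the population welfare $W$ over the same class.

$$
\begin{aligned}
W(\pi^*) - W(\hat{\pi})
&= \big[W(\pi^*) - \widehat{W}_n(\pi^*)\big]
 + \underbrace{\big[\widehat{W}_n(\pi^*) - \widehat{W}_n(\hat{\pi})\big]}_{\le\, 0}
 + \big[\widehat{W}_n(\hat{\pi}) - W(\hat{\pi})\big] \\
&\le 2 \sup_{\pi \in \Pi} \big|\widehat{W}_n(\pi) - W(\pi)\big|.
\end{aligned}
$$

The middle term is non-positive because $\hat{\pi}$ maximizes $\widehat{W}_n$, so any bound on the supremum, such as the $\sqrt{v/n}$ rate for VC classes, carries over to regret.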
In instrumental variable (IV) models with endogeneity, social welfare is represented as an integral functional of the marginal treatment effect (MTE). EWM is applied by maximizing the empirical analogue of this integral, with the regret rate governed by both the complexity $v$ of $\Pi$ and the uniform estimation rate of the MTE (Sasaki et al., 2020, Liu, 2022). When the MTE is estimated at the $\sqrt{n}$-rate (parametric or low-dimensional IV), EWM recovers the $n^{-1/2}$ regret rate; otherwise, convergence is limited by the slower MTE estimation rate.
4. Extensions: Constraints, Robustness, and Alternative Welfare Criteria
EWM can incorporate explicit constraints, such as budget or fairness restrictions. With a per-unit cost $c(X)$ and budget $B$, the population-constrained problem is
$$\max_{\pi \in \Pi} W(\pi) \quad \text{subject to} \quad C(\pi) \le B,$$
where $C(\pi) = \mathbb{E}[c(X)\,\pi(X)]$. Empirical analogues directly replace $W$ and $C$ with sample estimates. However, naive sample-analogue constrained EWM fails to deliver feasibility and efficiency uniformly: no rule can achieve both asymptotically across all data-generating processes (DGPs). Remedies include tightening constraints with critical values (size control) or penalizing constraint violation (trade-off rules) (Sun, 2021, Liu, 2022).
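The effect of tightening a sample budget constraint can be illustrated with a small threshold search. This is a sketch only: `constrained_threshold` and its `slack` parameter are hypothetical, and `slack` is a crude stand-in for the critical-value adjustment discussed above, not the construction in Sun (2021).

```python
import numpy as np

def constrained_threshold(x, scores, costs, budget, slack=0.0):
    """Best rule 1{x >= c} subject to mean(costs * pi) <= budget - slack.

    Assumes budget - slack >= 0, so that treating no one (c = inf) is feasible.
    slack = 0 gives the naive sample-analogue rule; slack > 0 tightens it.
    """
    x = np.asarray(x, dtype=float)
    scores = np.asarray(scores, dtype=float)
    costs = np.asarray(costs, dtype=float)
    best_c, best_gain = np.inf, 0.0
    for c in np.unique(x):
        pi = x >= c
        feasible = np.mean(costs * pi) <= budget - slack
        gain = np.mean(scores * pi)
        if feasible and gain > best_gain:
            best_c, best_gain = float(c), float(gain)
    return best_c, best_gain

x = np.array([1.0, 2.0, 3.0, 4.0])
ones = np.ones(4)
naive = constrained_threshold(x, ones, ones, budget=0.5)
tightened = constrained_threshold(x, ones, ones, budget=0.5, slack=0.1)
```

In this toy case the naive rule spends exactly the budget by treating half the sample, while the tightened rule treats only one unit, trading estimated welfare for a margin against budget violation.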
EWM generalizes to alternative social welfare functionals, notably α-Expected Welfare Maximization (α-EWM), which targets the lower-tail mean (CVaR) of the post-treatment outcome distribution over the worst-off α-fraction:
$$W_\alpha(\pi) = \frac{1}{\alpha}\int_0^{\alpha} F_\pi^{-1}(u)\,du,$$
where $F_\pi$ is the distribution function of $Y(\pi(X))$ and $F_\pi^{-1}$ is its quantile function. Estimation leverages dual representations and cross-fitted, doubly robust scores. Regret analysis reveals an $n^{-1/2}$ rate, with constants inflating as $\alpha \to 0$ (Fan et al., 1 May 2025). This framework covers Rawlsian and distributionally robust welfare optimization.
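The dual representation of the lower-tail mean can be checked numerically. The sketch below uses the Rockafellar-Uryasev form $\sup_\eta \{\eta - \alpha^{-1}\,\mathbb{E}[(\eta - Y)_+]\}$ with optional normalized weights (for instance, IPW weights for units whose observed treatment matches the policy). It is a simple illustration with a hypothetical function name, not the cross-fitted doubly robust estimator of Fan et al.

```python
import numpy as np

def lower_cvar_dual(y, alpha, weights=None):
    """Lower-tail CVaR via the dual sup_eta { eta - E_w[(eta - y)_+] / alpha }.

    The objective is concave and piecewise linear in eta with kinks at the
    sample points, so it suffices to evaluate it at the observed outcomes.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalized (Hajek-style) weights
    values = [eta - np.sum(w * np.maximum(eta - y, 0.0)) / alpha for eta in y]
    return float(max(values))

cvar_half = lower_cvar_dual([4, 1, 3, 2], alpha=0.5)
cvar_weighted = lower_cvar_dual([1, 2, 3, 4], alpha=0.5, weights=[0, 0, 1, 1])
```

When α times the sample size is an integer, the unweighted value coincides with the mean of the worst-off α-fraction of outcomes (here the mean of 1 and 2).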
Time-series EWM (T-EWM) adapts the machinery to sequential or nonstationary data. It defines welfare objectives as conditional expectations over policy-induced paths and maximizes empirical inverse-propensity-weighted welfare along observed trajectories. Theoretical guarantees extend to martingale and Markov-type processes (Kitagawa et al., 2022).
5. Policy Classes, Threshold Rules, and Implementation Protocols
The choice of policy class $\Pi$ fundamentally impacts EWM's empirical behavior and feasibility. Common classes include
- Threshold rules: Scalar or multivariate policies of the form $\pi(x) = \mathbf{1}\{x \ge c\}$ or Cartesian products of indicator thresholds over selected coordinates. Regret rates and asymptotics are well understood for this class (Crippa, 2024, Cerulli, 2020).
- Linear scores and finite-depth decision trees: Used for interpretability and tractability.
- Set-indicator policies over VC-classes: General framework covering most practical applications.
Implementation is feasible with grid-search (for low-dimensional threshold rules), mixed-integer programming (for more complex policies), or, via the equivalence with least-squares, convex optimization for large-scale settings (Kato, 30 Oct 2025). Standard protocol entails (i) estimating individual-level causal effects (e.g., via regression-adjustment, doubly robust estimation, or IV), (ii) evaluating empirical policy-specific welfare over a defined grid or function class, and (iii) selecting the maximizer and reporting welfare and treatment group trade-offs (Cerulli, 2020).
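The three-step protocol can be sketched end to end on simulated data. Everything specific here is an assumption made for illustration: the helper names, the linear regression adjustment fitted separately by treatment arm, the threshold rule "treat if x >= c", and the simulated design with treatment effect equal to x.

```python
import numpy as np

def fit_linear(x, y):
    """OLS of y on (1, x); returns (intercept, slope)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def ewm_protocol(x, d, y, grid):
    # (i) Regression adjustment: separate fits by arm give plug-in effect estimates.
    b1 = fit_linear(x[d == 1], y[d == 1])
    b0 = fit_linear(x[d == 0], y[d == 0])
    tau_hat = (b1[0] - b0[0]) + (b1[1] - b0[1]) * x
    # (ii) Evaluate the estimated welfare gain and treated share on the grid.
    rows = []
    for c in grid:
        pi = x >= c
        rows.append((float(c), float(np.mean(tau_hat * pi)), float(np.mean(pi))))
    # (iii) Select the maximizer; the full table supports a policy menu.
    best = max(rows, key=lambda r: r[1])
    return best, rows

# Simulated randomized experiment (assumption): effect tau(x) = x.
rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, n)
d = rng.binomial(1, 0.5, n)
y = 1.0 + x * d + rng.normal(0, 0.5, n)

(best_c, best_gain, best_share), table = ewm_protocol(x, d, y, np.linspace(-1, 1, 21))
```

The returned table of (cutoff, estimated gain, treated share) rows is one way to report the trade-off between welfare and the size of the treated group across candidate thresholds.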
6. Empirical Applications and Illustrations
EWM has been validated in a range of empirical settings:
- Threshold-based welfare program eligibility using job training (LaLonde) data (Cerulli, 2020), showing welfare gains over random assignment and enabling policy menus parameterized by interpretable thresholds.
- Medicaid expansion eligibility under budget constraints, where trade-off rules outperform naive constrained EWM in terms of welfare-efficiency and controlled budget violation (Sun, 2021).
- Optimal tuition subsidy assignment under endogeneity, using estimated MTE in the Indonesian Family Life Survey; EWM (FEWM/BEWM) rules target subpopulations with high predicted gains within budget (Liu, 2022).
- Dynamic pandemic response policies, where T-EWM estimated adaptive COVID-19 restriction rules with empirical regret improvements confirmed in both simulation and real-world weekly data (Kitagawa et al., 2022).
- Distributionally robust targeting (α-EWM), shifting treatment to disadvantaged subpopulations, with formal inference on lower-tail welfare (Fan et al., 1 May 2025).
7. Connections to Plug-in Policy Learning and Regularization
EWM and the plug-in approach, which assigns treatment to those with positive estimated CATE, are theoretically equivalent under suitable reparameterization (Kato, 30 Oct 2025). Specifically, EWM can be formulated as least squares regression of a pseudo-outcome on the class $\Pi$, yielding an exact correspondence between maximizing empirical welfare and minimizing squared error within the policy class. This equivalence enables the design of convex, regularized training algorithms, circumventing the NP-hardness of discrete optimization without loss of statistical guarantees. Regularization enhances stability, enables large-scale implementation, and accommodates additional convex constraints (budget, fairness) via joint convex optimization.
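One elementary way to see why such a correspondence can hold is the identity below. It is a sketch under assumptions introduced here for illustration and is not necessarily the construction used by Kato: $\hat{\Gamma}_i$ denotes a per-unit pseudo-outcome (for example, a doubly robust treatment-effect score), $\pi$ is a 0/1 policy, and $f = 2\pi - 1 \in \{-1, +1\}$ is its sign-valued recoding.

$$
\begin{aligned}
\sum_{i=1}^{n}\big(\hat{\Gamma}_i - f(X_i)\big)^2
&= \sum_{i=1}^{n}\hat{\Gamma}_i^2 \;-\; 2\sum_{i=1}^{n}\hat{\Gamma}_i\,f(X_i) \;+\; n, \\
\sum_{i=1}^{n}\hat{\Gamma}_i\,f(X_i)
&= 2\sum_{i=1}^{n}\hat{\Gamma}_i\,\pi(X_i) \;-\; \sum_{i=1}^{n}\hat{\Gamma}_i .
\end{aligned}
$$

Because $f(X_i)^2 = 1$, the first and last terms on the right of the first line do not depend on the policy, so minimizing the squared error over sign-valued rules selects the same policy as maximizing $\sum_i \hat{\Gamma}_i\,\pi(X_i)$. Relaxing $f$ to a real-valued function class is what turns the discrete problem into a regression problem.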
References
- "Welfare Analysis via Marginal Treatment Effects" (Sasaki et al., 2020)
- "Empirical Welfare Maximization with Constraints" (Sun, 2021)
- "Policy Learning under Endogeneity Using Instrumental Variables" (Liu, 2022)
- "Policy Learning with α-Expected Welfare" (Fan et al., 1 May 2025)
- "Policy Choice in Time Series by Empirical Welfare Maximization" (Kitagawa et al., 2022)
- "Regret Analysis in Threshold Policy Design" (Crippa, 2024)
- "Optimal Policy Learning: From Theory to Practice" (Cerulli, 2020)
- "Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning" (Kato, 30 Oct 2025)