Expected Counterfactual Utility

Updated 4 July 2026

Expected counterfactual utility is a framework that evaluates the expected utility of alternative actions using counterfactual distributions instead of solely relying on observed outcomes.
It employs off-policy correction methods like importance sampling and SNIPS, often with bounded importance ratios, to counteract bias in data collected under different policies.
This approach underpins robust decision-making in applications such as ad ranking, treatment assignment, and policy learning by directly comparing potential outcomes of unobserved scenarios.

Expected counterfactual utility is a family of quantities used to evaluate and optimize decisions by asking what utility a policy, treatment rule, ranking system, or strategic mechanism would achieve under outcomes that were not directly observed. Across recent work, the term refers not to a single canonical formula, but to a common structure: a utility functional is defined for a target policy or decision, and its expectation is taken with respect to a counterfactual distribution induced by that policy rather than by the logging, factual, or realized decision process. In some papers this expectation is computed from biased logs by off-policy correction; in others it is defined on the full vector of potential outcomes; and in still others it is evaluated on forward-looking counterfactual distributions or counterfactual predictive distribution sets. The unifying idea is that utility is assessed relative to alternative actions or policies rather than solely from realized outcomes (Yang et al., 21 Feb 2026, Koch et al., 13 May 2025, Koch et al., 6 May 2026).

1. Definition and conceptual scope

In contextual bandit and ranking settings, expected counterfactual utility is the expected utility under a target policy $\pi$ evaluated using data generated by a different policy $\pi_0$. In "CaliCausalRank" (Yang et al., 21 Feb 2026), this is the expected business utility of a new ad ranking policy, estimated off-policy from biased click logs using position-aware SNIPS. In "Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion" (Block et al., 2022), it is the expected user utility that a query-ranking policy would achieve if deployed, where utility is defined as the probability that the user will click some relevant document in the ranking induced by a suggested query.

In statistical decision-theoretic work, expected counterfactual utility is the expectation of a utility that depends on the entire vector of potential outcomes, not only the realized one. "Statistical Decision Theory with Counterfactual Loss" (Koch et al., 13 May 2025) defines counterfactual risk as

$R(D^\ast;\ell) := E\bigl[\ell(D^\ast;Y(0),\ldots,Y(K-1),X)\bigr],$

so expected counterfactual utility is its negative when utility is defined as $U=-\ell$. "An Axiomatic Foundation for Decisions with Counterfactual Utility" (Koch et al., 6 May 2026) places this directly in an extended von Neumann–Morgenstern framework on the potential outcome space $Z := D \times Y^{D} \times X$, with policy value

$V_P(\pi;\tilde u)=\mathbb{E}_{P^\pi}[\tilde u(D;Y(0),\ldots,Y(K-1),X)].$

A distinct but related usage appears in forward-looking causal decision problems. "Counterfactuals for the Future" (Bynum et al., 2022) distinguishes interventional utility $W(P_{Y_1}^{\mathfrak{C};\,\mathrm{do}(I)})$ from counterfactual utility $W(P_{Y_1}^{\mathfrak{C}\mid\mathcal{X};\,\mathrm{do}(I)})$, where the latter evaluates future outcomes for the same observed units by reusing unit-specific exogenous noise inferred from current data. This yields a sample-specific welfare notion rather than a population-level interventional one.
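
The difference between the two welfare notions can be illustrated with a minimal sketch. It assumes a hypothetical linear additive-noise model $Y = 2X + U$ with a known slope, uses the mean outcome as the welfare functional, and assumes the population noise has mean zero; none of these choices come from the paper.

```python
from statistics import fmean

SLOPE = 2.0  # hypothetical structural coefficient, assumed known


def abduct_noise(x, y, slope=SLOPE):
    """Abduction: recover a unit's exogenous noise from its observed (x, y)."""
    return y - slope * x


def predict(x_new, noise, slope=SLOPE):
    """Prediction: outcome under do(X = x_new) for a given noise value."""
    return slope * x_new + noise


# Hypothetical observed units as (x, y) pairs.
observed = [(1.0, 3.5), (2.0, 3.0)]
x_new = 3.0

# Counterfactual welfare: reuse each unit's own inferred noise.
noises = [abduct_noise(x, y) for x, y in observed]
counterfactual_outcomes = [predict(x_new, u) for u in noises]
counterfactual_welfare = fmean(counterfactual_outcomes)

# Interventional welfare: fresh noise, assumed to have population mean 0.
interventional_welfare = predict(x_new, 0.0)
```

In this toy sample the inferred noise values do not average to zero, so the two welfare values differ even though the intervention is the same.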

These formulations differ in domain, but all use counterfactual expectations to compare actions or policies that cannot be directly observed under the available data-generating process. A plausible implication is that expected counterfactual utility is best understood as a general decision-functional schema rather than a single estimand.

2. Off-policy evaluation and counterfactual estimation

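In logged-interaction systems, expected counterfactual utility is usually estimated by importance weighting or propensity correction. In "CaliCausalRank" (Yang et al., 21 Feb 2026), the logged data are generated by a production policy $\pi_0$, and click logs are distorted by position bias. The paper adopts the examination model

$P(\text{click} \mid x, pos) = P(\text{examine} \mid pos)\, P(\text{click} \mid \text{examine}, x),$

estimates the position propensities $P(\text{examine} \mid pos)$ from randomized traffic, and uses a self-normalized importance sampling (SNIPS) estimator, which in its generic form is

$\hat U_{\text{SNIPS}}(\pi) = \frac{\sum_{i=1}^{n} w_i\, u_i}{\sum_{i=1}^{n} w_i},$

where $u_i$ is the utility logged for interaction $i$ and $w_i$ is its importance weight, built in the position-aware case from the inverse examination propensity.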

Here expected counterfactual utility is the central quantity both for offline evaluation and for optimization under business constraints.
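
A minimal sketch of a self-normalized estimate with position-based weights follows. The examination probabilities, positions, and clicks are hypothetical, and using the plain inverse examination probability as the weight is a simplifying assumption rather than the paper's exact estimator.

```python
def position_weights(positions, examine_prob):
    """Inverse examination propensity for each logged position."""
    return [1.0 / examine_prob[p] for p in positions]


def snips_estimate(utilities, weights):
    """Self-normalized importance sampling: weighted mean of logged utilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    return sum(w * u for w, u in zip(weights, utilities)) / total


# Hypothetical examination model and click log.
examine_prob = {1: 1.0, 2: 0.5, 3: 0.25}
positions = [1, 2, 3, 1]
clicks = [1.0, 1.0, 0.0, 0.0]

weights = position_weights(positions, examine_prob)
estimate = snips_estimate(clicks, weights)
```

Dividing by the sum of the weights rather than by the sample size keeps the estimate inside the range of the logged utilities, which is the usual motivation for self-normalization.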

The same general logic appears in query autocompletion. In "Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion" (Block et al., 2022), the utility of a suggested query under a given context is the probability that the user clicks some relevant document in the ranking induced by that query. The paper constructs an unbiased IPS-like estimator of this utility using ratios of observation propensities. It then defines a counterfactual policy-level objective obtained by plugging these utility estimates into a pairwise ranking criterion, with Lemma 1 showing that this objective is unbiased for its true-utility counterpart. In that sense, the expected counterfactual utility equals the true expected utility-based objective under the model assumptions.

The broader off-policy template is stated explicitly in (Yang et al., 21 Feb 2026); in standard notation, for a utility $u(x,a)$ of action $a$ in context $x$, it reads $\mathbb{E}_{a\sim\pi}\bigl[u(x,a)\bigr] = \mathbb{E}_{a\sim\pi_0}\bigl[\tfrac{\pi(a\mid x)}{\pi_0(a\mid x)}\,u(x,a)\bigr]$. This is the classical counterfactual-risk-minimization relation specialized to utility. The ranking and ad-serving papers adapt it to position-bias settings by replacing unknown action-propensity ratios with inverse examination probabilities. This suggests that in online systems, expected counterfactual utility is often operationalized as an off-policy expected reward with application-specific propensities.
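
A small numeric check of the importance-weighting relation may help. The three actions, both policies, and the utilities below are hypothetical, and the context is held fixed so that the expectations reduce to finite sums.

```python
actions = ["a", "b", "c"]

# Hypothetical logging and target policies over the same actions.
logging_policy = {"a": 0.5, "b": 0.3, "c": 0.2}
target_policy = {"a": 0.1, "b": 0.2, "c": 0.7}
utility = {"a": 1.0, "b": 2.0, "c": 4.0}

# Expected utility if the target policy were deployed.
on_policy_value = sum(target_policy[a] * utility[a] for a in actions)

# Same quantity written as an expectation under the logging policy,
# with each term reweighted by the ratio target / logging.
off_policy_value = sum(
    logging_policy[a] * (target_policy[a] / logging_policy[a]) * utility[a]
    for a in actions
)
```

The two sums agree because the logging probability cancels term by term, which requires the logging policy to give positive probability to every action the target policy can choose.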

3. Utility on full potential-outcome spaces

A second major line of work defines expected counterfactual utility directly on the full vector of potential outcomes. In "Statistical Decision Theory with Counterfactual Loss" (Koch et al., 13 May 2025), the primitive object is a counterfactual loss

$\ell(D^\ast;Y(0),\ldots,Y(K-1),X),$

and the corresponding risk

$R(D^\ast;\ell) := E\bigl[\ell(D^\ast;Y(0),\ldots,Y(K-1),X)\bigr].$

This framework is motivated by decision criteria that compare the chosen action with feasible alternatives, such as overtreatment penalties or asymmetries between harming and failing to help. The paper proves that under strong ignorability, counterfactual risk differences are identifiable if and only if the counterfactual loss is additive in the potential outcomes. If the additive loss also has no interaction term, the counterfactual risk level itself is exactly identifiable.
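
The role of additivity can be seen in a toy example with two binary potential outcomes. The marginal distribution of each potential outcome can be learned under ignorability, but their joint distribution cannot, so the sketch compares two hypothetical joints that share the same marginals. Both loss functions are illustrative assumptions, not taken from the paper.

```python
def risk(joint, loss, d):
    """Expected loss of always choosing action d under a joint law of (Y(0), Y(1))."""
    return sum(p * loss(d, y0, y1) for (y0, y1), p in joint.items())


# Two joints with identical marginals P(Y(0)=1) = P(Y(1)=1) = 0.5.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
comonotone = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}


def additive_loss(d, y0, y1):
    """Sum of terms that each involve a single potential outcome."""
    if d == 1:
        return (1 - y1) + 0.5 * y0
    return (1 - y0) + 0.5 * y1


def harm_loss(d, y0, y1):
    """Non-additive: penalizes treating a unit that treatment harms."""
    return 1.0 if d == 1 and y0 == 1 and y1 == 0 else 0.0


additive_risks = (risk(independent, additive_loss, 1), risk(comonotone, additive_loss, 1))
harm_risks = (risk(independent, harm_loss, 1), risk(comonotone, harm_loss, 1))
```

The additive risk is the same under both joints, while the harm-based risk changes with the dependence between the potential outcomes, which the data cannot reveal.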

"Policy Learning with Asymmetric Counterfactual Utilities" (Ben-Michael et al., 2022) develops the same theme for binary treatment, with a utility defined on the principal strata, that is, on the joint values of the two potential outcomes $(Y(0), Y(1))$. The value of a policy is the expectation of this utility under the policy's treatment assignments.

With the asymmetric parameterization in the paper, the expected utility loss between policies depends on the harmful-stratum principal score, which is not identified under ignorability. The paper therefore partially identifies expected counterfactual utility and optimizes a minimax worst-case utility loss over Fréchet bounds.
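
Fréchet bounds of this kind can be sketched for a single covariate value. The sketch assumes a binary outcome where 1 is the desirable result, so the harmed stratum is $Y(0)=1, Y(1)=0$; the function name and the input values are hypothetical.

```python
def harm_bounds(p0, p1):
    """Frechet bounds on P(Y(0)=1, Y(1)=0) given the two marginals.

    p0 = P(Y(0)=1) and p1 = P(Y(1)=1) at a fixed covariate value.
    """
    lower = max(0.0, p0 - p1)
    upper = min(p0, 1.0 - p1)
    return lower, upper


# Treatment helps on average, yet some harm cannot be ruled out.
helpful_lower, helpful_upper = harm_bounds(0.6, 0.7)

# Treatment hurts on average, so some harm is guaranteed.
harmful_lower, harmful_upper = harm_bounds(0.8, 0.5)
```

A minimax rule evaluates each candidate policy at the least favorable value inside such an interval instead of at a single point estimate.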

The axiomatic paper (Koch et al., 6 May 2026) generalizes this idea. It defines the extended outcome space

$Z := D \times Y^{D} \times X,$

and proves that preferences over lotteries on $Z$ satisfy the vNM axioms if and only if they admit an expected counterfactual utility representation. It further shows that adding Irrelevance of Counterfactual Outcomes reduces the representation to standard expected utility on realized outcomes, while Irrelevance of Counterfactual Correlation characterizes counterfactual utilities that are additive in the potential outcomes.

These papers establish that expected counterfactual utility is not merely an estimation trick. It can be the primitive normative object of a decision problem, with realized-outcome utility recovered only under additional axioms or structural restrictions.

4. Robustness, uncertainty, and identification

A recurring issue is that counterfactual utility is often only partially identified or statistically unstable. The sources of difficulty vary by domain.

In logged bandit and ranking systems, the main problem is variance. "CaliCausalRank" (Yang et al., 21 Feb 2026) explicitly defines a robust counterfactual utility loss that augments the SNIPS utility estimate with a variance term. The variance term regularizes against extreme inverse-propensity weights, and the paper treats robustness as stability and reliability sufficient for training and constraint enforcement. In the query-autocompletion setting, (Block et al., 2022) shows that IPS utility estimates can have arbitrarily large variance and therefore imposes a bounded importance-ratio assumption. Under that assumption, the estimator remains unbiased and enjoys variance control.
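
The effect of a bound on the importance ratios can be illustrated with a generic clipped estimator. This is a sketch with hypothetical ratios and a hypothetical clip level, not the estimator of either paper; clipping in general trades some bias for lower variance.

```python
from statistics import fmean, pvariance


def clipped_ips(utilities, ratios, clip):
    """Mean and variance of importance-weighted utilities with ratios capped at clip."""
    terms = [min(r, clip) * u for r, u in zip(ratios, utilities)]
    return fmean(terms), pvariance(terms)


# Hypothetical log in which one interaction has a very large importance ratio.
utilities = [1.0, 1.0, 1.0, 1.0]
ratios = [0.5, 1.0, 2.0, 40.0]

raw_mean, raw_var = clipped_ips(utilities, ratios, clip=float("inf"))
clipped_mean, clipped_var = clipped_ips(utilities, ratios, clip=5.0)
```

A single extreme ratio dominates the unclipped estimate, which is the instability that a bounded-ratio assumption or a variance penalty is meant to control.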

In decision-theoretic formulations, the main issue is identification. In (Koch et al., 13 May 2025), exact identification of expected counterfactual utility requires additive loss with no interaction term. In (Ben-Michael et al., 2022), asymmetric counterfactual utilities generally depend on unidentified principal scores and therefore admit only partial identification under strong ignorability. The proposed solution is to optimize worst-case expected utility loss relative to a comparator policy.

In causal forecasting and treatment choice, the issue is the correct target distribution. "Counterfactuals for the Future" (Bynum et al., 2022) shows that forward-looking counterfactual utility can diverge from interventional utility depending on assumptions about stability and structure of exogenous noise. The paper summarizes two settings where forward-looking counterfactuals are appropriate: where exogenous factors are sufficiently stable over time, and where units’ exogenous factors are sufficiently dissimilar.

In empirical games, uncertainty appears both from partial identification of model parameters and from equilibrium multiplicity. "Counterfactual Analysis in Empirical Games" (Kline et al., 2024) handles this with counterfactual predictive distribution sets. For each parameter value in the identified set, the framework computes lower and upper expected welfare over the corresponding counterfactual predictive distribution set, and then ranges over the identified set. Thus expected counterfactual utility becomes set-valued even before adding posterior uncertainty.
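
The two-level structure of these bounds can be sketched in a few lines. The parameter grid, the equilibrium correspondence, and the welfare function below are toy assumptions chosen only to make the example runnable; they do not come from the paper.

```python
def welfare_bounds(identified_set, equilibria, welfare):
    """Range of expected welfare over equilibria, then over the identified set."""
    per_parameter = []
    for theta in identified_set:
        values = [welfare(theta, eq) for eq in equilibria(theta)]
        per_parameter.append((min(values), max(values)))
    lower = min(lo for lo, _ in per_parameter)
    upper = max(hi for _, hi in per_parameter)
    return lower, upper


def toy_equilibria(theta):
    """Hypothetical correspondence: a second equilibrium appears once theta >= 1."""
    return [0, 1] if theta >= 1.0 else [0]


def toy_welfare(theta, equilibrium):
    """Hypothetical expected welfare at a parameter value and equilibrium."""
    return theta * (1 + equilibrium)


lower, upper = welfare_bounds([0.5, 1.0], toy_equilibria, toy_welfare)
```

The result is an interval rather than a number, with width coming from both the multiplicity of equilibria and the size of the identified set.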

The main robustness strategies across these literatures are summarized below.

Setting	Source of ambiguity	Response
Off-policy ranking (Yang et al., 21 Feb 2026)	Position bias and high-variance weights	SNIPS and variance regularization
QAC ranking-of-rankings (Block et al., 2022)	Large propensity ratios	Bounded importance-ratio assumption and clipping
Counterfactual loss (Koch et al., 13 May 2025)	Non-identifiable joint potential outcomes	Additivity conditions for identifiable risk
Asymmetric treatment utility (Ben-Michael et al., 2022)	Unidentified principal scores	Partial identification and minimax regret
Forward-looking SCMs (Bynum et al., 2022)	Ambiguous exogenous-noise assumptions	Compare interventional and forward-looking welfare
Empirical games (Kline et al., 2024)	Partial identification and multiple equilibria	CPDS and posterior set consistency

A plausible implication is that “robust expected counterfactual utility” has no uniform formal meaning across fields; it refers either to low-variance estimation, sharp partial identification, or stability under structural ambiguity, depending on the modeling tradition.

5. Optimization and policy learning

Expected counterfactual utility is not only evaluated; it is optimized.

In ad ranking, (Yang et al., 21 Feb 2026) places counterfactual utility directly inside the training loop. The total objective combines three terms: one maximizes robust SNIPS-estimated utility, one enforces calibration across segments, and one is a Lagrangian penalty for CPC and risk constraints. The corresponding constrained optimization problem maximizes this utility subject to those constraints, with fairness constraints on exposure. Expected counterfactual utility is therefore the objective maximized subject to operational constraints.

In query autocompletion, (Block et al., 2022) defines the empirical counterfactual objective and chooses the ranking policy that optimizes it. This is counterfactual risk minimization in a ranking-of-rankings setting, where utility is based on downstream retrieval success rather than imitation of user query choice.

In statistical decision theory, (Koch et al., 13 May 2025) treats expected counterfactual utility as the normative criterion for optimal treatment choice. An optimal policy minimizes the counterfactual risk $R(D^\ast;\ell)$, which is equivalent to maximizing expected counterfactual utility when $U=-\ell$.

The paper shows that for binary decisions, additive counterfactual loss induces the same policy ordering as some standard realized-outcome loss, but for $K > 2$ actions this equivalence breaks down. Expected counterfactual utility can therefore induce genuinely different optimal policies in multi-arm settings.

In the asymmetric-utility policy-learning framework (Ben-Michael et al., 2022), the oracle policy depends on both the CATE and the harmful principal stratum. Because the probability of the harmful stratum is not identified, the paper instead learns minimax policies against worst-case expected utility loss. This is a direct instance where optimizing expected counterfactual utility requires robustification against identification ambiguity.

In strategic decision systems with recourse-style explanations, (Tsirtsis et al., 2020) defines the decision maker’s expected utility after agents strategically adapt to the explanations they receive. For a fixed policy, this utility, viewed as a set function of the chosen explanations, is non-negative, monotone, and submodular; for joint optimization with the policy, the corresponding function is non-negative, submodular, and non-monotone. Here expected counterfactual utility is the expected utility after agents strategically move in response to explanations.
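
Monotone submodular set functions are commonly maximized with a greedy rule, which for a cardinality budget carries the classical $1 - 1/e$ approximation guarantee. The sketch below shows that generic rule on a toy coverage function; the budget $k$, the item names, and the coverage sets are hypothetical, and whether the paper uses this exact procedure is not stated in the text above.

```python
def greedy_max(ground_set, value, k):
    """Greedily add the item with the largest marginal gain, up to k items."""
    chosen = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for item in sorted(ground_set - chosen):
            gain = value(chosen | {item}) - value(chosen)
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # no remaining item improves the value
            break
        chosen.add(best)
    return chosen


# Toy monotone submodular function: number of agents covered by a set of explanations.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def covered(items):
    """Size of the union of the agents reached by the chosen items."""
    return len(set().union(*(coverage[i] for i in items)))


selected = greedy_max(set(coverage), covered, k=2)
```

The greedy guarantee relies on monotonicity, so the non-monotone joint problem calls for a different class of algorithms.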

6. Broader extensions and interpretations

The concept extends beyond ranking and treatment assignment.

In fairness, "Fairness Through Counterfactual Utilities" (Blandin et al., 2021) defines a welfare and a cost for each algorithm, then defines qualification by the existence of some counterfactual algorithm that satisfies specified welfare and cost conditions. The resulting group-level quantities are therefore expectations over counterfactually qualified subsets of individuals. This generalizes group fairness notions beyond prediction labels and explicitly uses utilities under alternative algorithms.

In AI deployment, "When is using AI the rational choice? The importance of counterfactuals in AI deployment decisions" (Lehner et al., 4 Apr 2025) decomposes expected utility into an outcome term and a counterfactual term. The counterfactual term is indexed by AI-vs-no-AI counterfactual cells such as counterfactual true positives and counterfactual false negatives. This is an expected counterfactual utility in a deployment sense: it measures extra utility or disutility from the fact that an outcome is attributable to AI rather than to the unaided decision maker.

In continuous structural-causal settings, (Weilbach et al., 2023) does not name expected counterfactual utility as a standalone object, but provides the ingredients to compute the expectation of a utility over a counterfactual predictive distribution under a Bayesian Warped Gaussian Process SCM that averages over uncertainty in both latent functions and SCM parametrization. This turns expected counterfactual utility into an uncertainty-aware Bayesian integral.

In before-and-after randomized trials, (Wang et al., 2024) studies a within-patient counterfactual efficacy estimand and interprets it as an expected counterfactual utility difference when the outcome is itself a utility or monotone in utility. The paper’s main contribution is uncertainty quantification: it shows that the variance of the ideal within-patient counterfactual efficacy can be smaller than factual between-arm variance under the ETZ decomposition.

Finally, in nonparametric random utility demand, (Kitamura et al., 2019) and in approximate quasilinear demand, (Allen et al., 2020), expected counterfactual utility appears as an expectation of a welfare functional of counterfactual demand. In (Kitamura et al., 2019), one chooses such a welfare functional and obtains sharp bounds on its expectation.

In (Allen et al., 2020), quasilinear indirect-utility differences provide robust bounds on welfare changes under approximate model misspecification. These are not off-policy expectations in the machine-learning sense, but they are still counterfactual utility expectations under alternative prices or bundles.

7. Misconceptions and points of controversy

A common misconception is that expected counterfactual utility is always a single identifiable scalar. The literature does not support that view. In (Koch et al., 13 May 2025), exact identification requires additive structure; in (Ben-Michael et al., 2022), asymmetric utilities are generally only partially identified; and in (Kline et al., 2024), expected counterfactual utility is set-valued because of equilibrium multiplicity and partial identification. Even in logged ranking systems, (Yang et al., 21 Feb 2026) and (Block et al., 2022) emphasize that off-policy estimates depend on propensity assumptions and variance control.

A second misconception is that counterfactual utility necessarily violates coherence or transitivity. The axiomatic analysis in (Koch et al., 6 May 2026) argues the opposite on the correct domain: expected counterfactual utility is consistent with the vNM axioms when preferences are defined over lotteries on the extended outcome space $Z := D \times Y^{D} \times X$. Apparent paradoxes arise when preferences are projected onto realized-outcome spaces through menu-dependent or context-dependent constructions.

A third misconception is that counterfactual utility always improves decision quality merely by incorporating more information. Several papers caution otherwise. (Bynum et al., 2022) shows that forward-looking counterfactual welfare can be appropriate or misleading depending on assumptions about exogenous-noise stability. (Lehner et al., 4 Apr 2025) shows that positive outcome utility from AI deployment can coexist with negative stakeholder counterfactual utility. (Wang et al., 2024) warns that subgroup estimation with noisy predictors may suffer attenuation bias even when the average effect remains unbiased.

The main controversy, therefore, is not whether counterfactual utility exists as a formal object, but which version is normatively appropriate and statistically defensible in a given application.

8. Synthesis

Expected counterfactual utility denotes the expected value of a utility functional under a hypothetical policy, intervention, or decision regime, where the relevant distribution is counterfactual rather than factual. In ranking and bandit systems, it is an off-policy expected reward estimated by IPS or SNIPS and often optimized directly during training (Yang et al., 21 Feb 2026, Block et al., 2022). In statistical decision theory and causal inference, it is the expectation of utility on the full potential-outcome vector, with additivity determining whether the quantity is point-identified (Koch et al., 13 May 2025, Koch et al., 6 May 2026). In forward-looking SCMs, it is welfare computed on counterfactual future distributions for the same units (Bynum et al., 2022). In empirical games, it becomes a set of possible expectations indexed by equilibria and partially identified parameters (Kline et al., 2024).

What unifies these uses is the insistence that decision quality cannot always be judged from realized outcomes alone. Sometimes one must compare the chosen action with alternatives that were feasible but unobserved, and expected counterfactual utility is the mathematical device that carries out that comparison. The precise form of the expectation, the utility, and the source of uncertainty vary widely across fields, but the central question is stable: what utility would a policy or decision have produced if the world had evolved under that alternative rather than under what was actually observed?