Empirical Decision Problems: An Observables Framework

Updated 4 July 2026

Empirical Decision Problems are decision models that replace explicit states with observed act–consequence protocols to infer choice functions directly from behavioral data.
They establish consistency and convergence through empirical analogs of classical optimality, using methods like VC bounds and permutation tests to ensure robust statistical inference.
The framework extends classical decision theory by linking normative criteria with observable outcomes, enabling applications in dynamic programming, empirical Bayes, and generative AI prompt analysis.

Empirical decision problems are decision models in which the primitive data are not states of the world, lotteries over consequences, or a fully specified consequence function, but a protocol of observed act–consequence pairs. In the formulation of Jansen et al., an empirical decision problem (EDP) is the pair $(\mathcal A,\pi)$, where $\mathcal A$ is a finite set of action descriptions and $\pi$ is a finite protocol recording observed pairs of actions and consequences; the framework “completely overcomes the need to specify the states of the world explicitly” and studies optimality through empirical choice functions defined directly on subprotocols (Jansen et al., 5 Dec 2025). Within this line of work, the central technical questions are how to define empirical analogs of optimality, when empirical choice functions consistently recover population choice sets, how such choice rules can be statistically tested, and how robustness can be preserved when i.i.d. assumptions are relaxed.

1. Formal definition and primitives

The basic ingredients of an EDP are deliberately sparse. One fixes a finite set $\mathcal A=\{a_1,\dots,a_n\}$ of action descriptions and a possibly infinite set $\mathcal C$ of observable consequences. For a vector $\mathbf z=(z_1,\dots,z_n)\in\mathbb N^n$, a $\mathbf z$-protocol is a finite sequence

$\pi=\bigl((a_{i_1},c_{i_1}),\dots,(a_{i_N},c_{i_N})\bigr), \qquad N=\sum_{k=1}^n z_k,$

in which exactly $z_k$ of the pairs have the act $a_k\in\mathcal A$. For any subset of $\mathcal A$, one also considers the set of subprotocols of $\pi$ that use only acts from that subset. An empirical decision problem is then simply the pair $(\mathcal A,\pi)$ (Jansen et al., 5 Dec 2025).
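As a concrete reading of the definition above, a protocol can be held as a plain list of (act, consequence) pairs. The sketch below is only an illustration: the names `act_counts` and `restrict` are hypothetical, consequences are assumed to be numbers, and `restrict` returns just the largest subprotocol that uses a given subset of acts.

```python
from collections import Counter

def act_counts(protocol):
    """Count how often each act occurs; this recovers the vector z."""
    return Counter(act for act, _ in protocol)

def restrict(protocol, acts):
    """Largest subprotocol using only `acts`, keeping the original order."""
    return [(a, c) for a, c in protocol if a in acts]

# A toy protocol with three acts and numeric consequences (made-up values).
protocol = [("a1", 0.2), ("a2", 0.9), ("a1", 0.5), ("a3", 0.1), ("a2", 0.4)]

counts = act_counts(protocol)           # z = (2, 2, 1) for (a1, a2, a3)
sub = restrict(protocol, {"a1", "a3"})  # drops every pair whose act is a2
```

The protocol length equals the sum of the counts, which mirrors the identity $N=\sum_k z_k$ in the definition.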

The corresponding decision object is an empirical choice function (ECF). An ECF is any map that assigns a set of actions to each subprotocol, subject to one constraint: for every subprotocol, the chosen set contains only actions that actually occur in that subprotocol (Jansen et al., 5 Dec 2025).

This formalization is intentionally different from classical setups in which states, consequences, and act mappings are specified ex ante. In an EDP, the observable protocol itself is primitive. A plausible implication is that the framework is designed for settings in which the modeler is unwilling or unable to commit to a “closed world” description before observing behavior or outcomes.

2. Empirical optimality without explicit states

The EDP literature introduces empirical analogs of classical optimality by identifying each action description $a\in\mathcal A$ with a latent random variable taking values in $\mathcal C$. A population choice function is fixed by a binary criterion function, and the exclusion of an action from a choice set is expressed pairwise, through comparison with another available action. The empirical analogue replaces the latent distributions by empirical distributions computed from the protocol (Jansen et al., 5 Dec 2025).

Under the stated regularity conditions, the empirical choice function is a consistent approximator of the population choice set. The required assumptions are that the criterion function depends on distributions only and is uniformly continuous in a pseudometric on distributions, that the associated class of sets has finite VC-dimension, and that one adds a regularization term of suitable order. The resulting rule excludes an action whenever there exists another action for which the regularized empirical criterion is met. As the protocol grows, the empirical choice set then converges almost surely to the population choice set. This is the core consistency theorem of the framework (Jansen et al., 5 Dec 2025).

The same work also establishes basic existence results. Under very mild conditions, including transitivity and antisymmetry of the underlying pairwise relation, ECFs always exist; examples include selecting the protocol’s empirical expected-utility maximizers or its empirical first-order-stochastic-dominance-undominated options (Jansen et al., 5 Dec 2025). This matters because the theory is not restricted to one specific optimality criterion. Instead, it defines a statistical interface through which different normative criteria can be estimated from observed act–consequence data.
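The two example rules named above can be sketched for real-valued consequences where higher is better. This is a minimal illustration rather than the construction in Jansen et al.: the helper names are hypothetical, the utility defaults to the identity, and first-order stochastic dominance is checked on the univariate empirical distribution functions only.

```python
from collections import defaultdict

def group_by_act(protocol):
    """Collect the observed consequences of each act."""
    groups = defaultdict(list)
    for act, consequence in protocol:
        groups[act].append(consequence)
    return dict(groups)

def eu_maximizers(protocol, utility=lambda c: c):
    """Acts whose empirical mean utility is maximal in the protocol."""
    groups = group_by_act(protocol)
    means = {a: sum(utility(c) for c in cs) / len(cs) for a, cs in groups.items()}
    best = max(means.values())
    return {a for a, m in means.items() if m == best}

def ecdf(sample, t):
    """Empirical distribution function of `sample` evaluated at t."""
    return sum(c <= t for c in sample) / len(sample)

def strictly_fsd_dominates(xs, ys):
    """True if xs dominates ys: F_x <= F_y everywhere, strictly somewhere."""
    grid = sorted(set(xs) | set(ys))
    diffs = [ecdf(xs, t) - ecdf(ys, t) for t in grid]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)

def fsd_undominated(protocol):
    """Acts that no other act in the protocol strictly FSD-dominates."""
    groups = group_by_act(protocol)
    return {a for a in groups
            if not any(strictly_fsd_dominates(groups[b], groups[a])
                       for b in groups if b != a)}

# Toy protocol (made-up values): a2 dominates a1, a3 is incomparable to both.
toy = [("a1", 1), ("a1", 2), ("a1", 3),
       ("a2", 2), ("a2", 3), ("a2", 4),
       ("a3", 0), ("a3", 5)]
```

Both functions return only acts that occur in the protocol they are given, so each satisfies the defining constraint of an ECF. The dominance rule is the more cautious of the two: it keeps `a3` because neither distribution function lies entirely below the other.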

3. Statistical testing, convergence rates, and robustness

The inferential program around EDPs has three components: consistent statistical estimation of choice sets, consistent statistical testing of choice functions with robustness guarantees, and direct inference for empirical choice functions using credal sets (Jansen et al., 5 Dec 2025).

For estimation, convergence rates are driven by VC bounds: for each pair of actions and each tolerance level, Jansen et al. state an explicit probability bound in terms of the VC-dimension of the set class. The rate statement is important because it links empirical decision problems to standard empirical-process machinery rather than treating protocol-based choice as purely descriptive (Jansen et al., 5 Dec 2025).

For testing, the null and alternative are formulated directly in terms of membership in the population choice set. Under pairwise ordering, the problem decomposes into two-sample tests, and a permutation or bootstrap test is built on a two-sample statistic computed from the empirical distributions. Under exchangeability it has exact level $\alpha$, and under the paper’s Assumptions 1–4 (which include i.i.d. sampling, pairwise ordering, inclusion of the null in distributional equivalence, and a least-favorable-distribution structure) it remains asymptotically valid for the one-sided null and is consistent against the directed alternative (Jansen et al., 5 Dec 2025).
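The mechanics of a one-sided two-sample permutation test can be sketched as follows. The test statistic used in the paper is not reproduced here; the difference of sample means is an assumed stand-in, and `permutation_pvalue` is a hypothetical name. The "+1" in numerator and denominator is the usual correction that keeps a Monte Carlo permutation test valid under exchangeability.

```python
import random

def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation p-value for 'xs tends to be larger'.

    Statistic (an assumption for illustration): mean(xs) - mean(ys).
    """
    rng = random.Random(seed)

    def stat(a, b):
        return sum(a) / len(a) - sum(b) / len(b)

    observed = stat(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the pooled observations at random
        if stat(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up samples: the first group is clearly shifted upward.
high = [5.1, 5.3, 4.9, 5.6, 5.2, 5.4, 5.0, 5.5]
low = [4.0, 4.2, 3.9, 4.1, 4.3, 3.8, 4.4, 4.05]
p_shifted = permutation_pvalue(high, low)
```

With three or more actions, one such test would be run for each relevant pair, which matches the pairwise decomposition described above.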

Robustness is handled through contamination models. For each action block, one forms a credal set of distributions around the empirical distribution of that block. Replacing empirical distributions in the test statistic by worst-case elements of these sets yields a robust permutation test that remains asymptotically level $\alpha$ when each protocol block lies in a contamination model of the corresponding size around an i.i.d. sample (Jansen et al., 5 Dec 2025).
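To make the worst-case idea tangible, the sketch below uses the textbook $\varepsilon$-contamination neighbourhood, in which a fraction `eps` of the mass may be moved to any distribution on a known range `[lo, hi]`. That form, the shared `eps` for both blocks, the mean-difference statistic, and the function names are all assumptions made for illustration; the paper's own credal set and statistic may differ.

```python
def mean(xs):
    return sum(xs) / len(xs)

def worst_case_gap(xs, ys, eps, lo=0.0, hi=1.0):
    """Smallest mean(x) - mean(y) when both blocks are eps-contaminated.

    The adversary pushes the contaminating mass of x to `lo` and that of
    y to `hi`, which costs eps * (hi - lo) on top of the (1 - eps) shrinkage.
    """
    return (1.0 - eps) * (mean(xs) - mean(ys)) - eps * (hi - lo)

def breakdown_level(xs, ys, lo=0.0, hi=1.0):
    """Largest eps for which the worst-case gap is still non-negative."""
    gap = mean(xs) - mean(ys)
    if gap <= 0:
        return 0.0
    return gap / (gap + (hi - lo))

# Made-up scores in [0, 1].
xs = [0.7, 0.8, 0.9]
ys = [0.4, 0.5, 0.6]
eps_star = breakdown_level(xs, ys)
```

The breakdown level plays a role similar to the statement that a decision "remains significant up to" some contamination: beyond it, the observed ordering can be explained by contamination alone.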

These three layers give EDPs an unusual profile: the framework is state-free at the level of primitives, but statistically disciplined at the level of inference.

4. Relation to classical decision theory and identification

Empirical decision problems are best understood against adjacent attempts to relax the classical requirement that a decision problem be specified through an explicit state space. In constructive decision theory, the primitive objects of choice are syntactic programs formed from tests and actions, such as conditional expressions of the form “if $t$ then $a$ else $b$.” Blume, Easley, and Halpern show that if preferences over such programs satisfy axioms including statewise cancellation and graph-closedness, then there exist a state space, an outcome space, an interpretation of tests and choices, a probability on the states, and a utility function on the outcomes yielding an SEU representation; crucially, states and outcomes are not part of the initial description of the problem (0906.4316). EDPs share this refusal to take states as primitive, but they replace preference axioms on syntactic programs with inference from observed act–consequence protocols.

A second neighboring line studies what can be recovered from rankings of information structures. Strzalecki and Stewart define a static decision problem as a triple $(\Theta,A,u)$ with finite state space $\Theta$, action set $A$, bounded von Neumann–Morgenstern utility $u$, and value-of-belief function

$V(\mu)=\max_{a\in A}\ \mathbb E_{\theta\sim\mu}\bigl[u(a,\theta)\bigr].$

They show that finitely many ordinal comparisons of experiment values identify the finite set of undominated actions, up to relabeling and duplication, together with the partition of the belief simplex on which each action is optimal; finitely many additional cardinal comparisons identify the piecewise-affine value function $V$ up to an action-independent payoff (Whitmeyer, 2024). This suggests a complementary perspective: one can sometimes reconstruct the “shape” of a hidden decision problem from finite empirical comparisons even when the underlying utility description is not directly observed.

A third adjacent literature concerns identification problems under missing data. Dominitz and Manski formulate empirical decision making through a choice set, a set of states of nature, a welfare function defined on choices and states, and the identified set of states that are observationally equivalent given the observed data. They emphasize maximin and minimax-regret criteria when the identified set remains large, and show that for outcomes bounded in the unit interval the agnostic identification region for the mean outcome has width equal to the non-response fraction unless additional assumptions such as MAR are imposed (Dominitz et al., 15 Sep 2025). Relative to that framework, EDPs move even further toward observables-only primitives, but the shared concern with irreducible ambiguity and robust inference is evident.
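The width statement can be checked with a few lines of arithmetic. The sketch below computes worst-case bounds on a mean when some outcomes are missing and nothing is assumed about them except a known range. The function name and the data are hypothetical, and the bounds are computed from sample frequencies rather than population quantities.

```python
def mean_bounds_with_missing(observed, n_missing, lo=0.0, hi=1.0):
    """Worst-case bounds on the mean when `n_missing` outcomes are unobserved.

    Missing outcomes are allowed to take any value in [lo, hi]; no MAR or
    MCAR assumption is imposed.
    """
    n_obs = len(observed)
    p_miss = n_missing / (n_obs + n_missing)
    obs_mean = sum(observed) / n_obs
    lower = (1 - p_miss) * obs_mean + p_miss * lo
    upper = (1 - p_miss) * obs_mean + p_miss * hi
    return lower, upper

# Four observed outcomes in [0, 1] and one non-respondent (made-up data).
lower, upper = mean_bounds_with_missing([0.2, 0.8, 0.5, 0.5], n_missing=1)
width = upper - lower  # equals the non-response fraction, here 1/5
```

Adding more respondents with the same non-response rate leaves the width unchanged, which is the sense in which the ambiguity does not shrink with sample size.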

5. Applications and computational realizations

The proof-of-concept application in the EDP paper compares prompting strategies in a generative-AI setting. The actions are three prompts: $a_1$ “Explain GO in exactly 20 words.”, $a_2$ “Explain GO in exactly 20 words, please.”, and $a_3$ “Tell me what GO is in exactly 20 words. No more, no less. Just do it.” The protocol records a set number of repetitions of each prompt. Each response is evaluated by Perplexity (lower is better) and Coherence (higher is better), and the function class underlying the dominance criterion is taken as the class of all pairwise-monotone functions on the space of score pairs. Under empirical FSD, all three prompts are pairwise incomparable, so the empirical choice set is the full set

$\{a_1,a_2,a_3\}.$

Permutation tests of one prompt against each of the other two are then carried out at a fixed significance level, and the accompanying robustness analysis indicates the contamination level up to which the decision to retain the tested prompt remains significant (Jansen et al., 5 Dec 2025).

Beyond protocol-based EDPs, several neighboring literatures instantiate empirical decision making in more structured domains. In finite statistical decision problems with a finite family $\{\delta_1,\dots,\delta_m\}$ of decision rules and risk $R(\theta,\delta)$, the minimax problem

$\min_{p\in\Delta_m}\ \sup_{\theta\in\Theta}\ \sum_{i=1}^m p_i\,R(\theta,\delta_i)$

is a convex program over the simplex $\Delta_m$, and mirror subgradient descent with negative-entropy mirror map (equivalently, the Hedge algorithm) returns an $\varepsilon$-minimax mixed rule within a number of oracle calls that the source bounds explicitly (Fernández et al., 9 Sep 2025).
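A minimal sketch of the Hedge iteration for the minimax problem above, under simplifying assumptions that are not taken from the source: the parameter set is finite, risks are stored in a matrix with entries in [0, 1], and the worst-case "oracle" is a plain scan over columns. The step size is a standard textbook choice and the function names are hypothetical.

```python
import math

def worst_case_risk(p, risk):
    """Largest mixed risk over the (finite) parameter set."""
    m, k = len(risk), len(risk[0])
    return max(sum(p[i] * risk[i][j] for i in range(m)) for j in range(k))

def hedge_minimax(risk, n_iter=2000):
    """Approximate minimax mixture over rules; risk[i][j] lies in [0, 1]."""
    m, k = len(risk), len(risk[0])
    eta = math.sqrt(2.0 * math.log(m) / n_iter)  # assumed step-size schedule
    log_w = [0.0] * m   # log-weights, kept in log space for stability
    avg = [0.0] * m     # running average of the iterates
    for _ in range(n_iter):
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        p = [x / total for x in w]
        # Oracle call: parameter with the highest risk under the mixture p.
        j_star = max(range(k),
                     key=lambda j: sum(p[i] * risk[i][j] for i in range(m)))
        for i in range(m):
            avg[i] += p[i] / n_iter
            # Multiplicative-weights update = entropy mirror descent step.
            log_w[i] -= eta * risk[i][j_star]
    return avg

# Two rules, two parameters: each rule is perfect at one parameter only.
risk = [[0.0, 1.0],
        [1.0, 0.0]]
p_hat = hedge_minimax(risk)
```

In this toy problem the minimax mixture puts weight one half on each rule and has worst-case risk 0.5; the averaged iterate approaches that value as `n_iter` grows.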

In compound decision problems and empirical Bayes, one observes $Y_i\sim f(\cdot\mid\theta_i)$ for $i=1,\dots,n$ and evaluates separable rules $\delta$ through the compound risk

$R_n(\delta,\boldsymbol\theta)=\frac{1}{n}\sum_{i=1}^n \mathbb E\bigl[L(\theta_i,\delta(Y_i))\bigr].$

The empirical-Bayes program estimates the unknown mixing distribution $G$ by NPMLE and plugs the estimate into the Bayes rule; in the Kiefer–Wolfowitz formulation, the NPMLE can be taken to be discrete with at most $n$ atoms (Koenker et al., 2024). Related regret analyses show sharp lower bounds for normal and Poisson models and establish, for example, that in the Poisson case the optimal regret scales as $\Theta\bigl((\log n/\log\log n)^2\bigr)$ for compact support and $\Theta(\log^3 n)$ for subexponential priors, both attained by Robbins’s estimator (Polyanskiy et al., 2021).
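Robbins's estimator for the Poisson model is short enough to state in full. It replaces the unknown marginal probabilities in the Bayes rule $\mathbb E[\theta\mid Y=y]=(y+1)\,f(y+1)/f(y)$ by observed frequencies. The sketch below uses made-up counts, and the function name is hypothetical.

```python
from collections import Counter

def robbins(ys):
    """Robbins's empirical Bayes estimates for Poisson counts.

    For an observation y, the estimate is (y + 1) * N(y + 1) / N(y),
    where N(k) is the number of observations equal to k.
    """
    counts = Counter(ys)
    return [(y + 1) * counts[y + 1] / counts[y] for y in ys]

# Made-up counts: two zeros, three ones, one two.
ys = [0, 0, 1, 1, 1, 2]
estimates = robbins(ys)
```

Note the characteristic behavior at the largest observed count: no observation equals 3, so the estimate for `y = 2` is zero. Smoothed or NPMLE-based rules are typically used when this instability matters.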

In dynamic settings, empirical dynamic programming replaces exact expectations in the Bellman operator by sample averages. For a discounted MDP with per-stage cost $c$, discount factor $\gamma$, and $n$ simulated next states $s'_1,\dots,s'_n$ drawn for each state–action pair $(s,a)$, the empirical Bellman operator

$[\hat T_n v](s)=\min_{a}\Bigl\{c(s,a)+\frac{\gamma}{n}\sum_{i=1}^n v(s'_i)\Bigr\}$

defines empirical value iteration and empirical policy iteration, and convergence is analyzed through probabilistic fixed points and a stochastic-dominance argument on discretized error bins (Haskell et al., 2013). In logic-based settings, probabilistic answer set programming under credal semantics encodes decision atoms, probabilistic facts, and utility attributes, and Azzolini et al. solve for maximin and maximax strategies via a three-layer Algebraic Model Counting construction compiled into an X/D-first sd-DNNF circuit (Azzolini et al., 2024).
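Returning to empirical dynamic programming, the sketch below shows empirical value iteration on a two-state toy problem. It assumes a cost-minimizing discounted MDP with access to a simulator; the function names, the toy dynamics, and the sample sizes are all illustrative choices and not taken from Haskell et al.

```python
import random

def empirical_bellman(v, states, actions, cost, sample_next, gamma, n, rng):
    """One application of the Bellman operator with sample-average expectations."""
    new_v = {}
    for s in states:
        new_v[s] = min(
            cost(s, a)
            + gamma * sum(v[sample_next(s, a, rng)] for _ in range(n)) / n
            for a in actions
        )
    return new_v

def empirical_value_iteration(states, actions, cost, sample_next,
                              gamma=0.9, n=200, n_iter=50, seed=0):
    """Iterate the empirical operator, drawing fresh samples every round."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in states}
    for _ in range(n_iter):
        v = empirical_bellman(v, states, actions, cost, sample_next, gamma, n, rng)
    return v

# Toy problem: state 0 costs 1 per step, state 1 is free.
def cost(s, a):
    return 1.0 if s == 0 else 0.0

def sample_next(s, a, rng):
    if a == "stay":
        return s
    return 1 if rng.random() < 0.8 else 0  # "move" reaches state 1 w.p. 0.8

v_hat = empirical_value_iteration([0, 1], ["stay", "move"], cost, sample_next)
```

For this toy problem the exact optimal values are $v(1)=0$ and $v(0)=1/(1-0.9\cdot 0.2)\approx 1.22$. The empirical iterates do not settle at a single point but keep fluctuating around these values, which is why the convergence analysis is phrased in probabilistic terms.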

6. Limits, misconceptions, and open directions

A common misconception is that abandoning explicit states removes the need for probabilistic assumptions. The current EDP theory does not support that claim. Its inferential guarantees rely on randomness conditions such as exchangeability or i.i.d. sampling; truly adversarial or deterministic protocols lie outside the scope of the present results, and learning dependencies beyond small-contamination models remains an open research direction (Jansen et al., 5 Dec 2025).

A second misconception is that larger datasets or more powerful ML systems automatically dissolve empirical ambiguity. The missing-data literature makes the opposite point: identification problems may not diminish as sample size grows, because the identified set of states can remain large even with infinite data. In that setting, standard ML defaults such as case deletion, simple imputation, and model-based imputation implicitly rely on MCAR- or MAR-type assumptions rather than resolving the underlying identification problem (Dominitz et al., 15 Sep 2025). This suggests that EDPs should be viewed not as a rejection of rigor, but as part of a broader effort to align decision analysis with what is actually observable and inferentially defensible.

The remaining open directions are already explicit in the current literature. For EDPs, extending direct-inference empirical choice functions to more general forms of imprecision, including belief functions and intervals, is presented as a promising avenue; for adjacent empirical decision frameworks, open issues include handling non-i.i.d. dependence, sharpening constants and computational guarantees, and integrating robust decision criteria with richer empirical protocols (Jansen et al., 5 Dec 2025). The unifying theme is that empirical decision problems replace idealized model primitives with observable structure, while still demanding formal guarantees about consistency, testing, and robustness.