Protocol-Based Empirical Choice Functions

Updated 4 July 2026

The paper introduces a framework that converts observed decision protocols into set-valued empirical choice functions using structural restrictions like coherence and Pareto rationalization.
It employs diverse inferential frameworks—including closed-form restrictions, natural extensions, and latent utility modeling—to ensure robust statistical estimation and testing.
The work has practical implications, enabling counterfactual prediction, welfare analysis, and efficient empirical set construction from real-world decision data.

Protocol-Based Empirical Choice Functions designate a family of empirical and decision-theoretic constructions in which the primitive input is not a fully specified utility function on a known state space, but an observed protocol: menu choices, budget choices, smart-card route realizations, search-tree pruning rules, or finite families of act–consequence pairs. The common objective is to map such protocols into a choice function, choice correspondence, or empirically disciplined dominance relation, typically under explicit structural restrictions such as utility maximization, coherence, Pareto rationalization, path independence, or revealed preference (Jansen et al., 5 Dec 2025). The label is used explicitly in “Empirical Decision Theory” (Jansen et al., 5 Dec 2025), and several earlier works are naturally interpreted in the same vein: binary demand under varying price–income protocols (Bhattacharya, 2019), coherent extension of partial menu choices (Decadt et al., 2024), empirical route-choice set construction from smart-card protocols (Sfeir et al., 11 Mar 2025), and safe action-pruning rules in online search (Issakkimuthu et al., 2019).

1. Conceptual scope

In the most explicit formulation, an empirical decision problem is a pair $(\mathbb A,\mathscr P_{\mathbf z})$, where $\mathbb A=\{a_1,\dots,a_n\}$ is a context-consistent finite set of action descriptions and $\mathscr P_{\mathbf z}$ is a $\mathbf z$-protocol: a finite family of act–consequence pairs $((q_j,p_j))_{j\in\underline N}$ such that each action description $a_i$ appears exactly $z_i$ times (Jansen et al., 5 Dec 2025). This formulation replaces the classical primitive $X:S\to C$ by observable records of what happened when an action description was used. The paper’s stated objective is to address optimality “in a radically empirical way” and to derive inferential guarantees through “(I) consistent statistical estimation of choice sets, (II) consistent statistical testing of choice functions with robustness guarantees, and (III) direct inference for empirical choice functions using credal sets” (Jansen et al., 5 Dec 2025).
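As a minimal sketch of the bookkeeping behind a $\mathbf z$-protocol, the following Python snippet counts how often each action description occurs in a list of records. The record layout (an action description paired with an observed consequence), the function name `count_vector`, and the example prompts and scores are illustrative assumptions, not notation from Jansen et al.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Assumed record layout: (action description, observed consequence).
Record = Tuple[str, Hashable]


def count_vector(actions: Sequence[str], protocol: Sequence[Record]) -> Tuple[int, ...]:
    """Return z = (z_1, ..., z_n): how often each action description occurs."""
    counts = Counter(action for action, _ in protocol)
    unknown = set(counts) - set(actions)
    if unknown:
        raise ValueError(f"protocol uses undeclared actions: {sorted(unknown)}")
    return tuple(counts[a] for a in actions)


# Hypothetical example: two prompting strategies with recorded scores.
actions = ["prompt_A", "prompt_B"]
protocol = [("prompt_A", 0.7), ("prompt_B", 0.4), ("prompt_A", 0.9)]
z = count_vector(actions, protocol)
```

Here `z` records that `prompt_A` was used twice and `prompt_B` once, which is the only structure the definition above requires of the protocol.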

The same conceptual move appears, in narrower forms, across several adjacent literatures. In binary demand, the operative protocol is the budget environment $(p,y)$, equivalently the pair of leftovers $(y,y-p)$, and the empirical object is the structural choice probability, expressed as a function of either price and income or the two leftovers (Bhattacharya, 2019). In coherent choice-function theory, the primitive data are observed menus together with chosen and rejected options, encoded as a choice assessment (Decadt et al., 2024). In public transport route choice, the protocol is a stream of tap-in, transfer, and tap-out events from smart-card data, from which distinct observed paths are aggregated into empirical route sets (Sfeir et al., 11 Mar 2025). In online policy improvement, a choice function specifies which actions are admissible at each search-tree node, so the protocol is the pruning rule itself rather than an ex post dataset (Issakkimuthu et al., 2019).

A unifying feature is that the output is usually set-valued. The aim is not always to infer a unique optimal item. Depending on the framework, the output can be a set of non-rejected menu options, a set of undominated bundles, a set of feasible routes attached to an origin–destination context, or a set of search actions retained by a pruning protocol (Decadt et al., 2024). This suggests that protocol-based empirical choice functions are best understood as a general family of methods for converting observed or imposed decision protocols into disciplined, usually incomplete, empirical choice correspondences.

2. Core formal objects

A recurring primitive is the menu-based choice function. In the coherence-based literature, menus are finite nonempty subsets $A$ of an option space $\mathcal V$, and a choice function is a map $C$ on menus such that $C(A)\subseteq A$ for all menus $A$ (Decadt et al., 2024). The associated rejection function is $R(A)=A\setminus C(A)$, and the intended semantics are that $u\in C(A)$ means no option in $A$ is known to be strictly preferred to $u$, whereas $u\in R(A)$ means some option in $A$ is strictly preferred to it (Decadt et al., 2024). The 2020 practical inference paper adopts the same formal architecture of menus, a rejection function, and a set of coherence axioms (Decadt et al., 2020).
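The choice and rejection semantics can be illustrated with a small Python sketch. It assumes, purely for illustration, that the known strict preferences are given as a finite set of (better, worse) pairs; the names `choose`, `reject`, and `prefers` are hypothetical and do not come from the cited papers, which work with richer option spaces.

```python
from typing import FrozenSet, Set, Tuple

Option = str
StrictPref = Set[Tuple[Option, Option]]  # assumed encoding: (better, worse) pairs


def choose(menu: FrozenSet[Option], prefers: StrictPref) -> FrozenSet[Option]:
    """Options in the menu that no other menu option is known to beat."""
    return frozenset(
        u for u in menu if not any((v, u) in prefers for v in menu)
    )


def reject(menu: FrozenSet[Option], prefers: StrictPref) -> FrozenSet[Option]:
    """Rejected options: the complement of the chosen set within the menu."""
    return menu - choose(menu, prefers)


# Only one strict preference is known: a beats b. Nothing is known about c.
prefers = {("a", "b")}
menu = frozenset({"a", "b", "c"})
chosen = choose(menu, prefers)
rejected = reject(menu, prefers)
```

Because nothing is known about `c`, it stays in the chosen set alongside `a`; only `b` is rejected. This is the sense in which such choice functions are set-valued and tolerate incomplete preference information.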

In stochastic environments, the basic object is a stochastic choice function, which assigns to each menu a probability distribution over its options, together with a threshold-induced deterministic correspondence that retains, in each menu, the options whose choice probabilities clear a given threshold.

This “Fishburn family” converts repeated stochastic choice into a one-parameter family of deterministic empirical choice filters (Ok et al., 2023). Rationality is then assessed not directly on the stochastic choice function, but on the induced deterministic correspondences via Chernoff, Condorcet, and No-Cycle conditions (Ok et al., 2023).
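The thresholding idea can be sketched in Python. The specific rule used here (keep an option when its probability is at least a fraction `lam` of the largest probability in the menu) is an assumption made for illustration and may differ from the exact definition in Ok et al.; the function names and the toy data are likewise hypothetical. Only the Chernoff condition is checked.

```python
from typing import Dict, FrozenSet

Menu = FrozenSet[str]
StochasticChoice = Dict[Menu, Dict[str, float]]


def threshold_filter(probs: Dict[str, float], lam: float) -> FrozenSet[str]:
    """Assumed rule: keep options with probability >= lam * (menu maximum)."""
    top = max(probs.values())
    return frozenset(x for x, p in probs.items() if p >= lam * top)


def satisfies_chernoff(rho: StochasticChoice, lam: float) -> bool:
    """Chernoff: if x is chosen from S and x lies in T, a subset of S, then x is chosen from T."""
    chosen = {menu: threshold_filter(p, lam) for menu, p in rho.items()}
    for big in chosen:
        for small in chosen:
            # `small < big` tests for a proper subset of menus.
            if small < big and not (chosen[big] & small) <= chosen[small]:
                return False
    return True


# Toy data on two nested menus (probabilities are made up).
rho = {
    frozenset({"x", "y", "z"}): {"x": 0.5, "y": 0.3, "z": 0.2},
    frozenset({"x", "y"}): {"x": 0.4, "y": 0.6},
}
ok_at_half = satisfies_chernoff(rho, 0.5)
ok_at_one = satisfies_chernoff(rho, 1.0)
```

In this toy example the strictest threshold keeps only the modal option in each menu, and the modal option flips between the two menus, so the Chernoff condition fails at `lam = 1.0` but holds at `lam = 0.5`. Scanning `lam` over a grid gives a rough picture of the thresholds at which the filtered correspondence fails rationality.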

Binary-demand analysis works with a different but closely related empirical object. An individual with income $y$ chooses between two alternatives: alternative 1, which costs $p$ and leaves $y-p$ of the numeraire, and alternative 0, which leaves $y$. Heterogeneity induces a structural population choice probability $q(p,y)$, the share of the population that would choose alternative 1 at budget $(p,y)$.

Rationalizability means the existence of utility functions $U_1(\cdot,\eta)$ and $U_0(\cdot,\eta)$ for the two alternatives and a distribution $G$ of the unobserved heterogeneity $\eta$ such that

$q(p,y)=\int \mathbf 1\{U_1(y-p,\eta)>U_0(y,\eta)\}\,dG(\eta),$

with monotonicity in leftover numeraire at the individual level (Bhattacharya, 2019). Here the empirical choice function is the full conditional probability surface $q(\cdot,\cdot)$.

A further variant is the latent Pareto-rationalized choice function used in Gaussian-process learning and multi-objective Bayesian optimization. There, observed data are pairs consisting of a menu and the subset chosen from it, and the model assumes a latent vector-valued utility function such that the observed choice set equals the Pareto set of nondominated menu items (Benavoli et al., 2021). The related Gaussian-process learning paper adopts the same multiple-utility representation and interprets the selected subset as the set of strongly Pareto-undominated options (Benavoli et al., 2023).
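A Pareto-rationalized choice set is easy to compute once a latent utility vector is available for each menu item. The sketch below assumes one common dominance convention (no worse in every coordinate and strictly better in at least one); the cited papers may use a different convention, and the names `dominates`, `pareto_choice`, and the toy utilities are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Sequence, Set


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no worse in every coordinate, strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_choice(
    menu: Iterable[Hashable],
    utility: Callable[[Hashable], Sequence[float]],
) -> Set[Hashable]:
    """Return the menu items whose utility vectors are not dominated within the menu."""
    items = list(menu)
    values = {x: utility(x) for x in items}
    return {
        x for x in items
        if not any(dominates(values[y], values[x]) for y in items)
    }


# Toy latent utilities with two objectives (made-up numbers).
latent = {"a": (1, 3), "b": (2, 2), "c": (1, 1), "d": (3, 1)}
chosen = pareto_choice(latent.keys(), latent.__getitem__)
```

In the toy menu, `c` is dominated by `b`, while `a`, `b`, and `d` are mutually incomparable and all survive. A learning method in this family would treat the utility vectors as latent and fit them so that the computed Pareto sets match the observed choice sets.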

3. Major protocol types in the literature

The literature grouped under this label spans several distinct protocol classes.

Protocol	Empirical object	Representative source
Budget protocol $(p,y)$	Binary choice probability	(Bhattacharya, 2019)
Menu-choice assessment	Natural extension	(Decadt et al., 2024)
Smart-card trip protocol	Empirical route set	(Sfeir et al., 11 Mar 2025)
Search-tree pruning protocol	Action choice function	(Issakkimuthu et al., 2019)
Act–consequence protocol $\mathscr P_{\mathbf z}$	Empirical choice sets and tests	(Jansen et al., 5 Dec 2025)

The budget protocol of binary demand is unusually clean because the protocol variables are exactly the economically relevant leftovers under the two alternatives. The resulting restrictions are “global,” “closed-form,” and invariant to the number or configuration of observed budgets (Bhattacharya, 2019). In assessment-based choice theory, by contrast, the protocol is finite and menu-specific: one observes chosen subsets and rejected alternatives, encodes each observed rejection as a difference set, and then computes the natural extension consistent with coherence (Decadt et al., 2024).

Smart-card route choice supplies an observational protocol rather than an elicitation protocol. In the Danish Rejsekort system, passengers must “tap-in at the start of the journey,” “tap-in and tap-out at each transfer,” and “tap-out at the end,” so observed trips can be reconstructed into route alternatives for each stop-to-stop origin–destination pair (Sfeir et al., 11 Mar 2025). Search-based reinforcement learning occupies another extreme: the protocol is the admissible-action rule at each search-tree node, formalized by a choice function over nodes, and the main question is when that protocol guarantees online policy improvement relative to a base policy (Issakkimuthu et al., 2019).

Taken together, these cases show that “protocol” may denote an observation design, an elicitation design, an institutional choice architecture, or a computational pruning rule. The common element is that the empirical choice mapping is defined only relative to that protocol.

4. Main inferential frameworks

One major framework derives closed-form restrictions directly from the protocol. In binary choice, rationalizability of the choice probability $q(p,y)$ of the priced alternative under completely general unobserved heterogeneity is equivalent to monotonicity in the two leftovers: $q$ is nondecreasing in $y-p$ and nonincreasing in $y$. Under differentiability, this is equivalent to the pair of Slutsky-like inequalities

$\frac{\partial q(p,y)}{\partial p}\le 0,\qquad \frac{\partial q(p,y)}{\partial p}+\frac{\partial q(p,y)}{\partial y}\le 0.$

These restrictions are both necessary and sufficient, are global rather than dataset-specific, and yield sharp counterfactual bounds and welfare bounds (Bhattacharya, 2019).
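A discrete version of such monotonicity restrictions can be checked on a grid of estimated choice probabilities. The sketch below assumes that the probability of choosing the priced alternative should not rise when the price rises alone, nor when price and income rise by the same amount (so that the leftover after purchase is unchanged). It also assumes price and income grids with a common step; the function name and the logistic toy surfaces are hypothetical.

```python
import math
from typing import List


def satisfies_shape_restrictions(q: List[List[float]], tol: float = 1e-12) -> bool:
    """q[i][j] is the choice probability at price p_i and income y_j.

    Assumes both grids are increasing and share the same step size, so that
    moving from (i, j) to (i + 1, j + 1) keeps income minus price fixed.
    """
    n_p, n_y = len(q), len(q[0])
    for i in range(n_p - 1):
        for j in range(n_y):
            if q[i + 1][j] > q[i][j] + tol:  # higher price alone must not raise q
                return False
    for i in range(n_p - 1):
        for j in range(n_y - 1):
            if q[i + 1][j + 1] > q[i][j] + tol:  # equal rise in p and y must not raise q
                return False
    return True


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


prices, incomes = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]
good = [[sigmoid(y - 2.0 * p) for y in incomes] for p in prices]
bad = [[sigmoid(p - y) for y in incomes] for p in prices]
good_ok = satisfies_shape_restrictions(good)
bad_ok = satisfies_shape_restrictions(bad)
```

The surface `good` passes both checks, whereas `bad` rises with price and is rejected. With estimated rather than exact probabilities, a statistical test would be needed in place of the fixed tolerance used here.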

A second framework computes the natural extension of partial menu data. For a choice assessment, consistency means that there exists at least one coherent choice function extending the assessment, and when consistency obtains, the natural extension is the most conservative such extension: it rejects an option from a menu only when every preference order compatible with the assessment does so (Decadt et al., 2024). The operational criterion reduces the question of whether an option is rejected from a menu to the existence of a suitable generator set built from the assessment (Decadt et al., 2024). The earlier practical paper gives the same conservative logic at the level of coherence axioms and sets of desirable option sets, emphasizing that the method infers exactly those rejections forced by coherence and the observed protocol (Decadt et al., 2020).

A third framework learns a latent context-dependent or multi-utility representation from empirical menus. “Learning Context-Dependent Choice Functions” models a choice function $c$ through a latent context-dependent utility $U(x,C)$ of an option $x$ in a menu $C$, with deterministic singleton choice

$c(C)=\arg\max_{x\in C}U(x,C)$

and deterministic subset choice

$c(C)=\{x\in C: U(x,C)>t\}$ for a threshold $t$.

Its two main representations are FETA (first evaluate, then aggregate),

$U(x,C)=U_0(x)+\frac{1}{|C|-1}\sum_{y\in C\setminus\{x\}}U_1(x,y),$

and FATE (first aggregate, then evaluate),

$U(x,C)=\tilde U(x,\mu_C),\qquad \mu_C=\frac{1}{|C|}\sum_{y\in C}\phi(y),$

which provide permutation invariance and support variable menu size (Pfannschmidt et al., 2019). The Gaussian-process alternatives replace deterministic networks by GP priors on latent utilities and derive likelihoods for menu/subset observations under Pareto rationalization (Benavoli et al., 2023, Benavoli et al., 2021).
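The two aggregation patterns can be illustrated with toy functions in place of learned networks. The sketch assumes a pairwise-averaging form for the FETA-style utility and a mean-embedding form for the FATE-style utility. The component functions `u0`, `u1`, `phi`, and `score` are hypothetical stand-ins, options are plain numbers, and menu items are assumed distinct.

```python
from typing import Callable, Sequence, Set


def feta_utility(x: float, menu: Sequence[float],
                 u0: Callable[[float], float],
                 u1: Callable[[float, float], float]) -> float:
    """Evaluate x against each other option, then average the pairwise terms."""
    others = [y for y in menu if y != x]  # assumes distinct menu items
    if not others:
        return u0(x)
    return u0(x) + sum(u1(x, y) for y in others) / len(others)


def fate_utility(x: float, menu: Sequence[float],
                 phi: Callable[[float], float],
                 score: Callable[[float, float], float]) -> float:
    """Aggregate the menu into a mean embedding, then score x against it."""
    context = sum(phi(y) for y in menu) / len(menu)
    return score(x, context)


def subset_choice(menu: Sequence[float],
                  utility: Callable[[float, Sequence[float]], float],
                  threshold: float) -> Set[float]:
    """Select every option whose context-dependent utility exceeds the threshold."""
    return {x for x in menu if utility(x, menu) > threshold}


menu = [1.0, 2.0, 6.0]
feta = lambda x, m: feta_utility(x, m, lambda v: v, lambda a, b: a - b)
fate = lambda x, m: fate_utility(x, m, lambda v: v, lambda a, ctx: a - ctx)

feta_choice = subset_choice(menu, feta, 0.0)
fate_choice = subset_choice(menu, fate, 0.0)
fate_choice_reordered = subset_choice(list(reversed(menu)), fate, 0.0)
```

Both utilities depend on the menu only through sums over its elements, so reordering the menu leaves the choice unchanged and menus of any size are handled by the same code. In a learned model the toy lambdas would be replaced by trainable components.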

A fourth framework evaluates stochastic protocols by thresholded deterministic approximations. For a stochastic choice function, the Fishburn family of thresholded correspondences induces a partial ordering over stochastic choice functions by comparative rationality, and the parameter-free characterization of that ordering is stated in terms of the set of threshold values at which the induced deterministic correspondence fails rationality (Ok et al., 2023). This directly measures how robust revealed-preference coherence is to the thresholding convention used to extract deterministic empirical choice correspondences.

5. Structural restrictions, coherence, and procedural composition

A large part of the literature studies which structural restrictions on a protocol preserve rationality-like properties. In the coherence approach of De Bock and De Cooman, a coherent choice function satisfies a list of axioms: nonemptiness, translation invariance, acceptance of uniformly positive options, positive combination closure, and monotone rejection under menu expansion (Decadt et al., 2020). The 2024 natural-extension paper reformulates coherence through sets of preference orders and coherent desirable-option sets, using a bijection between preference orders and coherent cones of desirable differences (Decadt et al., 2024).

Procedural composition introduces another layer. “Lexicographic Composition of Choice Functions” studies sequential procedures in which one choice function is applied first and an exclusion rule then maps its selections into restrictions on what a later-stage choice function may select (Horan et al., 2022). Its main characterization is that the composition preserves path independence over responsive choice functions if and only if the exclusion rule is threshold-linear with cardinal reuse. In that form, exclusion below a threshold follows a linear rule governed by a family of nested sets, while at or above threshold the gross exclusion shuts down later incremental choice (Horan et al., 2022). The paper’s substantive message is that rational aggregate behavior can arise from sequential procedures only under sharply constrained state-update rules.

A related structural theorem appears in the poset literature. A conservative choice function on a poset that satisfies heredity and outcast can be represented as the union of elementary choice functions generated by well-ordered sequences, and in the finite case by antichain sequences (Danilov, 2021). Each elementary rule is a sequential protocol: scan ordered targets, stop at the first feasible trigger, and select the feasible lower closure generated up to that trigger (Danilov, 2021).

Search-based decision making supplies still another procedural criterion. In online policy improvement, a choice function specifies which actions remain admissible at each search-tree node. The central sufficient conditions for safety are a consistency condition and a monotonicity condition, which together prevent a branch from appearing good only because it relies on future actions that later search trees would prune away (Issakkimuthu et al., 2019). This is a protocol-level rationality theorem for sequential action pruning.

6. Counterfactual prediction, welfare, and empirical applications

Protocol-based empirical choice functions are often valued for their counterfactual content. In binary demand, the shape restrictions yield sharp bounds on unobserved choice probabilities. For a counterfactual budget, the lower and upper bounds are defined directly from observed budgets satisfying the relevant monotonicity orderings, and every value in the interval between them can be embedded in a globally rationalizable choice-probability function (Bhattacharya, 2019). The same structure yields sharp welfare bounds for average compensating variation by integrating lower and upper envelopes along a counterfactual price path (Bhattacharya, 2019).
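The logic of monotonicity-based bounds can be sketched as follows. The code assumes that the choice probability of the priced alternative is nondecreasing in the leftover after purchase, $y-p$, and nonincreasing in the leftover without purchase, $y$. Observed budgets that are weakly less favourable on both counts then bound the counterfactual probability from below, and weakly more favourable ones bound it from above. The function name and the data are hypothetical, and the sketch makes no claim to reproduce the exact construction in Bhattacharya (2019).

```python
from typing import Iterable, Tuple


def counterfactual_bounds(
    observed: Iterable[Tuple[float, float, float]],
    p_star: float,
    y_star: float,
) -> Tuple[float, float]:
    """observed holds (price, income, choice probability) triples."""
    a_star, b_star = y_star - p_star, y_star
    lower, upper = 0.0, 1.0  # trivial bounds when no observed budget is comparable
    for p, y, q in observed:
        a, b = y - p, y
        if a <= a_star and b >= b_star:  # weakly less favourable to buying
            lower = max(lower, q)
        if a >= a_star and b <= b_star:  # weakly more favourable to buying
            upper = min(upper, q)
    return lower, upper


# Made-up observations at a common income of 10.
observed = [(2.0, 10.0, 0.30), (1.0, 10.0, 0.55), (3.0, 10.0, 0.10)]
bounds = counterfactual_bounds(observed, p_star=1.5, y_star=10.0)
```

For the counterfactual price 1.5, the observations at prices 2 and 3 bound the probability from below and the observation at price 1 bounds it from above, giving the interval from 0.30 to 0.55. Budgets that are not ordered relative to the counterfactual in both leftovers contribute nothing.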

The welfare literature extends this logic from single-consumer prediction to social evaluation. “Empirical Welfare Economics” constructs an incomplete relation in which one bundle bests another if it is a convex combination of bundles each indirectly revealed preferred to the latter, and an allocation empirically dominates another if all agents weakly best their original bundles and at least one strictly bests (Chambers et al., 2021). Its main theorem states that a candidate allocation is Pareto efficient for some increasing concave rationalizing utility profile if and only if no other allocation empirically dominates it (Chambers et al., 2021). A plausible implication is that protocol-based empirical choice functions can serve as welfare-relevant substitutes for latent utilities when the aim is certification of possible efficiency rather than recovery of a complete social ordering.

Transport applications illustrate the same logic in a high-dimensional observational setting. The empirical route-choice set is formed by pooling the distinct smart-card-observed paths for an origin–destination pair over an observation window (Sfeir et al., 11 Mar 2025). The approach is computationally attractive: empirical choice-set generation using observed travel times for 20 days took less than 5 minutes, whereas the conventional generated-set procedure required 2 weeks on a 2.6 GHz CPU, 196 GB RAM Linux machine. It is, however, sample-dependent: with one day of data, the empirical stop-to-stop set had 66.36% of OD pairs with only one alternative and 1.61 alternatives per OD on average, whereas after 20 weekdays the average rose to 6.35 and no plateau effect was observed (Sfeir et al., 11 Mar 2025). This application shows protocol-based empirical choice functions functioning as empirical feasible-set definitions rather than as preference estimators.
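The pooling step can be sketched in a few lines of Python. The trip layout (origin, destination, and an ordered list of legs) and the function names are assumptions for illustration; the toy trips are made up and do not come from the Rejsekort data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

Trip = Tuple[str, str, Sequence[str]]  # assumed layout: (origin, destination, legs)
RouteSets = Dict[Tuple[str, str], Set[Tuple[str, ...]]]


def empirical_route_sets(trips: Iterable[Trip]) -> RouteSets:
    """Pool the distinct observed paths for each origin-destination pair."""
    routes = defaultdict(set)
    for origin, destination, path in trips:
        routes[(origin, destination)].add(tuple(path))
    return dict(routes)


def summarize(route_sets: RouteSets) -> Tuple[float, float]:
    """Return (share of OD pairs with a single alternative, mean alternatives per OD)."""
    sizes = [len(paths) for paths in route_sets.values()]
    single_share = sum(1 for s in sizes if s == 1) / len(sizes)
    return single_share, sum(sizes) / len(sizes)


trips = [
    ("A", "C", ["bus 1"]),
    ("A", "C", ["bus 1"]),           # repeated path, counted once
    ("A", "C", ["metro", "bus 2"]),
    ("B", "C", ["bus 3"]),
]
route_sets = empirical_route_sets(trips)
single_share, mean_size = summarize(route_sets)
```

Rerunning the same functions on progressively longer observation windows shows how the share of single-alternative OD pairs falls and the mean set size grows as more trips are pooled, which is the sample dependence described above.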

Empirical-decision-theoretic work makes the protocol role explicit at the foundation. Its proposed inferential agenda includes estimation, testing, and direct credal-set inference for choice sets derived from act–consequence protocols, and its proof-of-concept application compares prompting strategies in generative AI models (Jansen et al., 5 Dec 2025). This suggests a significant broadening of the topic beyond consumer and transportation choice.

7. Limitations, controversies, and open directions

Several limitations recur across the literature. First, many frameworks are exact rather than stochastic. The coherence-based natural-extension methods treat observations as hard constraints; if the assessment is inconsistent, “failure of extension” occurs rather than a best-fit correction (Decadt et al., 2020, Decadt et al., 2024). This limits direct applicability to noisy behavioral datasets unless a separate statistical layer is added.

Second, the structural assumptions can be narrow. The binary-demand characterization is specific to binary choice, and the paper states that it “does not provide analogous closed-form conditions for multinomial discrete choice” (Bhattacharya, 2019). The GP-based Pareto models assume that observed menu choices are rationalizable as Pareto sets of a latent vector function, which excludes arbitrary context effects or non-Pareto menu phenomena (Benavoli et al., 2023, Benavoli et al., 2021). Context-dependent neural models address broader menu effects, but they do not recover an explicit symbolic protocol; the paper notes that they provide context-sensitive scoring functions rather than procedural rules (Pfannschmidt et al., 2019).

Third, some protocol effects are inherently destabilizing. Lexicographic composition shows that not every sequential procedure preserves path independence; only threshold-linear exclusion with cardinal reuse does so over responsive inputs, and the admissible class becomes even smaller over all path-independent choice functions (Horan et al., 2022). The stochastic-rationality literature likewise distinguishes revealed-preference coherence from random-utility representability: a stochastic choice function can be a random utility model and still fail threshold-based stochastic transitivity, so “is a RUM” is not equivalent to maximal rationality in that framework (Ok et al., 2023).

Fourth, empirical protocol definitions can be incomplete or sample-dependent. Route-choice construction from observed smart-card paths omits unchosen but feasible alternatives, especially over short windows, so the empirical choice set is only an observational approximation to the feasible set (Sfeir et al., 11 Mar 2025). The decision-theoretic framework of act–consequence protocols avoids explicit state spaces, but still assumes that each action description has an associated latent act and that the action-description set can be context consistent (Jansen et al., 5 Dec 2025).

The main open direction is therefore not a single unresolved theorem but a synthesis problem. The recent literature points toward a general theory in which protocols can be observational, elicited, computational, or institutional; choice objects can be deterministic, stochastic, or set-valued; and inferential guarantees can be logical, revealed-preference-based, statistical, or robust. The surveyed work supports the view that protocol-based empirical choice functions are not one model class but a unifying research program for deriving disciplined empirical choice mappings directly from the way decision evidence is generated (Jansen et al., 5 Dec 2025).