
Post-hoc Learned Deferral Rule

Updated 4 December 2025
  • Post-hoc learned deferral rule is an adaptive decision mechanism that uses outputs from fixed pre-trained models along with auxiliary features to determine whether to predict automatically or defer to an expert.
  • It leverages post-processing techniques—such as threshold selection, surrogate loss functions, and Lagrange multiplier optimization—to satisfy constraints like fairness, cost, and accuracy.
  • Empirical evaluations across classification, sequence prediction, and language model cascades demonstrate significant performance gains and enhanced operational tradeoffs.

A post-hoc learned deferral rule is an adaptive decision mechanism, learned after pre-training, that determines whether a prediction should be made by an automated system or deferred to a secondary system (such as a human expert, a larger model, or a specialized model), based on features derived from the automated system's outputs and auxiliary data. Unlike integrated deferral strategies requiring joint training, post-hoc rules leverage fixed pre-trained models and construct deferral decisions as a post-processing layer, optimizing for downstream metrics such as accuracy, risk, fairness, or compliance with operational constraints. Recent developments in this area address challenges across classification, sequence prediction, and LLM cascades, offering both theoretical guarantees and practical algorithms.

1. Formal Foundation and Motivation

The post-hoc learned deferral rule arises from the need to optimally combine automated systems (machines) with experts by selectively deferring predictions based on uncertainty, expected improvement, or operational constraints. In the general Learn-to-Defer (L2D) paradigm, one observes input $X \in \mathcal{X}$, true label $Y \in \{1, \ldots, L\}$, and an expert's decision $M$; a base classifier $h$ predicts from $X$ with its own loss $\ell_{\rm AI}$, while the expert incurs loss $\ell_{\rm H}$. The binary deferral indicator $r(X) \in \{0, 1\}$ determines whether the prediction comes from the automated model ($r=0$) or is deferred to the expert ($r=1$). The objective is to minimize the average deferral loss under potentially multiple constraints, formalized as:

$$(P)\quad \max_{f: \mathcal{X} \to \Delta_{L+1}} \mathbb{E}[\langle f(x), \psi_0(x) \rangle] \quad\text{s.t.}\quad \mathbb{E}[\langle f(x), \psi_i(x)\rangle] \leq \delta_i,\ i=1,\dots,m,$$

where $\psi_0(x)$ encodes the negative deferral loss and $\psi_i(x)$ encodes constraints such as expert budget or fairness (Charusaie et al., 17 Jul 2024).

In cascades of machine learning models, the post-hoc deferral rule further adapts by routing instances or sequence components among multiple models of varying capacity, exploiting predicted uncertainty, label noise sensitivity, or distributional shifts (Jitkrittum et al., 2023, Gupta et al., 15 Apr 2024, Rayan et al., 3 Feb 2025, Hammal et al., 30 Oct 2025).

2. Theoretical Basis: Bayes-Optimality and Surrogate Learning

The optimal deferral rule, under oracle knowledge, is typically characterized by comparing the improvement in expected loss from deferral against the deferral cost. For a two-model cascade, let $h^{(1)}$ and $h^{(2)}$ be the base and deferred models, and $\eta_k(x) = P(y = h^{(k)}(x) \mid x)$. The Bayes-optimal deferral rule is:

$$r^*(x) = \mathbf{1}[\Delta\eta(x) > c],$$

where $\Delta\eta(x) = \eta_2(x) - \eta_1(x)$ and $c$ is the cost of deferring (Jitkrittum et al., 2023).
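The Bayes-optimal rule above is simple enough to sketch directly. The following is a minimal illustration, not the papers' implementation: `eta1` and `eta2` stand in for the (oracle) per-example correctness probabilities of the base and deferred models, and the function name `bayes_deferral` is our own.

```python
import numpy as np

def bayes_deferral(eta1, eta2, c):
    """Bayes-optimal deferral: defer (1) exactly when the expected
    accuracy gain of the deferred model exceeds the deferral cost c."""
    return (eta2 - eta1 > c).astype(int)

# Toy example: three inputs, correctness probabilities for each model.
eta1 = np.array([0.90, 0.60, 0.40])   # base model h^(1)
eta2 = np.array([0.92, 0.85, 0.50])   # deferred model h^(2)
print(bayes_deferral(eta1, eta2, c=0.1))  # → [0 1 0]: defer only where gain > c
```

In practice $\eta_1, \eta_2$ are unknown and the learned rule approximates this comparison from observable features, as described below.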

Generalizing to constrained, multi-objective settings, the $d$-GNP lemma (a $d$-dimensional extension of Neyman–Pearson) provides a closed form for the randomized optimal policy. There exist Lagrange multipliers $\{k_i\}_{i=1}^m$ such that the acceptance/defer regions are maximizers of the linear scores

$$S_i(x) = \psi_{0,i}(x) - \sum_{j=1}^m k_j \psi_{j,i}(x)$$

over the possible predictions and the deferral action (Charusaie et al., 17 Jul 2024). The full problem reduces to thresholding or linear programming in the space of outputs.
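The reduction to linear scoring can be sketched in a few lines. This is an illustrative toy, not the paper's code: the action space is the $L+1$ predictions plus deferral, `psi0` and `psis` are stand-ins for the objective and constraint score vectors $\psi_{0}(x)$ and $\psi_{j}(x)$, and the multiplier values are hypothetical.

```python
import numpy as np

def dgnp_action(psi0, psis, k):
    """Pick the action maximizing the d-GNP linear score
    S_i(x) = psi_{0,i}(x) - sum_j k_j * psi_{j,i}(x).
    psi0: (L+1,) objective scores per action (last action = defer);
    psis: (m, L+1) constraint scores; k: (m,) Lagrange multipliers."""
    S = psi0 - k @ psis
    return int(np.argmax(S))

# Toy example: L = 2 classes plus a defer action (index 2), one
# budget constraint that charges only the deferral action.
psi0 = np.array([0.7, 0.2, 0.8])     # negative deferral loss per action
psis = np.array([[0.0, 0.0, 1.0]])   # constraint: expert usage
print(dgnp_action(psi0, psis, k=np.array([0.05])))  # → 2 (cheap expert: defer)
print(dgnp_action(psi0, psis, k=np.array([0.5])))   # → 0 (costly expert: predict)
```

Sweeping $k$ traces out the constraint boundary; in the cited work the multipliers are found by grid or coordinate search on a validation set.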

For sequence prediction, the post-hoc rule extends to selecting, for each output position $j$, whether to defer, constructing token-level or one-time deferral points through cost-sensitive multiclass classification. Surrogate loss functions are constructed to be Bayes-consistent, and finite-sample guarantees are established (Rayan et al., 3 Feb 2025).

3. Algorithmic Implementation and Workflow

Post-hoc learned deferral rules typically follow these stages:

  • Feature Construction: For classification, features $\phi(x)$ might include model confidences (e.g., softmax probabilities, entropy), uncertainty quantiles, or auxiliary classifier scores. For generative models, rich features include token-level uncertainties, aggregate uncertainty summaries (sum, average, quantiles), and optionally network embeddings from small or large models (Gupta et al., 15 Apr 2024).
  • Surrogate Label Definition: For validation data, construct proxy targets approximating whether deferral would be beneficial, e.g., by oracle differences in accuracy ("Diff-01") or probability ("Diff-Prob") (Jitkrittum et al., 2023).
  • Learning the Deferral Rule: Train a regressor or classifier (commonly a shallow MLP) to predict the surrogate label from features. The empirical risk is minimized subject to regularization (typically $L_2$ or early stopping).
  • Threshold Selection: Select a deferral threshold on a validation set to meet constraint(s), such as average cost or fairness (Gupta et al., 15 Apr 2024, Charusaie et al., 17 Jul 2024). For multi-objective settings, grid or coordinate search over Lagrange multipliers is performed.
  • Test-Time Routing: At inference, compute features for each sample, apply the learned rule, and route accordingly. In sequence models, deferral is at token or sequence-point granularity, interleaving model and expert predictions (Rayan et al., 3 Feb 2025).
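The threshold-selection stage above reduces to a quantile computation. The sketch below assumes the rejector has already been trained; `val_scores` stands in for its outputs on a held-out validation set, and `pick_threshold` is a name of our own choosing.

```python
import numpy as np

def pick_threshold(scores, budget):
    """Threshold selection: choose tau on validation scores so that
    the fraction of deferred examples matches the deferral budget."""
    # Defer on the top-`budget` fraction of rejector scores.
    return np.quantile(scores, 1.0 - budget)

rng = np.random.default_rng(0)
val_scores = rng.random(1000)          # stand-in for learned rejector scores
tau = pick_threshold(val_scores, budget=0.2)
defer_rate = float((val_scores > tau).mean())
print(round(defer_rate, 2))            # → 0.2, i.e. the budget is met
```

For multi-constraint settings the same idea applies per constraint, with a grid search over the Lagrange multipliers replacing the single quantile.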

Pseudocode for token-level sequence deferral is:

input: x                          # source sequence
y_hat = []                        # output generated so far
for j in 1..L:
    score = r_j(x, y_hat)         # learned deferral score at position j
    if score < tau:
        next_token = f(x, y_hat)  # base model predicts token j
    else:
        next_token = e(x, y_hat)  # defer: expert predicts token j
    y_hat.append(next_token)
return y_hat
(Rayan et al., 3 Feb 2025)

4. Extensions: Token-Level, Sequence, and Knapsack Formulations

Recent works extend post-hoc learned deferral rules far beyond single prediction settings:

  • Token-Level and Partial Sequence Deferral: Sequence models may benefit from token-level rejectors ($r_1, \ldots, r_L$), which decide on deferral individually for each position, or one-time rejectors that select when to hand off the remaining prediction to the expert. These structures enable super-linear improvements in the cost-accuracy tradeoff compared to whole-sequence deferral (Rayan et al., 3 Feb 2025).
  • Knapsack Approximations in Autoregressive LMs: The Kad framework casts token-wise deferral as a 0–1 knapsack problem, balancing per-token base model risk against a total deferral budget. The dual (thresholding) and primal (critical-index) approximations yield closed-form, computationally efficient rules that can be tuned for arbitrary deferral budgets and are compatible with speculative decoding (Hammal et al., 30 Oct 2025).
  • Fusion with Deep Embeddings and Uncertainty: Incorporating model-internal embeddings and more granular token-level uncertainty summaries further boosts performance in large-scale LM cascades. Fusing intermediate layer representations enables additional accuracy-computational cost improvements (Gupta et al., 15 Apr 2024).
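The dual (thresholding) approximation of the knapsack formulation admits a compact sketch. This is our own illustration of the idea, not the Kad implementation: `risks` stands in for per-token base-model risk estimates, and the budget is expressed as a number of deferred tokens.

```python
import numpy as np

def knapsack_dual_defer(risks, budget):
    """Dual (thresholding) approximation of 0-1 knapsack deferral:
    defer the tokens with the highest base-model risk until the
    budget (number of deferred tokens) is spent. Ties at the
    threshold may slightly exceed the budget in this sketch."""
    k = int(budget)
    if k <= 0:
        return np.zeros_like(risks, dtype=int)
    tau = np.partition(risks, -k)[-k]   # k-th largest risk as threshold
    return (risks >= tau).astype(int)

risks = np.array([0.1, 0.9, 0.4, 0.8, 0.2])  # per-token base-model risk
print(knapsack_dual_defer(risks, budget=2))  # → [0 1 0 1 0]
```

Because the rule is a per-token threshold on a precomputed score, it composes naturally with speculative decoding, as the cited work exploits.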

5. Empirical Evaluation and Key Results

Empirical validation across application domains demonstrates consistent improvements:

| Domain | Baseline(s) | Post-hoc Gain | Cited Paper |
|---|---|---|---|
| Classification | Confidence threshold, prior deferral | +2–3% accuracy, lower parity/fairness violation | (Jitkrittum et al., 2023, Charusaie et al., 17 Jul 2024) |
| Seq. prediction | Whole-sequence deferral, Chow's rule | Up to +7.6% better AUDC in summarization | (Rayan et al., 3 Feb 2025) |
| LM cascades | Chow-Sum/Average, fixed quantile | +5–20% AUC-DF (area under deferral curve) | (Gupta et al., 15 Apr 2024) |
| LLM test-time | Token-level nudging | +3–4% accuracy, faster speculative decoding | (Hammal et al., 30 Oct 2025) |

Notable findings include:

  • Partial deferral (token-level, or budgeted knapsack) yields strictly better loss-per-expert-cost compared to whole-sequence or confidence-only thresholds.
  • For fairness and budget-constrained classification, post-hoc rules attain the exact constraint boundary with minimal accuracy loss and outperform regularized in-processing and prior post-shift baselines (Charusaie et al., 17 Jul 2024).
  • In LM cascades, combining token-level uncertainty quantiles with network embeddings improves AUC-DF by up to 20% over sum/average-based approaches (Gupta et al., 15 Apr 2024).
  • Knapsack-based rules (Kad) maintain strict adherence to deferral budgets and outperform both naïve token-wise rules and prior nudging methods in both alignment performance and speculative decoding throughput (Hammal et al., 30 Oct 2025).

6. Generalization Guarantees and Practical Considerations

Theoretical properties of post-hoc deferral rules depend on the chosen surrogate loss and the richness of training features. Results establish Bayes-consistency of the surrogate, finite-sample excess risk bounds (via Rademacher complexity) for the learned rejector, and explicit regret bounds in the knapsack case proportional to the loss of the critical index (Rayan et al., 3 Feb 2025, Hammal et al., 30 Oct 2025). Overfitting is mitigated by keeping the rejector architecture shallow and applying regularization.

Practical deployment notes include:

  • Embedding extraction from base models incurs negligible additional cost and can be flexibly fused at routing time.
  • Training uses moderate-sized validation sets, typically a few thousand examples, and the rejector can be tuned for cost, accuracy, or fairness constraints as required.
  • Block-wise, multi-expert, and context-dependent extensions are feasible depending on expert capabilities (Rayan et al., 3 Feb 2025).

7. Connections and Limitations

Post-hoc learned deferral rules unify and extend a wide range of cascaded, cost-sensitive, and fairness-constrained decision architectures. They generalize confidence-based thresholds by learning direct mappings from uncertainty and auxiliary features to optimal deferral decisions.

Key limitations include:

  • The need for held-out data to calibrate deferral thresholds, Lagrange multipliers, or surrogate losses.
  • Slight degradation relative to fully retrained, joint fine-tuning baselines in some settings (noted for full LLM alignment (Hammal et al., 30 Oct 2025)).
  • For large-scale or high-stakes applications, the provision and calibration of accurate surrogate deferral costs remain non-trivial.

Future directions highlighted in recent work include adaptive context-specific budgets, multi-way and hierarchical cascades, and further theoretical analysis of regret and fairness tradeoffs in non-i.i.d. environments.
