
Conformal Deferral Rule: Hybrid Decision Making

Updated 30 November 2025
  • Conformal Deferral Rule is a training-free, model-agnostic framework that uses conformal prediction to quantify uncertainty and automatically defer ambiguous cases.
  • It employs a segregativity criterion to select the expert with the highest empirical accuracy on the defined prediction set, ensuring effective expert routing.
  • The method guarantees specified error rates through marginal coverage and has demonstrated high predictive accuracy along with significant expert workload reduction on benchmark datasets.

The conformal deferral rule is a training-free, model- and expert-agnostic framework for orchestrating deferral from AI predictors to multiple experts. It uses conformal prediction to quantify uncertainty and a segregativity criterion to select among experts. This enables hybrid human-AI decision-making: instances with ambiguous predictions are identified automatically and routed to the most discriminative expert, while specified error rates are guaranteed through marginal coverage properties. The rule reduces expert workload and achieves high predictive accuracy without retraining when the expert composition changes (Bary et al., 16 Sep 2025).

1. Formal Specification of Conformal Prediction and Prediction Sets

The conformal deferral rule operates on a feature space $X$, a label space $Y = \{1, \ldots, m\}$, and a pre-trained probabilistic classifier $\phi: X \to \Delta^{m-1}$ providing a label probability estimate $\phi_y(x)$ for each $y \in Y$. Calibration is performed using an i.i.d. set $D_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ exchangeable with test points.

A nonconformity score $s: X \times Y \to \mathbb{R}$ measures the discordance between input $x$ and label $y$. Common examples include:

  • LAC (Least Ambiguous set-valued Classifier): $s_{\text{LAC}}(x, y) = -\phi_y(x)$,
  • APS (Adaptive Prediction Sets): $s_{\text{APS}}(x, y) = \sum_{y': \phi_{y'}(x) \geq \phi_y(x)} \phi_{y'}(x)$,
  • RAPS (Regularized Adaptive Prediction Sets): an APS variant that adds a regularization penalty discouraging large sets.

Calibration proceeds by computing scores $r_i = s(x_i, y_i)$. For user-specified $\alpha \in (0, 1)$, $t_\alpha$ is the smallest threshold such that $\#\{i : r_i \leq t_\alpha\} \geq \lceil (1-\alpha)(n+1) \rceil$, yielding a prediction set $C_\alpha(x) = \{y \in Y : s(x, y) \leq t_\alpha\}$. Vovk et al. (2005) showed that, under exchangeability, $P_{(x, y)}[y \in C_\alpha(x)] \geq 1 - \alpha$ (Bary et al., 16 Sep 2025).
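The calibration step above can be sketched in a few lines of NumPy. This is an illustrative implementation under the LAC score, assuming the classifier's probabilities are available as arrays; the function names are ours, not the paper's:

```python
import numpy as np

def calibrate_threshold(probs_cal, y_cal, alpha):
    """Compute the conformal threshold t_alpha from calibration data.

    probs_cal: (n, m) array of predicted class probabilities phi_y(x_i).
    y_cal:     (n,) array of true labels in {0, ..., m-1}.
    """
    n = len(y_cal)
    # LAC nonconformity score: s(x, y) = -phi_y(x)
    scores = -probs_cal[np.arange(n), y_cal]
    # Smallest t with #{i : r_i <= t} >= ceil((1 - alpha)(n + 1))
    k = int(np.ceil((1 - alpha) * (n + 1)))
    return np.sort(scores)[k - 1]

def prediction_set(probs_x, t_alpha):
    """C_alpha(x) = {y : s(x, y) <= t_alpha} under the LAC score."""
    return np.where(-probs_x <= t_alpha)[0]
```

Note that the rank $k$ uses $n+1$ rather than $n$; this finite-sample correction is what makes the coverage guarantee hold exactly rather than only asymptotically.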

2. Deferral Mechanism and Decision Rule

Prediction proceeds as follows: the classifier accepts responsibility and outputs $\hat{y}(x) = \arg\max_{y \in C_\alpha(x)} \phi_y(x)$ if $|C_\alpha(x)| = 1$, and otherwise defers to an external expert. Deferral is thus governed by

$$\delta(x) = \begin{cases} 1 & \text{if } |C_\alpha(x)| = 1 \\ 0 & \text{otherwise} \end{cases}$$

where $\delta(x) = 1$ indicates autonomous model prediction, and $\delta(x) = 0$ triggers deferral (Bary et al., 16 Sep 2025).
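The decision rule is simple enough to state directly in code; a minimal sketch, with the tuple return convention being our own choice:

```python
def predict_or_defer(pred_set):
    """Apply the deferral rule delta(x): return (delta, label).

    delta = 1 with the unique label when |C_alpha(x)| = 1, i.e. the
    model answers autonomously; delta = 0 and label = None otherwise,
    signalling deferral to an expert.
    """
    labels = list(pred_set)
    if len(labels) == 1:
        return 1, labels[0]
    return 0, None
```

An empty prediction set also triggers deferral here, which matches the fallback rule for $C_\alpha(x) = \emptyset$ described in the next section.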

3. Segregativity Criterion for Expert Selection

For $K$ experts, each expert $k$ has a history $Y_k = \{(\hat{y}_{k,j}, y_j)\}_{j=1}^{N_k}$ comprising past predictions $\hat{y}_{k,j}$ and true labels $y_j$. When $\delta(x) = 0$ and $C_\alpha(x) = \Gamma$, define

$$\check{Y}_k(\Gamma) = \{\,(\hat{y}_{k,j}, y_j) \in Y_k : \hat{y}_{k,j} \in \Gamma,\ y_j \in \Gamma\,\}$$

and compute the segregativity

$$\sigma_k(\Gamma) = \frac{1}{|\check{Y}_k(\Gamma)|} \sum_{(\hat{y}, y) \in \check{Y}_k(\Gamma)} \mathbf{1}_{\hat{y} = y}$$

The expert selected for the deferred sample is $k^*(x) = \arg\max_{k=1,\ldots,K} \sigma_k(C_\alpha(x))$. In case of ties, random selection or cost-aware heuristics may be employed; if $C_\alpha(x) = \emptyset$, the expert with the highest overall accuracy is chosen (Bary et al., 16 Sep 2025).
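The segregativity computation restricts each expert's history to pairs whose prediction and true label both lie in $\Gamma$, then takes the empirical accuracy on that restriction. A sketch, with our own handling of experts that have no history on the region (ties here are broken by lowest index, one of the admissible heuristics):

```python
def segregativity(history, gamma):
    """sigma_k(Gamma): expert's empirical accuracy on pairs whose
    prediction and true label both lie in the prediction set gamma.

    history: list of (predicted_label, true_label) pairs.
    gamma:   set of candidate labels.
    """
    restricted = [(p, t) for (p, t) in history if p in gamma and t in gamma]
    if not restricted:
        return float("-inf")  # no evidence on this region
    return sum(p == t for p, t in restricted) / len(restricted)

def select_expert(histories, gamma):
    """k*(x) = argmax_k sigma_k(gamma); ties broken by lowest index."""
    scores = [segregativity(h, gamma) for h in histories]
    return max(range(len(scores)), key=lambda k: scores[k])
```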

4. Deferral Procedure: Algorithmic Description

The conformal-segregativity deferral algorithm receives $x \in X$, classifier $\phi$, nonconformity score $s$, threshold $t_\alpha$, and expert histories $Y_1, \ldots, Y_K$ as input, and outputs a final label:

  1. Construct the prediction set $\Gamma = \{y \in Y : s(x, y) \leq t_\alpha\}$.
  2. If $|\Gamma| = 1$, output the unique element of $\Gamma$.
  3. Otherwise, for each expert $k$, compute the restricted set $\check{Y}_k(\Gamma)$ and segregativity $\sigma_k(\Gamma)$.
  4. Choose $k^* = \arg\max_k \sigma_k(\Gamma)$.
  5. Query expert $k^*$ and return its label.
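The five steps above can be sketched end-to-end as follows. This is an illustrative implementation under assumed interfaces (NumPy probability vectors, the LAC score, experts as callables returning a label), not the authors' code:

```python
import numpy as np

def conformal_defer(probs_x, t_alpha, histories, experts):
    """One pass of the conformal-segregativity deferral procedure.

    probs_x:   (m,) predicted class probabilities phi_y(x) for input x.
    t_alpha:   calibrated conformal threshold.
    histories: per-expert lists of (predicted_label, true_label) pairs.
    experts:   per-expert callables returning a label (stubs here).
    """
    # Step 1: prediction set Gamma under the LAC score s(x, y) = -phi_y(x).
    gamma = set(np.where(-probs_x <= t_alpha)[0].tolist())

    # Step 2: a singleton set means the model answers autonomously.
    if len(gamma) == 1:
        return next(iter(gamma))

    # Steps 3-4: segregativity of each expert on gamma; fall back to
    # overall accuracy when gamma is empty, per the empty-set rule.
    def sigma(history):
        pairs = history if not gamma else \
            [(p, t) for (p, t) in history if p in gamma and t in gamma]
        return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0

    k_star = max(range(len(histories)), key=lambda k: sigma(histories[k]))

    # Step 5: query the selected expert and return its label.
    return experts[k_star]()
```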

This method is training-free and does not require model retraining when the expert pool changes (Bary et al., 16 Sep 2025).

5. Theoretical Guarantees and Performance Metrics

Marginal coverage is guaranteed under exchangeability: $P[y \in C_\alpha(x)] \geq 1 - \alpha$. The probability that the model errs is $P[\text{model error}] \leq \alpha$. The deferral rate $\Delta(\alpha) = P[|C_\alpha(x)| > 1]$ quantifies the expected proportion of samples deferred to experts, so the expected expert workload reduction factor is $R(\alpha) = 1/\Delta(\alpha)$. The overall error decomposes as

$$P[\text{system error}] = e_{\text{model}} + e_{\text{expert}} \leq \alpha + \bigl(1 - P[|C_\alpha(x)| = 1]\bigr) \cdot \max_k \bigl(\text{expert } k\text{'s error on their region}\bigr)$$

This makes the trade-off between model autonomy and error rate explicit and tunable via $\alpha$; segregativity routing ensures ambiguous points are directed to the expert with the highest empirical discriminative ability on the region defined by $C_\alpha(x)$ (Bary et al., 16 Sep 2025).
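As an illustration of this trade-off, the following sketch computes the workload reduction factor and the system-error bound from a deferral rate, $\alpha$, and a worst-case expert error. The numbers in the usage below are made up for illustration, not results from the paper:

```python
def workload_and_error_bound(deferral_rate, alpha, worst_expert_error):
    """R(alpha) = 1 / Delta(alpha), and the system-error bound
    alpha + Delta(alpha) * max_k(expert error on their region),
    treating the deferral rate as 1 - P[|C_alpha(x)| = 1]."""
    reduction = 1.0 / deferral_rate
    error_bound = alpha + deferral_rate * worst_expert_error
    return reduction, error_bound
```

For instance, with a 10% deferral rate, $\alpha = 0.05$, and a 2% worst-case expert error, the experts see one sample in ten (a 10x workload reduction) and the bound on system error is $0.05 + 0.1 \times 0.02 = 0.052$.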

6. Empirical Performance and Robustness Properties

On datasets such as CIFAR10-H and ImageNet16-H, the conformal deferral rule achieves accuracies of $99.57 \pm 0.10\%$ and $99.40 \pm 0.52\%$ respectively, outperforming both standalone models and the top individual expert. The expected expert workload reduction factor can reach up to 11, indicating substantial efficiency gains. Performance remains robust even under degraded expert accuracy, declining gradually rather than catastrophically in low-information settings (i.e., as prediction-set ambiguity grows) (Bary et al., 16 Sep 2025).

A plausible implication is that this framework provides a scalable and practical alternative to retraining-intensive learning-to-defer methods in real-world human-AI collaborative settings.

7. Contextual Significance and Limitations

Unlike classical Learning to Defer (L2D) techniques which require retraining upon changes in the expert pool, the conformal deferral rule is inherently adaptation- and retraining-free, retaining the coverage and error guarantees for all test distributions exchangeable with the calibration set. This suggests particular applicability for dynamic expert systems and mission-critical domains where both trust calibration and expert labor optimization are prioritized.

A limitation is that theoretical guarantees depend on exchangeability between calibration and future data; performance degrades in low-information or highly ambiguous regimes, as reflected by gradual performance drops rather than catastrophic failures (Bary et al., 16 Sep 2025).
