
Conformal Deferral Rule: Hybrid Decision Making

Updated 30 November 2025
  • Conformal Deferral Rule is a training-free, model-agnostic framework that uses conformal prediction to quantify uncertainty and automatically defer ambiguous cases.
  • It employs a segregativity criterion to select the expert with the highest empirical accuracy on the defined prediction set, ensuring effective expert routing.
  • The method guarantees specified error rates through marginal coverage and has demonstrated high predictive accuracy along with significant expert workload reduction on benchmark datasets.

The conformal deferral rule is a training-free, model- and expert-agnostic framework for orchestrating deferral from AI predictors to multiple experts. It uses conformal prediction to quantify uncertainty and a segregativity criterion to select among experts. This enables hybrid human-AI decision-making: instances with ambiguous predictions are identified automatically and routed to the most discriminative expert, while specified error rates are guaranteed through marginal coverage properties. The rule reduces expert workload and achieves high predictive accuracy without retraining when the expert composition changes (Bary et al., 16 Sep 2025).

1. Formal Specification of Conformal Prediction and Prediction Sets

The conformal deferral rule operates on a feature space $X$, a label space $Y = \{1, \ldots, m\}$, and a pre-trained probabilistic classifier $\phi: X \to \Delta^{m-1}$ providing a label probability estimate $\phi_y(x)$ for each $y \in Y$. Calibration is performed using an i.i.d. set $D_{\text{cal}} = \{(x_i, y_i)\}_{i=1}^n$ exchangeable with test points.

A nonconformity score $s: X \times Y \to \mathbb{R}$ measures the discordance between input $x$ and label $y$. Common examples include:

  • LAC (Least Ambiguous set-valued Classifier): $s_{\text{LAC}}(x, y) = -\phi_y(x)$,
  • APS (Adaptive Prediction Sets): $s_{\text{APS}}(x, y) = \sum_{y': \phi_{y'}(x) \geq \phi_y(x)} \phi_{y'}(x)$,
  • RAPS (Regularized Adaptive Prediction Sets): an APS variant that adds a regularization penalty discouraging large sets.

Calibration proceeds by computing scores $r_i = s(x_i, y_i)$. For user-specified $\alpha \in (0, 1)$, $t_\alpha$ is the smallest threshold such that $\#\{i : r_i \leq t_\alpha\} \geq \lceil (1-\alpha)(n+1) \rceil$, yielding a prediction set $C_\alpha(x) = \{y \in Y : s(x, y) \leq t_\alpha\}$. Vovk et al. (2005) showed that, under exchangeability, $P_{(x, y)}[y \in C_\alpha(x)] \geq 1 - \alpha$ (Bary et al., 16 Sep 2025).
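The calibration step above can be sketched in a few lines of NumPy. This is an illustrative implementation under the LAC score, assuming the classifier's probabilities are available as arrays; the function names are ours, not the paper's:

```python
import numpy as np

def calibrate_threshold(probs_cal, y_cal, alpha):
    """Compute the conformal threshold t_alpha from calibration data.

    probs_cal: (n, m) array of predicted class probabilities phi_y(x_i).
    y_cal:     (n,) array of true labels in {0, ..., m-1}.
    """
    n = len(y_cal)
    # LAC nonconformity score: s(x, y) = -phi_y(x)
    scores = -probs_cal[np.arange(n), y_cal]
    # Smallest t with #{i : r_i <= t} >= ceil((1 - alpha)(n + 1))
    k = int(np.ceil((1 - alpha) * (n + 1)))
    return np.sort(scores)[k - 1]

def prediction_set(probs_x, t_alpha):
    """C_alpha(x) = {y : s(x, y) <= t_alpha} under the LAC score."""
    return np.where(-probs_x <= t_alpha)[0]
```

Note that the rank $k$ uses $n+1$ rather than $n$; this finite-sample correction is what makes the coverage guarantee hold exactly rather than only asymptotically.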

2. Deferral Mechanism and Decision Rule

Prediction proceeds as follows: the classifier accepts responsibility and outputs $\hat{y}(x) = \arg\max_{y \in C_\alpha(x)} \phi_y(x)$ if $|C_\alpha(x)| = 1$, and otherwise defers to an external expert. Deferral is thus governed by

$$\delta(x) = \begin{cases} 1 & \text{if } |C_\alpha(x)| = 1 \\ 0 & \text{otherwise} \end{cases}$$

where $\delta(x) = 1$ indicates autonomous model prediction, and $\delta(x) = 0$ triggers deferral (Bary et al., 16 Sep 2025).
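The decision rule is simple enough to state directly in code; a minimal sketch, with the tuple return convention being our own choice:

```python
def predict_or_defer(pred_set):
    """Apply the deferral rule delta(x): return (delta, label).

    delta = 1 with the unique label when |C_alpha(x)| = 1, i.e. the
    model answers autonomously; delta = 0 and label = None otherwise,
    signalling deferral to an expert.
    """
    labels = list(pred_set)
    if len(labels) == 1:
        return 1, labels[0]
    return 0, None
```

An empty prediction set also triggers deferral here, which matches the fallback rule for $C_\alpha(x) = \emptyset$ described in the next section.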

3. Segregativity Criterion for Expert Selection

For $K$ experts, each expert $k$ has a history $Y_k = \{(\hat{y}_{k,j}, y_j)\}_{j=1}^{N_k}$ comprising past predictions $\hat{y}_{k,j}$ and true labels $y_j$. When $\delta(x) = 0$ and $C_\alpha(x) = \Gamma$, define

$$\check{Y}_k(\Gamma) = \{\,(\hat{y}_{k,j}, y_j) \in Y_k : \hat{y}_{k,j} \in \Gamma,\ y_j \in \Gamma\,\}$$

and compute the segregativity

$$\sigma_k(\Gamma) = \frac{1}{|\check{Y}_k(\Gamma)|} \sum_{(\hat{y}, y) \in \check{Y}_k(\Gamma)} \mathbf{1}_{\hat{y} = y}$$

The expert selected for the deferred sample is $k^*(x) = \arg\max_{k=1,\ldots,K} \sigma_k(C_\alpha(x))$. In case of ties, random selection or cost-aware heuristics may be employed; if $C_\alpha(x) = \emptyset$, the expert with the highest overall accuracy is chosen (Bary et al., 16 Sep 2025).
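The segregativity computation restricts each expert's history to pairs whose prediction and true label both lie in $\Gamma$, then takes the empirical accuracy on that restriction. A sketch, with our own handling of experts that have no history on the region (ties here are broken by lowest index, one of the admissible heuristics):

```python
def segregativity(history, gamma):
    """sigma_k(Gamma): expert's empirical accuracy on pairs whose
    prediction and true label both lie in the prediction set gamma.

    history: list of (predicted_label, true_label) pairs.
    gamma:   set of candidate labels.
    """
    restricted = [(p, t) for (p, t) in history if p in gamma and t in gamma]
    if not restricted:
        return float("-inf")  # no evidence on this region
    return sum(p == t for p, t in restricted) / len(restricted)

def select_expert(histories, gamma):
    """k*(x) = argmax_k sigma_k(gamma); ties broken by lowest index."""
    scores = [segregativity(h, gamma) for h in histories]
    return max(range(len(scores)), key=lambda k: scores[k])
```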

4. Deferral Procedure: Algorithmic Description

The conformal-segregativity deferral algorithm receives $x \in X$, classifier $\phi$, nonconformity score $s$, threshold $t_\alpha$, and expert histories $Y_1, \ldots, Y_K$ as input, and outputs a final label:

  1. Construct the prediction set $\Gamma = \{y \in Y : s(x, y) \leq t_\alpha\}$.
  2. If $|\Gamma| = 1$, output the unique element of $\Gamma$.
  3. Otherwise, for each expert $k$, compute the restricted set $\check{Y}_k(\Gamma)$ and segregativity $\sigma_k(\Gamma)$.
  4. Choose $k^* = \arg\max_k \sigma_k(\Gamma)$.
  5. Query expert $k^*$ and return its label.
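The five steps above can be sketched end-to-end as follows. This is an illustrative implementation under assumed interfaces (NumPy probability vectors, the LAC score, experts as callables returning a label), not the authors' code:

```python
import numpy as np

def conformal_defer(probs_x, t_alpha, histories, experts):
    """One pass of the conformal-segregativity deferral procedure.

    probs_x:   (m,) predicted class probabilities phi_y(x) for input x.
    t_alpha:   calibrated conformal threshold.
    histories: per-expert lists of (predicted_label, true_label) pairs.
    experts:   per-expert callables returning a label (stubs here).
    """
    # Step 1: prediction set Gamma under the LAC score s(x, y) = -phi_y(x).
    gamma = set(np.where(-probs_x <= t_alpha)[0].tolist())

    # Step 2: a singleton set means the model answers autonomously.
    if len(gamma) == 1:
        return next(iter(gamma))

    # Steps 3-4: segregativity of each expert on gamma; fall back to
    # overall accuracy when gamma is empty, per the empty-set rule.
    def sigma(history):
        pairs = history if not gamma else \
            [(p, t) for (p, t) in history if p in gamma and t in gamma]
        return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0

    k_star = max(range(len(histories)), key=lambda k: sigma(histories[k]))

    # Step 5: query the selected expert and return its label.
    return experts[k_star]()
```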

This method is training-free and does not require model retraining when the expert pool changes (Bary et al., 16 Sep 2025).

5. Theoretical Guarantees and Performance Metrics

Marginal coverage is guaranteed under exchangeability: $P[y \in C_\alpha(x)] \geq 1 - \alpha$. The probability that the model errs is $P[\text{model error}] \leq \alpha$. The deferral rate $\Delta(\alpha) = P[|C_\alpha(x)| > 1]$ quantifies the expected proportion of samples deferred to experts, so the expected expert workload reduction factor is $R(\alpha) = 1/\Delta(\alpha)$. The overall error decomposes as

$$P[\text{system error}] = e_{\text{model}} + e_{\text{expert}} \leq \alpha + \bigl(1 - P[|C_\alpha(x)| = 1]\bigr) \cdot \max_k \bigl(\text{expert } k\text{'s error on their region}\bigr)$$

This makes the trade-off between model autonomy and error rate explicit and tunable via $\alpha$; segregativity routing ensures ambiguous points are directed to the expert with the highest empirical discriminative ability on the region defined by $C_\alpha(x)$ (Bary et al., 16 Sep 2025).
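As an illustration of this trade-off, the following sketch computes the workload reduction factor and the system-error bound from a deferral rate, $\alpha$, and a worst-case expert error. The numbers in the usage below are made up for illustration, not results from the paper:

```python
def workload_and_error_bound(deferral_rate, alpha, worst_expert_error):
    """R(alpha) = 1 / Delta(alpha), and the system-error bound
    alpha + Delta(alpha) * max_k(expert error on their region),
    treating the deferral rate as 1 - P[|C_alpha(x)| = 1]."""
    reduction = 1.0 / deferral_rate
    error_bound = alpha + deferral_rate * worst_expert_error
    return reduction, error_bound
```

For instance, with a 10% deferral rate, $\alpha = 0.05$, and a 2% worst-case expert error, the experts see one sample in ten (a 10x workload reduction) and the bound on system error is $0.05 + 0.1 \times 0.02 = 0.052$.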

6. Empirical Performance and Robustness Properties

On datasets such as CIFAR10-H and ImageNet16-H, the conformal deferral rule achieves accuracies of $99.57 \pm 0.10\%$ and $99.40 \pm 0.52\%$ respectively, outperforming both standalone models and the top individual expert. The expected expert workload reduction factor can reach up to 11, indicating substantial efficiency gains. Performance remains robust even under degraded expert accuracy, declining gradually rather than catastrophically in low-information settings (i.e., as prediction-set ambiguity grows) (Bary et al., 16 Sep 2025).

A plausible implication is that this framework provides a scalable and practical alternative to retraining-intensive learning-to-defer methods in real-world human-AI collaborative settings.

7. Contextual Significance and Limitations

Unlike classical Learning to Defer (L2D) techniques which require retraining upon changes in the expert pool, the conformal deferral rule is inherently adaptation- and retraining-free, retaining the coverage and error guarantees for all test distributions exchangeable with the calibration set. This suggests particular applicability for dynamic expert systems and mission-critical domains where both trust calibration and expert labor optimization are prioritized.

A limitation is that theoretical guarantees depend on exchangeability between calibration and future data; performance degrades in low-information or highly ambiguous regimes, as reflected by gradual performance drops rather than catastrophic failures (Bary et al., 16 Sep 2025).
