Conformal Deferral Rule: Hybrid Decision Making
- Conformal Deferral Rule is a training-free, model-agnostic framework that uses conformal prediction to quantify uncertainty and automatically defer ambiguous cases.
- It employs a segregativity criterion to select the expert with the highest empirical accuracy on the defined prediction set, ensuring effective expert routing.
- The method guarantees specified error rates through marginal coverage and has demonstrated high predictive accuracy along with significant expert workload reduction on benchmark datasets.
The conformal deferral rule is a training-free, model- and expert-agnostic framework for orchestrating deferral from an AI predictor to multiple experts. It leverages conformal prediction to quantify uncertainty and a segregativity criterion to select among experts. This enables hybrid human-AI decision-making: instances with ambiguous predictions are automatically identified and routed to the most discriminative expert, while specified error rates are guaranteed through marginal coverage properties. The rule reduces expert workload and achieves high predictive accuracy without retraining when the expert pool changes (Bary et al., 16 Sep 2025).
1. Formal Specification of Conformal Prediction and Prediction Sets
The conformal deferral rule operates on a feature space $\mathcal{X}$, a label space $\mathcal{Y}$, and a pre-trained probabilistic classifier $\hat{f}$ providing a label probability estimate $\hat{f}(x)_y$ for each $(x, y) \in \mathcal{X} \times \mathcal{Y}$. Calibration is performed using an i.i.d. set $\{(x_i, y_i)\}_{i=1}^{n}$ exchangeable with test points.
A nonconformity score $s(x, y)$ measures the discordance between input $x$ and label $y$. Common examples include:
- LAC (Least Ambiguous set-valued Classifier): $s(x, y) = 1 - \hat{f}(x)_y$,
- APS (Adaptive Prediction Sets): $s(x, y) = \sum_{y' : \hat{f}(x)_{y'} \ge \hat{f}(x)_y} \hat{f}(x)_{y'}$, the cumulative probability mass of labels at least as likely as $y$,
- RAPS (Regularized Adaptive Prediction Sets): an APS variant with a penalty term discouraging large prediction sets.
Calibration proceeds by computing scores $s_i = s(x_i, y_i)$ for $i = 1, \dots, n$. For user-specified $\alpha \in (0, 1)$, $\hat{q}$ is the smallest threshold such that $|\{i : s_i \le \hat{q}\}| \ge \lceil (n+1)(1-\alpha) \rceil$, yielding a prediction set $C(x) = \{y \in \mathcal{Y} : s(x, y) \le \hat{q}\}$. Vovk et al. (2005) showed that, under exchangeability, $\mathbb{P}(y_{n+1} \in C(x_{n+1})) \ge 1 - \alpha$ (Bary et al., 16 Sep 2025).
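The calibration step above can be sketched in a few lines. This is a minimal split-conformal illustration using the LAC score; the function names are illustrative, not the paper's API:

```python
import numpy as np

def lac_scores(probs, labels):
    """LAC nonconformity scores s(x_i, y_i) = 1 - f̂(x_i)_{y_i}."""
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_threshold(cal_scores, alpha):
    """Smallest q̂ covering at least ⌈(n+1)(1-α)⌉ calibration scores."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

def prediction_set(probs_x, q_hat):
    """C(x) = {y : s(x, y) ≤ q̂} under the LAC score."""
    return np.flatnonzero(1.0 - probs_x <= q_hat)
```

Any other nonconformity score (APS, RAPS) can be swapped in for `lac_scores` without changing the threshold computation.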
2. Deferral Mechanism and Decision Rule
Prediction proceeds as follows: the classifier accepts responsibility and outputs the unique label $\hat{y} \in C(x)$ if $|C(x)| = 1$, and otherwise defers to an external expert. Deferral is thus governed by the indicator $d(x) = \mathbb{1}[|C(x)| \ne 1]$, where $d(x) = 0$ indicates autonomous model prediction, and $d(x) = 1$ triggers deferral (Bary et al., 16 Sep 2025).
3. Segregativity Criterion for Expert Selection
For $M$ experts, each expert $m \in \{1, \dots, M\}$ has a history $H_m$ comprising past predictions $\hat{y}_j^m$ and true labels $y_j$. When $|C(x)| > 1$ and $H_m(C(x)) \ne \emptyset$, define the restricted history $H_m(C(x)) = \{(\hat{y}_j^m, y_j) \in H_m : y_j \in C(x)\}$ and compute the segregativity $\sigma_m(C(x)) = \frac{1}{|H_m(C(x))|} \sum_{(\hat{y}, y) \in H_m(C(x))} \mathbb{1}[\hat{y} = y]$. The expert selected for the deferred sample is $m^\star = \arg\max_m \sigma_m(C(x))$. In case of ties, random selection or cost-aware heuristics may be employed; if $H_m(C(x)) = \emptyset$ for all $m$, the expert with highest overall accuracy is chosen (Bary et al., 16 Sep 2025).
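A sketch of the segregativity computation under these definitions (the helper names and the tuple-based encoding of expert histories are assumptions for illustration, not the paper's API):

```python
import numpy as np

def segregativity(history, pred_set):
    """Empirical accuracy of one expert restricted to past samples whose
    true label falls inside the current prediction set C(x)."""
    restricted = [(y_hat, y) for (y_hat, y) in history if y in pred_set]
    if not restricted:
        return None  # no relevant history for this expert
    return np.mean([y_hat == y for (y_hat, y) in restricted])

def select_expert(histories, pred_set):
    """Pick the expert with the highest segregativity; fall back to
    overall accuracy when no expert has history inside C(x)."""
    sigmas = [segregativity(h, pred_set) for h in histories]
    if all(s is None for s in sigmas):
        overall = [np.mean([y_hat == y for (y_hat, y) in h]) for h in histories]
        return int(np.argmax(overall))
    return int(np.argmax([-1.0 if s is None else s for s in sigmas]))
```

For example, an expert who is usually wrong overall but reliably separates the labels inside $C(x)$ receives a high segregativity and is preferred for that deferred sample.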
4. Deferral Procedure: Algorithmic Description
The conformal-segregativity deferral algorithm receives $x$, classifier $\hat{f}$, nonconformity score $s$, threshold $\hat{q}$, and expert histories $H_1, \dots, H_M$ as input, and outputs a final label:
- Construct the prediction set $C(x) = \{y : s(x, y) \le \hat{q}\}$.
- If $|C(x)| = 1$, output the unique element of $C(x)$.
- Else, for each expert $m$, compute the restricted history $H_m(C(x))$ and segregativity $\sigma_m(C(x))$.
- Choose $m^\star = \arg\max_m \sigma_m(C(x))$.
- Query expert $m^\star$ and return its label.
This method is training-free and does not require model retraining when the expert pool changes (Bary et al., 16 Sep 2025).
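Putting the steps together, a self-contained sketch of the full procedure (the LAC score and function names are illustrative assumptions):

```python
import numpy as np

def conformal_defer(probs_x, q_hat, histories):
    """Return (output, source): the model's own label when C(x) is a
    singleton, otherwise the index of the segregativity-selected expert."""
    # Step 1: prediction set under the LAC score s(x, y) = 1 - f̂(x)_y.
    C = set(np.flatnonzero(1.0 - probs_x <= q_hat))
    # Step 2: autonomous prediction on singleton sets.
    if len(C) == 1:
        return int(next(iter(C))), "model"
    # Steps 3-4: segregativity of each expert, restricted to labels in C(x).
    sigmas = []
    for h in histories:
        rel = [yh == y for (yh, y) in h if y in C]
        sigmas.append(np.mean(rel) if rel else -1.0)
    if max(sigmas) < 0:  # no expert has history inside C(x): use overall accuracy
        sigmas = [np.mean([yh == y for (yh, y) in h]) for h in histories]
    # Step 5: route to the most segregative expert (its index is returned here).
    return int(np.argmax(sigmas)), "expert"
```

Because the expert pool only enters through the `histories` argument, adding or removing an expert requires no retraining, consistent with the training-free property above.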
5. Theoretical Guarantees and Performance Metrics
Marginal coverage is guaranteed under exchangeability: $\mathbb{P}(y \in C(x)) \ge 1 - \alpha$. The probability that the model errs when predicting autonomously is controlled by $\alpha$. The deferral rate $\delta = \mathbb{P}(|C(x)| \ne 1)$ quantifies the expected proportion of samples deferred to experts. The expected expert workload reduction factor is thus $1/\delta$. Overall error decomposes as $\mathbb{P}(\hat{y} \ne y) = \mathbb{P}(\hat{y} \ne y \mid d(x) = 0)\,\mathbb{P}(d(x) = 0) + \mathbb{P}(\hat{y} \ne y \mid d(x) = 1)\,\mathbb{P}(d(x) = 1)$.
This makes the trade-off between model autonomy and error rate explicit via the choice of $\alpha$; segregativity routing ensures ambiguous points are directed to the expert with the highest empirical discriminative ability on the region defined by $C(x)$ (Bary et al., 16 Sep 2025).
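A numeric illustration of this decomposition; the $\alpha$, $\delta$, and expert-error values below are hypothetical, chosen only to show the arithmetic:

```python
alpha = 0.05          # target autonomous error level (hypothetical)
delta = 0.09          # deferral rate: share of samples routed to experts (hypothetical)
expert_error = 0.12   # assumed error of selected experts on deferred samples

# Experts see only a delta-fraction of the stream.
workload_reduction = 1.0 / delta
# Overall error: weighted mix of autonomous and deferred error rates.
overall_error = alpha * (1 - delta) + expert_error * delta
```

Lowering $\alpha$ shrinks the autonomous error but widens prediction sets, raising $\delta$ and shifting more load to the experts.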
6. Empirical Performance and Robustness Properties
On datasets such as CIFAR10-H and ImageNet16-H, the conformal deferral rule achieves higher accuracy than both standalone models and the top individual expert. The expected expert workload reduction factor can reach up to $11$, indicating substantial efficiency. Performance remains robust even under degraded expert accuracy, with error rates rising only gradually in low-information settings (i.e., as prediction set ambiguity grows) (Bary et al., 16 Sep 2025).
A plausible implication is that this framework provides a scalable and practical alternative to retraining-intensive learning-to-defer methods in real-world human-AI collaborative settings.
7. Contextual Significance and Limitations
Unlike classical Learning to Defer (L2D) techniques which require retraining upon changes in the expert pool, the conformal deferral rule is inherently adaptation- and retraining-free, retaining the coverage and error guarantees for all test distributions exchangeable with the calibration set. This suggests particular applicability for dynamic expert systems and mission-critical domains where both trust calibration and expert labor optimization are prioritized.
A limitation is that theoretical guarantees depend on exchangeability between calibration and future data; performance degrades in low-information or highly ambiguous regimes, as reflected by gradual performance drops rather than catastrophic failures (Bary et al., 16 Sep 2025).