
Multi-Expert Deferral in Human-AI Collaboration

Updated 4 January 2026
  • The paper introduces a framework that integrates automated predictions with deferral to multiple experts based on confidence levels and cost considerations.
  • It employs scalable, cost-sensitive algorithms and conformal prediction techniques to optimize accuracy while significantly reducing expert workload.
  • The approach extends to multi-task, regression, and continuous output settings, offering robust consistency guarantees and practical solutions under budget constraints.

Learning with Multiple-Expert Deferral is a framework within human-AI collaboration wherein an autonomous model predicts directly when confident, but defers uncertain inputs to one among several human or algorithmic experts. The paradigm encompasses various domains, including image classification, regression, and multi-task learning, and addresses both practical system design (deferral rules, cost constraints, capacity planning) and theoretical analysis (consistency guarantees, loss construction). Research efforts concentrate on scalable, expert-agnostic, and cost-sensitive algorithms for optimizing system accuracy, workload distribution, and robustness to expert pool composition.

1. Formal Problem Structure and Deferral Setup

The canonical multiple-expert deferral setting considers an input space $\mathcal{X}$, a label space $\mathcal{Y}$, and a pool of $K$ experts $E_1, \ldots, E_K$, each expert $E_e$ providing predictions $\hat{y}_e$ for any input $x$. The learner's deferral policy can select either the autonomous prediction or an expert, optimizing a cost-sensitive loss that incorporates prediction errors, expert reliability, and consultation workload (Bary et al., 16 Sep 2025, Mao et al., 2023, Mao et al., 25 Jun 2025).

For instance, in a single-stage scheme, the learner jointly produces an $(n+K)$-class score $h(x, y)$ over all regular labels and experts (represented as special labels). The final prediction for input $x$ is

$$\hat y(x) = \arg\max_{y \in [n + K]} h(x, y)$$

choosing a direct label if $\hat y(x) \leq n$, or deferring to expert $e = \hat y(x) - n$ and incurring an expert-specific cost $c_e(x, y)$.
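The single-stage decision rule above can be sketched in a few lines. This is an illustrative toy, not any paper's released code: indices $0..n-1$ are direct labels, indices $n..n+K-1$ route to experts.

```python
# Sketch of the single-stage decision rule: the scorer outputs n + K values
# per input; indices 0..n-1 are direct labels, n..n+K-1 route to experts.
# Names and scores here are illustrative assumptions, not from the papers.

def decide(scores, n):
    """Given n+K scores, return ('predict', label) or ('defer', expert_index)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if best < n:
        return ("predict", best)
    return ("defer", best - n)

# Example: n = 3 labels, K = 2 experts.
print(decide([0.1, 0.7, 0.2, 0.4, 0.3], n=3))  # highest score is a direct label
print(decide([0.1, 0.2, 0.2, 0.9, 0.3], n=3))  # highest score routes to an expert
```

The expert-specific cost $c_e(x, y)$ is only incurred on the deferral branch, which is what the surrogate losses in Section 3 encode.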

In a two-stage scheme, a fixed classifier is wrapped by a deferral routing function $r(x)$ over experts, potentially with learned surrogates modeling the cost of each assignment.

Deferral systems can further incorporate misclassification cost asymmetries, explicit workload constraints per expert, and dynamic expert pools (Alves et al., 2024, Strong et al., 14 Feb 2025, Tailor et al., 2024).

2. Training-Free and Conformal Deferral Approaches

The framework introduced in "No Need for 'Learning' to Defer? A Training-Free Deferral Framework to Multiple Experts through Conformal Prediction" (Bary et al., 16 Sep 2025) departs from explicit learning by leveraging conformal prediction for uncertainty quantification and expert selection.

Conformal Prediction Core Steps:

  • A pretrained classifier $\phi$ is wrapped by a conformal predictor using a calibration set $\mathcal{D}_{\mathrm{cal}}$ and a nonconformity score $\eta(x, y)$, quantifying how atypical the model's prediction $\phi_y(x)$ is for input $x$ and label $y$.
  • For each label, empirical p-values $p_y(x)$ are calculated; labels with $p_y(x) > \alpha$ (the miscoverage level) form the set $S(x, \alpha)$, guaranteed to contain the true label with probability at least $1 - \alpha$.
  • If $|S(x, \alpha)| = 1$, the system predicts directly; otherwise, it invokes the segregativity criterion.
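The steps above can be sketched with a standard split-conformal p-value. This is a minimal illustration assuming the nonconformity score $\eta(x, y) = 1 - \phi_y(x)$; the helper names and toy numbers are ours, not the paper's.

```python
# Illustrative split-conformal step. Assumption: nonconformity is
# eta(x, y) = 1 - phi_y(x), one minus the model's softmax probability.
# Calibration nonconformity scores are precomputed on D_cal.

def p_value(score, calibration_scores):
    """Empirical p-value with the usual +1 smoothing:
    fraction of calibration scores at least as nonconforming."""
    n = len(calibration_scores)
    ge = sum(1 for s in calibration_scores if s >= score)
    return (ge + 1) / (n + 1)

def prediction_set(probs, calibration_scores, alpha):
    """Labels whose p-value exceeds the miscoverage level alpha."""
    return [y for y, p in enumerate(probs)
            if p_value(1.0 - p, calibration_scores) > alpha]

cal = [0.1, 0.2, 0.35, 0.5, 0.6, 0.8, 0.9]      # toy calibration scores
S = prediction_set([0.7, 0.25, 0.05], cal, alpha=0.2)
print(S)  # a multi-label set: the system would defer rather than predict
```

When `len(S) == 1` the system answers directly; otherwise the ambiguous set `S` is handed to the expert-selection step below.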

Segregativity-Based Expert Selection:

  • For each expert $e$, historical judgments $\mathcal{Y}_e$ are filtered to those where both the expert's prediction and the ground truth fall within $S(x, \alpha)$.
  • Segregativity $\mathrm{Seg}_e(S)$ is defined as the empirical accuracy restricted to this set, measuring an expert's discriminativeness over the ambiguous label subset.
  • Deferral is made to $e^*(x) = \arg\max_e \mathrm{Seg}_e(S(x, \alpha))$.
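A hedged sketch of segregativity-based selection follows. The data layout (a list of `(prediction, truth)` pairs per expert) is our assumption for illustration:

```python
# For each expert, keep only past cases whose expert prediction AND true
# label both lie in the ambiguous set S, then score the expert by accuracy
# on that restriction. A toy sketch; the per-expert history format is ours.

def segregativity(history, S):
    """Empirical accuracy of one expert restricted to the label set S."""
    inside = [(p, t) for p, t in history if p in S and t in S]
    if not inside:
        return 0.0
    return sum(p == t for p, t in inside) / len(inside)

def select_expert(histories, S):
    """Defer to the expert with maximal segregativity over S."""
    return max(range(len(histories)), key=lambda e: segregativity(histories[e], S))

S = {1, 2}
histories = [
    [(1, 1), (2, 1), (0, 0)],   # expert 0: 1/2 correct inside S
    [(1, 1), (2, 2), (3, 1)],   # expert 1: 2/2 correct inside S ((3, 1) is filtered out)
]
print(select_expert(histories, S))
```

Because the score depends only on stored confusion data, adding or removing an expert never triggers retraining, which is the robustness property claimed below.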

This scheme is robust to changes in expert composition: new experts require only updates to confusion data, and the system is entirely retraining-free. Experimental results on CIFAR10-H and ImageNet16-H show that this approach can attain $\approx 99.5\%$ accuracy and reduce expert workload by up to a factor of 11, outperforming both the best standalone model and the oracle best-expert strategy.

3. Surrogate Losses and Consistency Guarantees

Multiple-expert deferral learning is underpinned by surrogate loss function design. Principled surrogates balance system accuracy, expert cost, and workload via convex relaxation, and are supported by strong $\mathcal{H}$-consistency bounds.

Single-Stage Surrogate (General Form) (Mao et al., 2023, Mao et al., 25 Jun 2025, Mao, 28 Dec 2025):

$$L(h; x, y) = \ell(h; x, y) + \sum_{j=1}^{K} (1 - c_j(x, y))\,\ell(h; x, n + j)$$

where $\ell$ is a standard multiclass surrogate (e.g., softmax, exponential), and $(1 - c_j(x, y))$ weights the deferral term for expert $j$.
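For concreteness, here is a numeric sketch of this surrogate with the softmax choice $\ell(h; x, y') = -\log \mathrm{softmax}(h(x))_{y'}$. The score vector and costs are toy values, not from any experiment in the cited papers:

```python
import math

# Numeric sketch of the single-stage surrogate with the logistic/softmax
# surrogate ell. Toy inputs; not an implementation from the cited work.

def log_softmax(scores, idx):
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[idx] - lse

def single_stage_surrogate(scores, y, costs, n):
    """L(h; x, y) = ell(y) + sum_j (1 - c_j) * ell(n + j)."""
    loss = -log_softmax(scores, y)
    for j, c in enumerate(costs):
        loss += (1.0 - c) * (-log_softmax(scores, n + j))
    return loss

# n = 3 labels, K = 2 experts; expert costs c_j(x, y) lie in [0, 1].
val = single_stage_surrogate([2.0, 0.5, 0.1, 1.0, -1.0], y=0, costs=[0.2, 0.9], n=3)
print(val)
```

Note the intended behavior: an expert with cost $c_j = 1$ contributes nothing to the deferral term, so the loss reduces to the plain classification surrogate.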

Two-Stage Surrogate:

After pre-training the predictor $h$, the deferral routing function $r$ is learned via a margin-based or multiclass surrogate:

$$L^h(r; x, y) = \Big[\sum_{j=1}^K c_j(x, y)\Big]\,\ell(r; x, 0) + \sum_{j=1}^K \Big[L(h(x), y) + \sum_{k \neq j} c_k(x, y)\Big]\,\ell(r; x, j)$$
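A minimal numeric sketch of this two-stage surrogate, again with the softmax choice of $\ell$: the router scores $K + 1$ options, where index 0 means keeping the fixed predictor $h$ and index $j \geq 1$ means deferring to expert $j$. `base_loss` stands in for $L(h(x), y)$; all numbers are illustrative assumptions.

```python
import math

# Toy evaluation of the two-stage routing surrogate; not released code.

def log_softmax(scores, idx):
    m = max(scores)
    return scores[idx] - (m + math.log(sum(math.exp(s - m) for s in scores)))

def two_stage_surrogate(r_scores, base_loss, costs):
    """r_scores has length K+1; costs[j-1] = c_j(x, y); base_loss = L(h(x), y)."""
    total_cost = sum(costs)
    loss = total_cost * (-log_softmax(r_scores, 0))          # keep-the-model term
    for j in range(1, len(r_scores)):                         # defer-to-expert terms
        weight = base_loss + (total_cost - costs[j - 1])
        loss += weight * (-log_softmax(r_scores, j))
    return loss

val = two_stage_surrogate([1.0, 0.2, -0.5], base_loss=0.3, costs=[0.4, 0.7])
print(val)
```

Each option's weight is the total cost of *not* choosing it, so minimizing the surrogate pushes $r$ toward the cheapest assignment.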

Consistency Guarantees:

  • $\mathcal{H}$-consistency bounds (the Awasthi–Mao–Mohri–Zhong framework) provide non-asymptotic, hypothesis-class-specific error control; excess deferral risk is upper-bounded by a calibration function $\Gamma$ of the surrogate excess risk.
  • Realizable consistency ensures that minimization within the chosen hypothesis class drives both surrogate and true risk to zero if possible.
  • These results generalize to regression, multi-task regimes, and Top-$k$ deferral; in regression, bounded surrogate losses yield similar $\mathcal{H}$-consistency properties (Mao et al., 2024, Mao, 28 Dec 2025).

4. Cost Sensitivity, Budget Constraints, and Workload Allocation

Real-world systems frequently require cost-sensitive deferral policies and explicit handling of expert capacity constraints.

Cost-sensitive Deferral:

  • Individual costs for Type I/II errors, expert workload, and deferral itself are modeled per instance and prediction type.
  • "Deferral under cost and capacity constraints" (DeCCaF) (Alves et al., 2024) shows improved system cost under constraints by learning a separate human expertise model for $p_e(x) = P[m_e(x) = y]$ and integrating cost and workload constraints via combinatorial optimization.
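To make the capacity-constrained dispatch concrete, here is a toy greedy assigner. DeCCaF solves this exactly via combinatorial optimization; the greedy version below is only a sketch of the problem shape, with an assumed cost-matrix layout:

```python
# Toy dispatcher under per-expert capacity. cost[i][e] is the expected cost
# of assigning instance i to decision-maker e (model or human expert).
# Greedy stand-in for the exact combinatorial solve; layout is our assumption.

def greedy_assign(cost, capacity):
    """Assign each instance to its cheapest decision-maker with spare capacity."""
    remaining = list(capacity)
    assignment = []
    for row in cost:
        for e in sorted(range(len(row)), key=lambda e: row[e]):
            if remaining[e] > 0:
                remaining[e] -= 1
                assignment.append(e)
                break
    return assignment

# 3 instances, 2 decision-makers (say, model = 0, human expert = 1);
# the human can take at most one case in this batch.
out = greedy_assign([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]], capacity=[2, 1])
print(out)
```

Once the human's single slot is spent, later instances fall back to the model even when deferring would be cheaper per instance, which is exactly the tension the exact optimization resolves globally.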

Budgeted Deferral (DeSalvo et al., 30 Oct 2025):

  • Budgeted training minimizes the number of expensive expert queries.
  • Importance-weighted active learning strategies selectively query experts for cost labels, leveraging disagreement coefficients and surrogate risk bounds to efficiently allocate queries.
  • Theoretical analyses guarantee that budgeted methods approach the accuracy of full-query baselines with substantially reduced training and validation cost.

Top-$k$ Deferral (Montreuil et al., 15 May 2025):

  • Beyond single-expert deferral, Top-$k$ approaches select sets of $k$ entities (experts/labels) per input, optimizing the trade-off between predictive accuracy and consultation resource expenditure.
  • Convex, cost-sensitive surrogate losses independent of $k$ enable generalization across deferral cardinalities and guarantee Bayes- and $\mathcal{H}$-consistency.
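The selection step itself is simple once per-entity scores exist; a minimal sketch, with our own indexing convention (labels first, then experts):

```python
# Sketch of Top-k selection over n labels + K experts. The scorer and the
# indexing convention (labels 0..n-1, experts n..n+K-1) are assumptions.

def top_k_entities(scores, k):
    """Return the indices of the k highest-scoring entities."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# n = 3 labels (indices 0-2) and K = 2 experts (indices 3-4); consult k = 2.
chosen = top_k_entities([0.2, 0.9, 0.1, 0.7, 0.3], k=2)
print(chosen)  # may mix a direct label with an expert consultation
```

Because the surrogate is independent of $k$, the same trained scorer supports any consultation budget at deployment time; only this selection step changes.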

5. Meta-Learning, Expert-Agnostic, and Population Adaptation

Recent advances aim for flexibility in expert pools, supporting rapid adaptation to previously unseen experts or populations (Tailor et al., 2024, Strong et al., 14 Feb 2025).

Meta-learning Approaches:

  • The "Learning to Defer to a Population" framework (Tailor et al., 2024) uses meta-learning—both adaptation (MAML/fine-tune) and model-based attention—to build deferral policies able to exploit small context sets representing new experts.
  • Neural process-based deferrer heads attend to similarity between new inputs and context set predictions, effectively enabling representation-based specialization.
  • EA-L2D (Strong et al., 14 Feb 2025) further generalizes, using Bayesian modelling of expert correctness via per-class Beta priors and observed context, yielding robust performance and generalization to out-of-distribution experts.

6. Multi-Task and Continuous Output Deferral

Learning with multiple-expert deferral is deployed in multi-task settings (joint classification/regression) and continuous-output domains.

Unified Multi-Task Deferral:

  • Two-stage multi-expert deferral is extended to object detection and electronic health records, using composite surrogates combining cross-entropy and regression loss (Montreuil et al., 2024).
  • Systematic analysis addresses minimizability gap, consistency bounds, and impact of shared representations.

Regression with Deferral:

  • Regression settings require specialized surrogates coping with infinite label spaces (Mao et al., 2024, Mao, 28 Dec 2025). Single- and two-stage formulations provide theoretical and empirical improvements over abstention-only or naive baselines.
  • Surrogates are constructed by decomposing cost-weighted indicator sums and replacing them with differentiable multiclass losses.

7. Practical Limitations, Extensions, and Open Problems

Key limitations in current approaches include the choice and design of cost functions, reliance on sufficiently large calibration or expert-confusion datasets, instance-level expert heterogeneity, and static expert-pool assumptions.

The field continues to expand to richer cost regimes, workload-aware dispatch, adaptive Top-kk decision making, and meta-learned deferral over heterogeneous and shifting expert populations, with robust theoretical support and empirical validation.
