
Calibratable Disambiguation Loss (CDL)

Updated 26 December 2025
  • The paper introduces CDL as a weighted focal-style loss that enhances accuracy and calibration under weak supervision.
  • CDL leverages a momentum-based weighting mechanism and down-weights overconfident bags, offering provable lower bounds relative to traditional losses.
  • Empirical studies reveal that CDL variants achieve up to 23% accuracy gains and significant reductions in expected calibration error across diverse datasets.

Calibratable Disambiguation Loss (CDL) refers to two distinct concepts formalized in recent literature: a loss function for multi-instance partial-label learning (MIPL), and a decision-theoretic calibration measure for probabilistic prediction. The first is a class-conditional, plug-and-play loss for weak supervision in MIPL and partial-label learning (PLL) (Tang et al., 19 Dec 2025); the second, the calibration decision loss $\mathsf{CDL}$, quantifies how much a probabilistic classifier could gain from recalibration (Gopalan et al., 17 Nov 2025). Both address model calibration, but at different levels: CDL through loss design during learning, and calibration decision loss by measuring the value of post-hoc recalibration.

1. Formal Definition: CDL in Multi-Instance Partial-Label Learning

CDL for MIPL is a weighted, focal-style cross-entropy, explicitly constructed to optimize both accuracy and calibration in the context of inexact supervision over both instances and labels. Given a bag $i$ with candidate label set $\mathcal{S}_i$, model-predicted probabilities $\hat{p}_{i,1},\ldots,\hat{p}_{i,k}$, and dynamic candidate-label weights $w_{i,c}$, CDL is defined as

$$\mathcal{L}_{\mathrm{CDL}} = -\sum_{c \in \mathcal{S}_i} w_{i,c}\, (1 - M_i + \Phi_i)^{\gamma} \log \hat{p}_{i,c}$$

where

$$M_i = \max_{c \in \mathcal{S}_i} \hat{p}_{i,c}, \qquad \Phi_i = \Phi(\hat{\mathbf{p}}_i)$$

and $\gamma \ge 1$ is a tuning exponent. Two plug-and-play instantiations are proposed:

  • CDL-CC (Candidate–Candidate): $\Phi_i = \max_{c' \in \mathcal{S}_i,\, c' \neq c} \hat{p}_{i,c'}$
  • CDL-CN (Candidate–Noncandidate): $\Phi_i = \max_{\bar{c} \in \mathcal{Y} \setminus \mathcal{S}_i} \hat{p}_{i,\bar{c}}$

The loss upweights under-confident predictions and regularizes against overconfident, incorrect assignments. The weights $w_{i,c}$ follow a momentum schedule, with coefficient linearly decayed via $\alpha^{(t)} = (T - t)/T$ and updated using the model's predicted probabilities.
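As an illustration, the per-bag loss can be sketched in a few lines of NumPy (function and argument names here are hypothetical; the actual implementation of Tang et al. may differ in details):

```python
import numpy as np

def cdl_loss(p_hat, candidates, w, gamma=1.0, variant="cn"):
    """Sketch of the CDL loss for one bag.

    p_hat      : (k,) predicted class probabilities
    candidates : indices forming the candidate label set S_i
    w          : (k,) candidate-label weights w_{i,c} (zero off-candidates)
    gamma      : focusing exponent, gamma >= 1
    variant    : "cc" (candidate-candidate) or "cn" (candidate-noncandidate)
    """
    cand = np.zeros(len(p_hat), dtype=bool)
    cand[candidates] = True
    M = p_hat[cand].max()                      # M_i = max candidate probability
    loss = 0.0
    for c in np.flatnonzero(cand):
        if variant == "cc":
            others = cand.copy()
            others[c] = False                  # strongest *other* candidate
            phi = p_hat[others].max() if others.any() else 0.0
        else:                                  # strongest non-candidate class
            phi = p_hat[~cand].max() if (~cand).any() else 0.0
        loss -= w[c] * (1.0 - M + phi) ** gamma * np.log(p_hat[c])
    return loss
```

With $\gamma = 1$ and the CN variant, the modulating factor $(1 - M_i + \Phi_i)$ is shared across candidates, so the loss is a scaled weighted cross-entropy; the CC variant instead modulates each candidate term by its strongest competitor.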

2. Theoretical Properties and Regularization

CDL acts as an implicit regularizer and is provably lower-bounded by a scaled version of the conventional momentum-based disambiguation loss (MDL). For each bag, define the confidence margin $\beta_i = \max_{c' \in \mathcal{S}_i} \hat{p}_{i,c'} - \Phi_i$. Under the conditions $\Phi_i \in [\max_{c'} \hat{p}_{i,c'} - 1,\ \max_{c'} \hat{p}_{i,c'}]$ and $1 \le \gamma < 1/\max_i \beta_i$, the paper establishes:

$$\mathcal{L}_{\mathrm{CDL}} \geq (1 - \gamma \beta_i)\, \mathcal{L}_{\mathrm{MDL}} = (1 - \gamma \beta_i)\left[\mathrm{KL}(w_i \,\|\, \hat{p}_i) + H(w_i)\right]$$

The regularization effect selectively down-weights high-confidence (potentially overconfident) bags, controlling for calibration while maintaining discriminative learning. This property differentiates CDL from conventional (or focal) losses, which can degrade calibration when naively applied in weakly supervised regimes (Tang et al., 19 Dec 2025).
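Because the CN instantiation uses a single bag-level $\Phi_i$, the bound there reduces to Bernoulli's inequality $(1 - \beta_i)^{\gamma} \ge 1 - \gamma \beta_i$, which can be sanity-checked numerically (an illustrative simulation, not the paper's proof):

```python
import numpy as np

rng = np.random.default_rng(0)
checked, violations = 0, 0
for _ in range(2000):
    p = rng.dirichlet(np.ones(5))          # predicted probabilities, k = 5
    cand, noncand = p[:3], p[3:]           # first 3 classes form S_i
    w = rng.dirichlet(np.ones(3))          # candidate-label weights w_{i,c}
    M, phi = cand.max(), noncand.max()     # M_i and Phi_i (CN instantiation)
    beta = M - phi                         # confidence margin beta_i
    gamma = 2.0
    if phi > M or gamma * beta >= 1.0:     # theorem's conditions on Phi, gamma
        continue
    checked += 1
    mdl = -(w * np.log(cand)).sum()        # MDL = KL(w || p) + H(w)
    cdl = (1.0 - M + phi) ** gamma * mdl   # CDL with the bag-level Phi_i
    if cdl < (1.0 - gamma * beta) * mdl - 1e-12:
        violations += 1
assert checked > 0 and violations == 0     # bound holds on every valid draw
```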

3. Implementation in MIPL and PLL Pipelines

CDL integrates seamlessly into the embedded-space MIPL pipeline:

  1. Extract instance-level features $\{h_{i,j}\}$.
  2. Compute attention via Dam, Sam, or Mam.
  3. Aggregate to a bag-level embedding $z_i$.
  4. Predict label probabilities $\hat{p}_{i,c}$.
  5. Update candidate-label weights $w_{i,c}$ using a momentum average.
  6. Compute CDL-CC or CDL-CN.
  7. Perform SGD-based optimization.
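Step 5 can be sketched as follows (a minimal NumPy sketch; the renormalization over the candidate set is an assumption here, not a detail confirmed by the paper):

```python
import numpy as np

def update_weights(w, p_hat, candidates, t, T):
    """Momentum update of candidate-label weights w_{i,c} at epoch t of T.

    alpha(t) = (T - t)/T decays linearly, so the weights shift from their
    previous values toward the model's predictions (renormalized over S_i).
    """
    alpha = (T - t) / T
    q = np.zeros_like(w)
    q[candidates] = p_hat[candidates]
    q /= q.sum()                       # renormalize predictions over S_i
    return alpha * w + (1 - alpha) * q
```

At $t = 0$ the weights are unchanged ($\alpha = 1$); at $t = T$ they equal the renormalized predictions, completing the disambiguation.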

For partial-label learning (PLL), CDL serves as a direct replacement for standard disambiguation losses without necessitating architectural changes in models such as Pop, ProDen, or LWS.

Hyperparameters are inherited from base models, with $\gamma$ serving as the primary calibration control. $\gamma = 1$ provides robust out-of-the-box performance; increasing it to $2$ or $3$ is recommended if calibration error, measured via expected calibration error (ECE), remains high.
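ECE itself can be estimated with standard equal-width binning (a common textbook estimator, sketched here for completeness; binning choices vary across papers):

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Equal-width-bin ECE: bin-mass-weighted |accuracy - confidence| gap.

    conf    : (n,) top-label confidences in [0, 1]
    correct : (n,) 1 if the predicted top label was correct, else 0
    """
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if lo == 0.0:
            mask |= conf == 0.0            # include the left edge of bin 0
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```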

4. Empirical Performance and Benchmarks

CDL displays substantial empirical gains across benchmark and real-world datasets:

  • On MNIST-MIPL, FMNIST-MIPL, Birdsong-MIPL, and SIVAL-MIPL (with varying bag size and false-positive rate $r$), CDL variants outperform DeMipl, EliMipl, and MiplMa in $105/110$ cases, with accuracy gains of up to $+23\%$.
  • On a colorectal cancer dataset (CRC-MIPL), featuring both handcrafted (Row, SBN, KMeans, SIFT) and deep (ResNet-34) features, the improvement ranges from $+13\%$ to $+18\%$.
  • ECE is reduced by $16\%$–$58\%$ (mean $44\%$) on benchmarks, and by up to $33\%$ on CRC-MIPL; CDL achieves the lowest ECE in $93/95$ trials.

Reliability diagrams confirm that CDL calibrates confidence in line with observed accuracy. t-SNE visualizations of feature space indicate tighter, more separated class clusters.

Ablation studies reveal:

  • CDL-CN frequently exhibits higher accuracy, while CDL-CC can yield slightly reduced ECE.
  • Dam is optimal for low-dimensional, handcrafted features; Sam and Mam are preferred for high-dimensional, learned feature spaces.
  • Naïve focal losses (FL/IFL) tend to compromise either accuracy or calibration in MIPL; CDL balances both objectives.

5. Practical Guidance and Integration

CDL is characterized as "plug-and-play": it requires no modifications to the network backbone and can be employed in any MIPL or PLL pipeline (Tang et al., 19 Dec 2025).

Recommendations for practice:

  • Default to $\gamma = 1$, increasing it only if calibration (as measured by ECE on a hold-out set) remains inadequate.
  • Prefer CDL-CC in regimes with semantically similar candidate labels (focus on calibration), and CDL-CN when non-candidate label noise is a significant challenge (focus on accuracy).
  • Choose Dam for simple, handcrafted features, and Sam/Mam for complex, deep representations.
  • Monitor ECE during training; the dynamic $w_{i,c}$ weighting will prioritize under-confident bags.
  • No further modifications are needed for integration into existing partial-label or multi-instance pipelines.

6. Calibration Decision Loss (CDL) in Decision-Theoretic Calibration

A separate, decision-theoretic calibration loss—also denoted CDL—has been introduced to quantify the maximal improvement achievable by post-processing a probabilistic predictor over any proper loss (Gopalan et al., 17 Nov 2025). Calibration Decision Loss is defined as

$$\mathsf{CDL}(J) = \sup_{\ell \in L^*,\, \kappa \in \mathcal{F}} \mathbb{E}_{(p,y) \sim J}\left[\ell(p, y) - \ell(\kappa(p), y)\right]$$

where $J$ is the joint distribution of predicted probabilities and binary outcomes, $L^*$ the family of proper losses, and $\mathcal{F}$ the set of all post-processings. Perfect calibration corresponds to $\mathsf{CDL}(J) = 0$. However, without restricting the post-processing family, even weakly approximating $\mathsf{CDL}$ from black-box samples is intractable.

Tractable auditing is achieved by restricting recalibrations to a low-complexity family $K$ (e.g., monotone or low-crossing functions). In that setting, both the sample and computational complexity depend on the VC dimension of the associated threshold class $\mathrm{thr}(K)$. Efficient algorithms (agnostic learning reductions, Pool Adjacent Violators for monotone $K$, and uniform-mass binning for low-crossing $K$) yield "omnipredictors": recalibrations achieving near-optimal loss simultaneously for all proper losses over $K$, with formal generalization guarantees.
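For monotone $K$, the Pool Adjacent Violators step admits a compact implementation (a self-contained sketch; `pav_recalibrate` is a hypothetical name, and production code would typically call an isotonic-regression library routine):

```python
import numpy as np

def pav_recalibrate(p, y):
    """Pool Adjacent Violators: best monotone recalibration of predictions p.

    Fits the isotonic (nondecreasing in p) values minimizing squared error
    against binary outcomes y; per the omniprediction result, this fit is
    simultaneously near-optimal for all proper losses over monotone kappa.
    """
    order = np.argsort(p)
    sums = list(np.asarray(y, float)[order])   # block sums, initially singletons
    counts = [1] * len(sums)
    i = 0
    while i < len(sums) - 1:                   # merge adjacent violating blocks
        if sums[i] / counts[i] > sums[i + 1] / counts[i + 1]:
            s, c = sums.pop(i + 1), counts.pop(i + 1)
            sums[i] += s
            counts[i] += c
            if i > 0:
                i -= 1                         # a merge can expose a new violation
        else:
            i += 1
    fit = np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])
    out = np.empty(len(fit))
    out[order] = fit                           # map back to the input order
    return out
```

On already-monotone data PAV returns the outcomes themselves; on a violating pair it pools the two values to their mean.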

This suggests a principled link between CDL-based model selection in weak supervision and stress-testing calibration with downstream decision losses. However, the two frameworks address different aspects: optimization vs. post-processing for calibration.

7. Relationship to Calibration Metrics and Broader Impact

CDL's introduction in weak supervision addresses a significant gap in model reliability, supplementing traditional accuracy-driven objectives with rigorous calibration control. It is empirically and theoretically superior to existing disambiguation and focal losses for MIPL and PLL, offering a lower ECE and greater classification accuracy. The decision-theoretic variant, while intractable in unrestricted form, frames the calibration problem comprehensively, connecting recalibration capacity to core machine learning theory via VC dimension, agnostic learning, and omniprediction.

Application of CDL in MIPL/PLL is practical, robust, and highly generalizable, with current evidence highlighting improved learning dynamics, stable calibration, and interpretable confidence estimates (Tang et al., 19 Dec 2025, Gopalan et al., 17 Nov 2025). A plausible implication is that future methods for weakly supervised and partially labeled data will standardize CDL or closely related constructs for joint calibration and accuracy optimization.
