UFC-MIL: Uncertainty-Focused Calibrated MIL
- The paper introduces UFC-MIL, a multi-resolution MIL framework that integrates patch-level uncertainty and entropy-based masking to boost diagnostic reliability.
- It employs a Topological Neighbor Attention Module and Soft-Resolution Label Smoothing (SRLS) to achieve superior accuracy and calibrated confidence on multiple histopathology datasets.
- Key methodologies include multi-resolution feature extraction with frozen CNN embeddings, entropy-guided patch selection, and cross-resolution fusion that mirrors expert clinical reasoning.
Uncertainty-Focused Calibrated Multiple Instance Learning (UFC-MIL) is a multi-resolution diagnostic framework designed for histopathological whole-slide image (WSI) analysis, addressing both classification accuracy and calibration of model predictions. UFC-MIL explicitly models patch-level uncertainty to make bag-level predictions that more closely mirror clinical expert reasoning, enabling reliable diagnostic support suitable for deployment in settings requiring high reliability.
1. Model Architecture
UFC-MIL processes digital whole-slide images by extracting non-overlapping patches at distinct resolutions, commonly $2.0$, $1.0$, and $0.5$ microns per pixel (MPP). Each patch is embedded by a frozen, pre-trained CNN (e.g., ResNet) into a fixed-length feature vector, yielding one matrix of $n_r$ patch embeddings per resolution $r$.
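As a rough illustration of the tiling step, the sketch below enumerates non-overlapping patch origins at each resolution. The slide dimensions, the base scan resolution of 0.5 MPP, the 256-pixel patch size, and the helper name `patch_grid` are all assumptions chosen for the example, not values taken from the paper.

```python
def patch_grid(width_px, height_px, base_mpp, target_mpp, patch_size=256):
    """Return top-left (x, y) origins, in base-resolution pixels, of the
    non-overlapping patches covering a slide at `target_mpp`.

    A patch of `patch_size` pixels at `target_mpp` spans
    patch_size * target_mpp / base_mpp pixels at the base resolution.
    Partial patches at the right and bottom edges are dropped.
    """
    span = int(round(patch_size * target_mpp / base_mpp))
    return [(x, y)
            for y in range(0, height_px - span + 1, span)
            for x in range(0, width_px - span + 1, span)]


# Hypothetical 4096 x 2048 slide scanned at 0.5 MPP.
grids = {mpp: patch_grid(4096, 2048, base_mpp=0.5, target_mpp=mpp)
         for mpp in (2.0, 1.0, 0.5)}
for mpp, origins in grids.items():
    print(f"{mpp} MPP: {len(origins)} patches")
```

Each halving of the MPP quadruples the patch count, which is why the number of patches $n_r$ differs per resolution.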
A learnable class token is prepended to the stack of patch embeddings, and the sequence is processed by a Nyström-approximated self-attention module that models intra-resolution contextual dependencies. To further inject spatial information, the Topological Neighbor Attention Module (TNAM) aggregates local context among patch neighbors, defined by a 4- or 8-connectivity adjacency graph, and produces updated patch features as an attention-weighted combination of each patch's neighbors.
These updated features are added residually to the self-attention output, leaving the class token unchanged.
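The neighbor aggregation can be pictured with a small sketch. It assumes patches lie on a regular grid indexed in row-major order and uses plain scaled dot-product weights with no learned projections; the paper's exact TNAM update is not reproduced here, and the function names are hypothetical. The class token is kept outside `features`, so it is untouched by construction.

```python
import math


def grid_neighbors(rows, cols, connectivity=4):
    """Map each patch index (row-major) to its 4- or 8-connected neighbors."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            neighbors[r * cols + c] = [
                (r + dr) * cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return neighbors


def neighbor_attention(features, neighbors):
    """Residual neighbor aggregation with scaled dot-product weights.

    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch short.
    """
    dim = len(features[0])
    updated = []
    for i, query in enumerate(features):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(list(query))
            continue
        scores = [sum(q * k for q, k in zip(query, features[j])) / math.sqrt(dim)
                  for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        context = [sum(w / total * features[j][d] for w, j in zip(weights, nbrs))
                   for d in range(dim)]
        # Residual sum: original feature plus aggregated neighbor context.
        updated.append([q + c for q, c in zip(query, context)])
    return updated


# 2 x 2 grid of identical 2-d features: each context equals the feature itself.
feats = [[1.0, 0.0] for _ in range(4)]
out = neighbor_attention(feats, grid_neighbors(2, 2, connectivity=4))
```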
Unique to UFC-MIL, patch-wise entropy is computed and high-entropy patches are masked with a differentiable binary mask sampled via Gumbel-softmax. The mask guides cross-resolution feature fusion, so that fine-resolution features are concentrated on uncertain regions.
At each resolution, the slide-level prediction is made from the class token, while every patch is also projected by an MLP (the “identical dimension reduction network”) to a patch-level prediction $\hat{p}^{(r)}_{i,n}$.
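To make the entropy-guided masking concrete, here is a minimal sketch of a relaxed binary mask drawn with the Gumbel-softmax trick. The way entropy is turned into mask logits, the temperature `tau`, and the function names are assumptions made for illustration; the source does not spell out this parameterization.

```python
import math
import random


def binary_entropy(p_pos):
    """Shannon entropy (in nats) of a two-class prediction [1 - p_pos, p_pos]."""
    return -sum(p * math.log(p) for p in (p_pos, 1.0 - p_pos) if p > 0.0)


def gumbel_softmax_mask(entropies, tau=0.5, rng=random):
    """Sample a relaxed binary mask that favors high-entropy patches.

    Assumption: the two logits per patch are log(h) and log(1 - h), where h
    is the entropy divided by its maximum, log(2).
    """
    eps = 1e-6
    soft, hard = [], []
    for h in entropies:
        h_norm = min(max(h / math.log(2.0), eps), 1.0 - eps)
        logits = (math.log(h_norm), math.log(1.0 - h_norm))
        # Gumbel(0, 1) noise is -log(-log(u)) with u ~ Uniform(0, 1).
        noisy = [(l - math.log(-math.log(rng.uniform(eps, 1.0 - eps)))) / tau
                 for l in logits]
        top = max(noisy)
        exps = [math.exp(v - top) for v in noisy]
        select_prob = exps[0] / sum(exps)
        soft.append(select_prob)
        hard.append(1 if select_prob >= 0.5 else 0)
    return soft, hard


patch_probs = [0.02, 0.45, 0.97, 0.55]
entropies = [binary_entropy(p) for p in patch_probs]
soft_mask, hard_mask = gumbel_softmax_mask(entropies, rng=random.Random(0))
```

In an autograd framework the hard mask would typically be used in the forward pass while gradients flow through the soft one (the straight-through trick); PyTorch exposes this as `torch.nn.functional.gumbel_softmax(..., hard=True)`.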
2. Mathematical Formulation
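Patch-wise uncertainty is quantified by the Shannon entropy of the softmax predictions, where $\hat{p}^{(r)}_{i,n}[c]$ is the predicted probability of class $c$ for patch $n$ of slide $i$ at resolution $r$:
$H^{(r)}_{i,n} = -\sum_{c=0,1} \hat{p}^{(r)}_{i,n}[c]\,\log \hat{p}^{(r)}_{i,n}[c]$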
The uncertainty-focused patch-wise loss $L_{PW,i}^{(r)}$ is defined as:
$L_{PW,i}^{(r)} = (1-Y_i)\frac{1}{n_r}\sum_{n=1}^{n_r} \text{ReLU}(\hat{p}^{(r)}_{i,n}[1]-\delta) + Y_i\cdot\text{ReLU}((1-\delta)-\max_n \hat{p}^{(r)}_{i,n}[1])$
where $Y_i \in \{0,1\}$ is the slide-level label, $n_r$ is the number of patches at resolution $r$, and $\delta$ is a margin. This formulation preserves the MIL assumption: in a negative slide no patch should have a positive-class probability above $\delta$, while a positive slide should contain at least one patch with positive-class probability of at least $1-\delta$.
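The patch-wise loss translates almost line for line into code. The sketch below evaluates it for a single slide at a single resolution; the margin of `0.1` is a placeholder rather than the paper's setting, and the function name is hypothetical.

```python
def relu(x):
    return max(0.0, x)


def patchwise_loss(p_pos, label, delta=0.1):
    """Uncertainty-focused patch-wise loss for one slide at one resolution.

    p_pos : positive-class probabilities, one per patch
    label : slide-level label Y_i (0 or 1)
    delta : margin (0.1 is a placeholder, not the paper's value)
    """
    if label == 0:
        # Negative slide: penalize every patch whose positive probability
        # exceeds delta, averaged over the patches.
        return sum(relu(p - delta) for p in p_pos) / len(p_pos)
    # Positive slide: only the most confident patch matters, and it should
    # reach at least 1 - delta.
    return relu((1.0 - delta) - max(p_pos))


negative_slide = patchwise_loss([0.05, 0.30], label=0)
positive_confident = patchwise_loss([0.20, 0.95], label=1)
positive_unsure = patchwise_loss([0.20, 0.60], label=1)
```

On a positive slide, patches other than the most confident one contribute nothing, so they are free to sit anywhere between the two margins.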
Classification loss at each resolution is standard cross-entropy:
$L_{CE,i}^{(r)} = -\sum_{c=0,1} \mathbb{I}[c=Y_i] \log\hat{p}^{(r)}_i[c]$
The total training loss sums over samples and resolutions, with the patch-wise loss weighted by $\lambda$:
$L_{total} = \sum_{i\in \text{dataset}} \sum_{r=1}^R \left[L_{CE,i}^{(r)} + \lambda L_{PW,i}^{(r)} \right]$
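A small worked example may help show how the terms combine. The sketch below sums the cross-entropy and patch-wise terms over a toy dataset; the dictionary layout, the probabilities, and the defaults `lam=1.0` and `delta=0.1` are illustrative assumptions.

```python
import math


def patchwise_loss(p_pos, label, delta):
    """Patch-wise margin loss for one slide at one resolution."""
    if label == 0:
        return sum(max(0.0, p - delta) for p in p_pos) / len(p_pos)
    return max(0.0, (1.0 - delta) - max(p_pos))


def total_loss(dataset, lam=1.0, delta=0.1):
    """Sum of cross-entropy plus lam * patch-wise loss over slides and resolutions."""
    total = 0.0
    for sample in dataset:
        y = sample["label"]
        for res in sample["resolutions"]:
            ce = -math.log(res["p_slide"][y])  # only the true class contributes
            pw = patchwise_loss(res["p_patch"], y, delta)
            total += ce + lam * pw
    return total


# Toy data: p_slide is [P(class 0), P(class 1)]; p_patch holds P(class 1) per patch.
toy = [
    {"label": 1, "resolutions": [
        {"p_slide": [0.2, 0.8], "p_patch": [0.1, 0.95]},
        {"p_slide": [0.4, 0.6], "p_patch": [0.3, 0.7]},
    ]},
    {"label": 0, "resolutions": [
        {"p_slide": [0.9, 0.1], "p_patch": [0.05, 0.3]},
    ]},
]
print(total_loss(toy))
```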
3. Calibration Methodology (SRLS)
UFC-MIL employs Soft-Resolution Label Smoothing (SRLS) for calibration, leveraging patch-level uncertainty statistics inferred from the primary training run without extra inference iterations.
At a selected epoch, patch entropies are aggregated over the training set.
For each sample and resolution, the aggregated entropies are standardized by min-max scaling, and a smoothing factor is computed from the scaled value.
The smoothing factor is governed by a global temperature that is set empirically. The hard label is then replaced by a soft target.
Over the final epochs, the model is fine-tuned using these soft targets.
No extra inference loops are required, making calibration efficient and exploiting resolution outputs.
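The flow of SRLS (aggregate, min-max scale, derive a smoothing factor, soften the label) can be sketched as follows. The exact smoothing formula is not reproduced in this article, so the mapping used here, `alpha = 0.5 * min(1, u / temperature)`, is purely hypothetical, as are the function names and the temperature value. The same routine would be run separately for each resolution.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def soft_targets(mean_entropies, labels, temperature=2.0):
    """Build binary soft targets from per-slide aggregated entropies.

    Hypothetical mapping: alpha = 0.5 * min(1, u / temperature), where u is
    the min-max scaled entropy. Capping alpha at 0.5 guarantees the true
    class never becomes less probable than the other class.
    """
    scaled = min_max_scale(mean_entropies)
    targets = []
    for u, y in zip(scaled, labels):
        alpha = 0.5 * min(1.0, u / temperature)
        target = [alpha, alpha]
        target[y] = 1.0 - alpha
        targets.append(target)
    return targets


# Three training slides: aggregated entropy at one resolution and hard labels.
targets = soft_targets([0.1, 0.3, 0.5], labels=[1, 0, 1])
```

Slides the model was more uncertain about during training receive softer targets, while confidently classified slides keep labels close to one-hot.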
4. Training Regimen and Hyperparameterization
Optimization uses Adam with cosine learning-rate decay to zero over the total number of epochs (typically $20$–$30$ for convergence, with SRLS applied in the last $5$–$10$). Because of memory constraints, the batch size is usually one WSI. Default hyperparameters are selected on the validation set, with equal weighting of the cross-entropy and patch-wise loss terms; ablation indicates no further gain from tuning the weight $\lambda$.
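The schedule described above can be sketched in a few lines. The base learning rate, the epoch counts, and the helper names below are placeholders for illustration, not the paper's settings.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr):
    """Cosine decay from base_lr at epoch 0 to zero at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def srls_active(epoch, total_epochs, srls_epochs):
    """True during the final `srls_epochs` epochs, when soft targets are used."""
    return epoch >= total_epochs - srls_epochs


# Placeholder settings: 20 epochs, SRLS in the last 5, base rate 1e-4.
schedule = [(epoch, cosine_lr(epoch, 20, 1e-4), srls_active(epoch, 20, 5))
            for epoch in range(20)]
for epoch, lr, use_soft in schedule:
    print(f"epoch {epoch:2d}  lr {lr:.2e}  soft targets: {use_soft}")
```

With these settings the soft-target phase runs at the lowest learning rates of the schedule. In PyTorch, the same decay is available as `torch.optim.lr_scheduler.CosineAnnealingLR`.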
5. Evaluation Metrics and Results
Evaluation employs:
- Classification: Accuracy (Acc), Area Under ROC Curve (AUC), Recall@10%, Recall@30%
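- Calibration: Expected Calibration Error (ECE):
$\text{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{N}\,\left|\text{acc}(B_m) - \text{conf}(B_m)\right|$
where the bins $B_1, \dots, B_M$ partition the $N$ samples by confidence, and $\text{acc}(B_m)$ and $\text{conf}(B_m)$ are the accuracy and mean confidence within bin $B_m$.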
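For reference, ECE with equal-width confidence bins can be computed as in the sketch below. The choice of ten bins and the toy inputs are assumptions; the article does not state how many bins were used in the evaluation.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences : predicted confidence of the chosen class, in [0, 1]
    correct     : 1 if the prediction was right, else 0
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin m covers (m / n_bins, (m + 1) / n_bins]; conf == 0 goes to bin 0.
        idx = min(max(math.ceil(conf * n_bins) - 1, 0), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


overconfident = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
calibrated = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

A perfectly calibrated model has ECE of zero: in every bin, the fraction of correct predictions equals the mean confidence.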
Performance was assessed on three public datasets: CAMELYON16, DHMC, and BCNB. UFC-MIL with SRLS yields competitive or superior accuracy and notably improved calibration (UFC-MIL★ denotes the SRLS-calibrated model):
| Dataset | Model | Accuracy (Acc) | ECE |
|---|---|---|---|
| CAMELYON16 | UFC-MIL | | |
| CAMELYON16 | UFC-MIL★ | | |
| CAMELYON16 | Best SOTA | $0.909$ | $0.086$ |
| DHMC | UFC-MIL★ | | |
| DHMC | Best SOTA | $0.758$ | $0.206$ |
| BCNB | UFC-MIL★ | | |
| BCNB | Best SOTA | $0.800$ | $0.108$ |
AUC on CAMELYON16 is approximately $0.964$, and UFC-MIL★ (SRLS-calibrated) reduces ECE by $30\%$ or more relative to the strongest prior baseline.
6. Context and Application Significance
UFC-MIL advances multi-resolution MIL by integrating uncertainty quantification at patch level, yielding both high diagnostic fidelity and trustworthy confidence estimates. Conventional multi-resolution MILs (e.g., DS-MIL) focus exclusively on classification accuracy, whereas UFC-MIL addresses the nuanced requirement of calibration critical for clinical decision support.
By leveraging attention-driven neighbor aggregation and entropy-masked resolution zooming, UFC-MIL more closely emulates expert pathologists' workflow: regions of diagnostic ambiguity are examined at higher resolution. The patch-wise loss preserves MIL assumptions, allows ambiguous regions ("grey‐zone"), and explicitly encodes uncertainty, mitigating overconfidence in negative cases and enhancing interpretability.
The SRLS calibration approach obviates inference overhead and exploits multi-resolution predictions, providing a route for seamless calibration tuning in MIL systems. Its practical benefit is especially pronounced for deployment in environments with strict reliability constraints.
A plausible implication is the broader adoption of UFC-MIL-like architectures as calibration-aware MIL becomes a clinical requirement, with potential relevance in non-pathology domains requiring fine-grained uncertainty modeling.
7. Limitations and Directions for Further Research
While UFC-MIL demonstrates robust performance and calibration on multiple histopathology datasets, batch size is limited by GPU memory constraints and the approach requires patch extraction at multiple resolutions, increasing preprocessing burden. The architecture's reliance on a frozen feature extractor may influence adaptability across datasets with divergent statistics; fine-tuning or self-supervised pre-training represent natural extensions. Further, rigorous analysis of the calibration method's behavior under dataset shift remains an open question, particularly as SRLS is tied to entropy statistics at a specific checkpoint epoch.
Investigation into the integration of non-image modalities and expansion of uncertainty quantification schemes may broaden UFC-MIL’s applicability. Extending TNAM to model more sophisticated spatial relations or incorporating domain-specific priors could yield enhanced context modeling. A comparative study against Bayesian and deep-ensemble calibration methods would clarify UFC-MIL’s theoretical and practical position in the landscape of calibrated MIL approaches.