UFC-MIL: Uncertainty-Focused Calibrated MIL

Updated 16 November 2025

The paper introduces UFC-MIL, a multi-resolution MIL framework that integrates patch-level uncertainty and entropy-based masking to boost diagnostic reliability.
It employs a Topological Neighbor Attention Module and Soft-Resolution Label Smoothing (SRLS) to achieve superior accuracy and calibrated confidence on multiple histopathology datasets.
Key methodologies include multi-resolution feature extraction with frozen CNN embeddings, entropy-guided patch selection, and cross-resolution fusion that mirrors expert clinical reasoning.

Uncertainty-Focused Calibrated Multiple Instance Learning (UFC-MIL) is a multi-resolution diagnostic framework designed for histopathological whole-slide image (WSI) analysis, addressing both classification accuracy and calibration of model predictions. UFC-MIL explicitly models patch-level uncertainty to make bag-level predictions that more closely mirror clinical expert reasoning, enabling reliable diagnostic support suitable for deployment in settings requiring high reliability.

1. Model Architecture

UFC-MIL processes digital whole-slide images by extracting non-overlapping patches at $R$ distinct resolutions, commonly $2.0$, $1.0$, and $0.5$ microns per pixel (MPP). Each patch $x^{(r)}_{i,n}$ (slide $i$, patch $n$, resolution $r$) is embedded by a frozen, pre-trained CNN (e.g., ResNet) into $z^{(r)}_{i,n}\in\mathbb{R}^d$, yielding $Z^{(r)}_i = [z^{(r)}_{i,1},\ldots,z^{(r)}_{i,n_r}]$ per resolution, where $n_r$ is the number of patches at resolution $r$.

A learnable class token $cls^{(r)}_i \in \mathbb{R}^d$ is prepended to the stack and processed through a Nyström-approximate self-attention module; this yields $\tilde{Z}^{(r)}_i \in \mathbb{R}^{(n_r+1)\times d}$, modeling intra-resolution contextual dependencies. To further inject spatial information, the Topological Neighbor Attention Module (TNAM) aggregates local context among patch neighbors defined by a 4- or 8-connectivity adjacency graph $A^{(r)}_i$, generating updated features $T^{(r)}_i$ via:

$s^{(r)}_{i,n} = \frac{\exp(w^\top [\tanh(A_t\tilde{z}^{(r)}_{i,n}) \odot \sigma(A_s\tilde{z}^{(r)}_{i,n})])}{\sum_{k\in\mathcal{N}^{(r)}_{i,n}} \exp(w^\top [\tanh(A_t\tilde{z}^{(r)}_{i,k}) \odot \sigma(A_s\tilde{z}^{(r)}_{i,k})])}$

$t^{(r)}_{i,n} = \sum_{k\in\mathcal{N}^{(r)}_{i,n}} s^{(r)}_{i,k}\cdot\tilde{z}^{(r)}_{i,k}$

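Here $\mathcal{N}^{(r)}_{i,n}$ denotes the neighbors of patch $n$ under $A^{(r)}_i$, $\sigma$ is the sigmoid function, and $A_t$, $A_s$, and $w$ are learnable attention parameters. The aggregated features $T^{(r)}_i$ are added residually to $\tilde{Z}^{(r)}_i$, leaving the class token unchanged.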
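The neighbor aggregation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads the two equations as a softmax over gated attention scores of each patch's neighbors, and the function names (`gated_score`, `tnam_update`), the dictionary-based neighbor lists, and the choice to exclude the patch itself from its own neighborhood are all hypothetical.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gated_score(z, A_t, A_s, w):
    """w^T [tanh(A_t z) * sigmoid(A_s z)], the gated attention logit."""
    t = [math.tanh(x) for x in matvec(A_t, z)]
    s = [1.0 / (1.0 + math.exp(-x)) for x in matvec(A_s, z)]
    return sum(wi * ti * si for wi, ti, si in zip(w, t, s))

def tnam_update(Z, neighbors, A_t, A_s, w):
    """Z: list of patch vectors; neighbors: dict patch index -> neighbor indices.

    Each patch receives a softmax-weighted sum of its neighbors' features,
    which is then added residually to the patch's own feature.
    """
    out = []
    for n, z in enumerate(Z):
        nbrs = neighbors[n]
        scores = [gated_score(Z[k], A_t, A_s, w) for k in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        t = [sum(wk * Z[k][j] for wk, k in zip(weights, nbrs))
             for j in range(len(z))]
        out.append([zj + tj for zj, tj in zip(z, t)])  # residual sum
    return out
```

In a real model `A_t`, `A_s`, and `w` would be trained parameters and the class token would bypass this update; here they are passed in as plain lists to keep the sketch self-contained.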

Unique to UFC-MIL, patch-wise entropy is computed and high-entropy patches are masked via a Gumbel-softmax differentiable binary mask $m^{(r)}_i\in\{0,1\}^{n_r}$, guiding cross-resolution feature fusion. Fine-resolution features are focused on uncertain regions through:

$(1-\text{Repeat}(m^{(r)}_i, n_{r+1}/n_r)) \odot \tilde{Z}^{(r+1)}_i + \text{Repeat}(m^{(r)}_i\odot Z^{(r)}_i, n_{r+1}/n_r)$
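A small sketch may help make the fusion expression concrete. It follows the formula literally: where a coarse mask entry is 1, the coarse feature is copied to all of that patch's finer children; where it is 0, the fine features are kept. The helper names `repeat` and `fuse`, and the assumption that the children of each coarse patch are stored contiguously in the fine-resolution list, are illustrative choices rather than details from the paper.

```python
def repeat(items, factor):
    """Repeat each item `factor` times, keeping order (a a b b ...)."""
    return [item for item in items for _ in range(factor)]

def fuse(mask, Z_coarse, Z_fine):
    """(1 - Repeat(m)) * Z_fine + Repeat(m * Z_coarse), element-wise.

    mask: list of 0/1 values, one per coarse patch.
    Z_coarse: list of coarse feature vectors.
    Z_fine: list of fine feature vectors; its length must be a
            multiple of len(Z_coarse).
    """
    factor, remainder = divmod(len(Z_fine), len(Z_coarse))
    if remainder != 0:
        raise ValueError("fine patch count must be a multiple of coarse count")
    m_rep = repeat(mask, factor)
    coarse_rep = repeat(
        [[m * x for x in z] for m, z in zip(mask, Z_coarse)], factor
    )
    return [
        [(1 - m) * f + c for f, c in zip(z_fine, z_coarse)]
        for m, z_fine, z_coarse in zip(m_rep, Z_fine, coarse_rep)
    ]
```

With two coarse patches and four fine patches, a mask of `[1, 0]` overwrites the first two fine vectors with the first coarse vector and leaves the last two untouched.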

At each resolution, the class token produces the bag-level prediction $\hat{p}^{(r)}_i\in\mathbb{R}^C$, and every patch is also projected by an MLP (the “identical dimension reduction network”) to a patch-level prediction $\hat{p}^{(r)}_{i,n}\in\mathbb{R}^C$.

2. Mathematical Formulation

Patch-wise uncertainty quantification leverages Shannon entropy on softmax predictions:

$H^{(r)}_i[n] = -\sum_{c=0,1} \hat{p}^{(r)}_{i,n}[c] \cdot \log_2 \hat{p}^{(r)}_{i,n}[c]$
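As a quick illustration of the entropy formula for the binary case, the sketch below computes the entropy in bits from a patch's positive-class probability. The function name `binary_entropy` is hypothetical, and the term $0\cdot\log_2 0$ is taken to be zero by the usual convention.

```python
import math

def binary_entropy(p_pos):
    """Shannon entropy (in bits) of a two-class prediction [1 - p_pos, p_pos]."""
    entropy = 0.0
    for p in (1.0 - p_pos, p_pos):
        if p > 0.0:  # treat 0 * log2(0) as 0
            entropy -= p * math.log2(p)
    return entropy
```

A maximally uncertain patch (`p_pos = 0.5`) has entropy 1 bit, while a fully confident patch (`p_pos = 0.0` or `1.0`) has entropy 0.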

The uncertainty-focused patch-wise loss $L_{\mathrm{PW},i}^{(r)}$ is defined as:

$L_{\mathrm{PW},i}^{(r)} = (1-Y_i)\frac{1}{n_r}\sum_{n=1}^{n_r} \text{ReLU}(\hat{p}^{(r)}_{i,n}[1]-\delta) + Y_i\cdot\text{ReLU}(-\max_n \hat{p}^{(r)}_{i,n}[1]+(1-\delta))$

where $Y_i\in\{0,1\}$ is the slide-level label and $\delta<0.5$ is a margin. This formulation preserves the MIL assumption: negatives should not have highly positive patches, while positives should yield at least one high-confidence positive.
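The behavior of the patch-wise loss is easier to see on small inputs. The following is a minimal sketch for one slide at one resolution; `patchwise_loss` is a hypothetical name and the default margin of 0.49 is the value reported later in the text.

```python
def relu(x):
    return max(0.0, x)

def patchwise_loss(p_pos, label, delta=0.49):
    """Patch-wise MIL loss for one slide at one resolution.

    p_pos: positive-class probabilities, one per patch.
    label: slide-level label (0 or 1).
    """
    # Negative slides: penalize every patch whose positive probability exceeds delta.
    negative_term = sum(relu(p - delta) for p in p_pos) / len(p_pos)
    # Positive slides: penalize if even the most positive patch is below 1 - delta.
    positive_term = relu(-max(p_pos) + (1.0 - delta))
    return (1 - label) * negative_term + label * positive_term
```

A negative slide with patches at 0.1 and 0.2 incurs no loss, and neither does a positive slide containing one patch at 0.9. A positive slide whose strongest patch is only 0.2 is penalized by $0.51 - 0.2 = 0.31$. Patches with probability between $\delta$ and $1-\delta$ sit in a narrow band that the positive term does not reward and the negative term penalizes only slightly.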

Classification loss at each resolution is standard cross-entropy:

$L_{\mathrm{CE},i}^{(r)} = -\sum_{c\in\{0,1\}} \mathbb{I}[c=Y_i] \log\hat{p}^{(r)}_i[c]$

The total training loss accumulates the cross-entropy and patch-wise terms over all samples and resolutions, with the patch-wise loss weighted by $\lambda$:

$L_{\mathrm{total}} = \sum_{i\in \text{dataset}} \sum_{r=1}^R \left[L_{\mathrm{CE},i}^{(r)} + \lambda L_{\mathrm{PW},i}^{(r)} \right]$
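The accumulation over slides and resolutions can be sketched as follows. The data layout (a list of `(label, per_resolution)` pairs) and the function names are assumptions made for illustration; the patch-wise term is passed in as a precomputed number so the block stays self-contained.

```python
import math

def cross_entropy(p_bag, label):
    """Cross-entropy of a bag-level probability vector against a hard label."""
    return -math.log(p_bag[label])

def total_loss(slides, lam=1.0):
    """Sum the classification and weighted patch-wise terms.

    slides: list of (label, per_resolution) pairs, where per_resolution
            is a list of (p_bag, patch_loss) tuples, one per resolution.
    lam: weight on the patch-wise loss.
    """
    total = 0.0
    for label, per_resolution in slides:
        for p_bag, patch_loss in per_resolution:
            total += cross_entropy(p_bag, label) + lam * patch_loss
    return total
```

For a single positive slide with three resolutions, each predicting `[0.5, 0.5]` and contributing no patch-wise loss, the total is $3\log 2$.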

3. Calibration Methodology (SRLS)

UFC-MIL employs Soft-Resolution Label Smoothing (SRLS) for calibration, leveraging patch-level uncertainty statistics inferred from the primary training run without extra inference iterations.

At a selected epoch, patch entropies $H^{(r)}_i[n]$ are aggregated over the training set:

$\mu^{(r)} = \text{avg}_i \; \text{mean}_n H^{(r)}_i[n],\quad \sigma^{(r)} = \text{avg}_i \; \text{std}_n H^{(r)}_i[n]$

For each sample and resolution, the corresponding per-slide entropy statistics are min-max scaled to $\tilde \mu^{(r)}_i, \tilde \sigma^{(r)}_i \in [0,1]$. A smoothing factor is then computed:

$\epsilon^{(r)}_i = \frac{1}{2}\left(\tilde \mu^{(r)}_i + \tilde \sigma^{(r)}_i \right)\cdot\alpha$

where $\alpha$ is a global temperature (empirically $\alpha=0.1$). The hard label $Y_i$, written as a one-hot vector over the $C$ classes, is replaced by a soft target:

$\tilde Y^{(r)}_i = (1-\epsilon^{(r)}_i)Y_i + \frac{\epsilon^{(r)}_i}{C}$

Over the final $K$ epochs, the model is fine-tuned using these targets, minimizing:

$L_{calib} = \sum_i \sum_r \text{CE}(\hat{p}^{(r)}_i, \tilde Y^{(r)}_i)$

Because the entropy statistics come from the training run itself, no extra inference passes are required, and the calibration reuses the predictions already produced at each resolution.
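The SRLS target construction can be sketched for a single resolution as below. This is an illustration under assumptions, not the paper's code: it takes the per-slide mean and standard deviation of patch entropy as inputs, min-max scales each across the given slides, and the names `min_max` and `srls_targets` are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; return zeros if all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def srls_targets(mean_entropy, std_entropy, labels, alpha=0.1, num_classes=2):
    """Build soft targets for one resolution.

    mean_entropy, std_entropy: per-slide entropy statistics.
    labels: hard slide-level labels (integers in [0, num_classes)).
    """
    mu = min_max(mean_entropy)
    sd = min_max(std_entropy)
    targets = []
    for m, s, y in zip(mu, sd, labels):
        eps = 0.5 * (m + s) * alpha  # smoothing factor, at most alpha
        one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
        targets.append([(1.0 - eps) * t + eps / num_classes for t in one_hot])
    return targets
```

With $\alpha=0.1$, the most uncertain slide (both scaled statistics equal to 1) receives $\epsilon = 0.1$, so a positive label becomes `[0.05, 0.95]`, while the least uncertain slide keeps its hard label.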

4. Training Regimen and Hyperparameterization

Optimization is performed with Adam (initial learning rate $1\times10^{-4}$, $\beta_1=0.9$, $\beta_2=0.999$) using cosine decay to zero over $T$ total epochs (typically $20$–$30$ for convergence, with SRLS applied in the last $5$–$10$). Due to resource constraints, the batch size is usually $1$ WSI. Default hyperparameters chosen on validation data are $\delta=0.49$, $\alpha=0.1$, and loss weight $\lambda=1.0$; ablation indicates no further gain from tuning $\lambda$.
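The schedule described above can be sketched with two small helpers. The standard half-cosine form is assumed for the decay, since the text only states "cosine decay to zero", and the function names and the per-epoch granularity are illustrative.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=1e-4):
    """Half-cosine decay from base_lr at epoch 0 to zero at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def uses_srls(epoch, total_epochs, srls_epochs):
    """True for the final `srls_epochs` epochs, where soft targets replace hard labels."""
    return epoch >= total_epochs - srls_epochs
```

For a 20-epoch run with SRLS in the last 5 epochs, the learning rate is halved by epoch 10, and soft targets are used from epoch 15 onward (counting epochs from 0).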

5. Evaluation Metrics and Results

Evaluation employs:

Classification: Accuracy (Acc), Area Under ROC Curve (AUC), Recall@10%, Recall@30%
Calibration: Expected Calibration Error (ECE):

$\text{ECE} = \sum_{m=1}^M \frac{|B_m|}{N}|\text{Acc}(B_m) - \text{Conf}(B_m)|$

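where $B_m$ is the $m$-th of $M$ confidence bins, $N$ is the total number of samples, and $\text{Acc}(B_m)$ and $\text{Conf}(B_m)$ are the accuracy and mean confidence of the samples in that bin.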
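A minimal ECE computation is sketched below. Equal-width bins and a default of 10 bins are assumptions, as the text does not state the binning scheme; `expected_calibration_error` is a hypothetical name.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted confidence of the chosen class, in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(ok for _, ok in members) / len(members)
        avg_conf = sum(conf for conf, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece
```

Four predictions made at confidence 0.9, of which three are correct, give an ECE of $|0.75 - 0.9| = 0.15$: the model is overconfident by 15 percentage points in that bin.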

Performance was assessed on three public datasets: CAMELYON16 ($n=400$ WSIs), DHMC ($n=143$), and BCNB ($n=1{,}058$). UFC-MIL with SRLS (marked ★ in the table) yields competitive or superior accuracy and notably improved calibration:

Dataset	Model	Accuracy (Acc)	ECE
CAMELYON16	UFC-MIL	$0.917\pm0.038$	$0.086\pm0.037$
CAMELYON16	UFC-MIL★	$0.941\pm0.011$	$0.056\pm0.016$
CAMELYON16	Best SOTA	$0.909$	$0.086$
DHMC	UFC-MIL★	$0.812\pm0.021$	$0.189\pm0.021$
DHMC	Best SOTA	$0.758$	$0.206$
BCNB	UFC-MIL★	$0.820\pm0.028$	$0.077\pm0.033$
BCNB	Best SOTA	$0.800$	$0.108$

AUC on CAMELYON16 is approximately $0.964$, with UFC-MIL★ (SRLS calibrated) reducing ECE by $30$–$40\%$ relative to the strongest prior baseline on that dataset ($0.056$ versus $0.086$).
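The relative ECE reductions can be checked directly from the table values. The short sketch below does the arithmetic; the function name `relative_reduction` is illustrative, and only the mean ECE values from the table are used (the reported standard deviations are ignored).

```python
def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline

# Mean ECE values from the table: (best prior baseline, UFC-MIL with SRLS).
table_ece = {
    "CAMELYON16": (0.086, 0.056),
    "DHMC": (0.206, 0.189),
    "BCNB": (0.108, 0.077),
}

reductions = {
    name: relative_reduction(baseline, ours)
    for name, (baseline, ours) in table_ece.items()
}
```

This gives roughly 35% on CAMELYON16, 29% on BCNB, and 8% on DHMC, so the quoted range matches CAMELYON16 most closely and the improvement on DHMC is smaller.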

6. Context and Application Significance

UFC-MIL advances multi-resolution MIL by integrating uncertainty quantification at patch level, yielding both high diagnostic fidelity and trustworthy confidence estimates. Conventional multi-resolution MILs (e.g., DS-MIL) focus exclusively on classification accuracy, whereas UFC-MIL addresses the nuanced requirement of calibration critical for clinical decision support.

By leveraging attention-driven neighbor aggregation and entropy-masked resolution zooming, UFC-MIL more closely emulates expert pathologists' workflow: regions of diagnostic ambiguity are examined at higher resolution. The patch-wise loss preserves MIL assumptions, allows ambiguous regions ("grey‐zone"), and explicitly encodes uncertainty, mitigating overconfidence in negative cases and enhancing interpretability.

The SRLS calibration approach obviates inference overhead and exploits multi-resolution predictions, providing a route for seamless calibration tuning in MIL systems. Its practical benefit is especially pronounced for deployment in environments with strict reliability constraints.

A plausible implication is the broader adoption of UFC-MIL-like architectures as calibration-aware MIL becomes a clinical requirement, with potential relevance in non-pathology domains requiring fine-grained uncertainty modeling.

7. Limitations and Directions for Further Research

While UFC-MIL demonstrates robust performance and calibration on multiple histopathology datasets, batch size is limited by GPU memory constraints and the approach requires patch extraction at multiple resolutions, increasing preprocessing burden. The architecture's reliance on a frozen feature extractor may influence adaptability across datasets with divergent statistics; fine-tuning or self-supervised pre-training represent natural extensions. Further, rigorous analysis of the calibration method's behavior under dataset shift remains an open question, particularly as SRLS is tied to entropy statistics at a specific checkpoint epoch.

Investigating the integration of non-image modalities and expanding the uncertainty quantification schemes may broaden UFC-MIL’s applicability. Extending TNAM to model more sophisticated spatial relations or incorporating domain-specific priors could yield enhanced context modeling. A comparative study with Bayesian and deep ensemble calibration methods would clarify UFC-MIL’s theoretical and practical position in the landscape of calibrated MIL approaches.
