CLAM-SB: Enhanced MIL for Breast Cancer
- The paper introduces a two-layer MLP classifier, expanded attention capacity, and robust regularization to outperform prior MIL methods in breast cancer recurrence risk stratification.
- It leverages high-dimensional feature extraction using UNI and CONCH pre-trained models to convert H&E whole-slide images into informative patch embeddings.
- GELU nonlinearities, focal loss, and label smoothing improve gradient flow and manage class imbalance in a low-sample regime.
The CLAM-SB model is a modified multiple instance learning (MIL) architecture designed for predictive computational pathology, specifically applied to the stratification of breast cancer recurrence risk from hematoxylin and eosin (H&E) stained whole-slide images (WSIs). Developed and evaluated as part of a comprehensive comparison of MIL frameworks, CLAM-SB builds on the original CLAM design while introducing key architectural and regularization enhancements to improve classification of 5-year recurrence risk tiers defined by molecular genomics.
1. Model Architecture
CLAM-SB follows an MIL paradigm in which each WSI is represented as a “bag” of non-overlapping image patches. Each patch $k$ is transformed into a high-dimensional feature vector $x_k \in \mathbb{R}^{1024}$ using a pre-trained feature extractor (UNI or CONCH). An instance encoder (a single fully connected layer) then compresses $x_k$ to a lower-dimensional embedding $h_k \in \mathbb{R}^{512}$, followed by GELU activation and Dropout regularization.
A gated attention module computes un-normalized attention scores for each patch embedding $h_k$:

$$a_k = w^\top\big(\tanh(V h_k) \odot \mathrm{sigm}(U h_k)\big),$$

where $V, U \in \mathbb{R}^{384 \times 512}$, $w \in \mathbb{R}^{384}$, and $\odot$ denotes elementwise multiplication. Attention weights are calculated via softmax:

$$\alpha_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}.$$
The slide-level embedding $z = \sum_{k=1}^{K} \alpha_k h_k$ is input to a two-layer MLP classifier, with GELU and Dropout between layers, yielding logits for the three risk classes. Final output probabilities are computed by softmax.
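Under the dimensions stated in this section (1024-dim patch features, 512-dim embeddings, 384 attention hidden units, three classes, Dropout 0.4), the forward pass can be sketched in PyTorch as follows. The classifier's hidden width (256 here) is an illustrative assumption, since its exact value is not given:

```python
import torch
import torch.nn as nn


class GatedAttention(nn.Module):
    """Gated attention pooling: a_k = w^T(tanh(V h_k) * sigmoid(U h_k))."""

    def __init__(self, dim=512, hidden=384):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, h):  # h: (K, dim) bag of patch embeddings
        a = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # (K, 1)
        alpha = torch.softmax(a, dim=0)                               # sums to 1 over patches
        z = (alpha * h).sum(dim=0)                                    # slide-level embedding (dim,)
        return z, alpha.squeeze(-1)


class CLAMSBSketch(nn.Module):
    """Sketch of the CLAM-SB-style pipeline: encoder -> gated attention -> 2-layer MLP."""

    def __init__(self, in_dim=1024, emb=512, attn_hidden=384,
                 clf_hidden=256, n_classes=3, p_drop=0.4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, emb), nn.GELU(), nn.Dropout(p_drop))
        self.attention = GatedAttention(emb, attn_hidden)
        self.classifier = nn.Sequential(
            nn.Linear(emb, clf_hidden), nn.GELU(), nn.Dropout(p_drop),
            nn.Linear(clf_hidden, n_classes))

    def forward(self, x):  # x: (K, in_dim) bag of patch features for one WSI
        h = self.encoder(x)
        z, alpha = self.attention(h)
        return self.classifier(z), alpha
```

A bag of K patch feature vectors in, three logits and K attention weights out; batch size is one WSI, matching the training protocol below.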
2. Architectural Modifications to CLAM
CLAM-SB introduces significant deviations from baseline CLAM:
- Classifier Depth: The original single-layer classifier is replaced with a two-layer MLP with intermediate Dropout.
- Activation Function: GELU replaces ReLU throughout, with $\mathrm{GELU}(x) = x\,\Phi(x)$, where $\Phi$ is the standard normal CDF.
- Attention Capacity: The attention network’s hidden dimension is increased from 256 to 384.
- Regularization: Dropout of 0.4 is applied in the encoder, attention module, and classification head.
- Loss Functions: Focal loss,

$$\mathcal{L}_{\mathrm{FL}} = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t),$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$ a per-class weight, addresses severe class imbalance (particularly the under-represented medium-risk class). Label smoothing with factor $\epsilon$ is applied to the targets:

$$\tilde{y}_c = (1 - \epsilon)\,y_c + \frac{\epsilon}{C}, \qquad C = 3.$$
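These two losses combine naturally into one objective. A minimal PyTorch sketch follows; the defaults $\gamma = 2$ and $\epsilon = 0.1$ are common illustrative choices, not values taken from the paper:

```python
import torch
import torch.nn.functional as F


def smooth_labels(targets, n_classes=3, eps=0.1):
    """Label smoothing: (1 - eps) on the true class, eps/C spread over all classes."""
    one_hot = F.one_hot(targets, n_classes).float()
    return (1.0 - eps) * one_hot + eps / n_classes


def focal_loss(logits, targets, alpha=None, gamma=2.0, eps=0.1):
    """Focal loss -alpha_t (1 - p_t)^gamma log p_t, applied to smoothed targets.

    gamma=2.0 and eps=0.1 are illustrative defaults; `alpha` is an optional
    per-class weight tensor of shape (n_classes,).
    """
    n_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    y = smooth_labels(targets, n_classes, eps)
    w = (1.0 - p) ** gamma          # down-weight easy, well-classified examples
    if alpha is not None:
        w = w * alpha               # per-class re-weighting
    return -(y * w * log_p).sum(dim=-1).mean()
```

With `gamma=0.0` and `eps=0.0` this reduces exactly to standard cross-entropy, which is a useful sanity check.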
3. Data Processing and Feature Extraction
WSIs in vendor “.sdpc” format are converted to “.svs”. Tissue segmentation is carried out on a low-magnification downsampled image via adaptive Gaussian blur, HSV color space conversion, Otsu thresholding on the saturation channel, morphological filtering, and mask extraction.
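The thresholding step can be illustrated with a NumPy-only sketch (a stand-in for the OpenCV-style pipeline above, omitting the blur and morphology stages): saturation is computed from RGB, and Otsu's criterion picks the cut that maximizes between-class variance, separating saturated tissue from the near-white background:

```python
import numpy as np


def saturation(rgb):
    """HSV saturation channel of an RGB image with values in [0, 1]."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)


def otsu_threshold(channel, bins=256):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, edges = np.histogram(channel, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    omega = np.cumsum(p)                                  # class-0 probability
    mu = np.cumsum(p * (edges[:-1] + edges[1:]) / 2.0)    # class-0 cumulative mean
    mu_t = mu[-1]                                         # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                            # ignore degenerate splits
    sigma_b = (mu_t * omega - mu) ** 2 / denom
    k = np.nanargmax(sigma_b)
    return (edges[k] + edges[k + 1]) / 2.0


def tissue_mask(rgb):
    """Binary tissue mask: pixels more saturated than the Otsu threshold."""
    s = saturation(rgb)
    return s > otsu_threshold(s)
```

On an H&E slide the background is nearly white (saturation close to zero), so the Otsu split on the saturation channel lands between background and stained tissue.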
Subsequently, non-overlapping patches are extracted from within the tissue mask; patch coordinates are stored in HDF5 files. Patch features are generated using the TRIDENT toolbox with two pre-trained models:
- UNI: ViT-L/16 (self-supervised DINOv2)
- CONCH: vision-language pre-trained model
Each patch is resized and projected into a 1024-dimensional feature space, which is saved for subsequent MIL processing.
4. Training Protocol and Hyperparameters
CLAM-SB is trained with stratified 5-fold cross-validation on 210 WSIs (per fold: approximately 168 training and 42 validation slides). Optimization uses the Adam algorithm with a linear learning-rate warm-up over the first 5 epochs and weight decay. Dropout is fixed at 0.4. Training proceeds for up to 100 epochs with early stopping (patience 10) and a batch size of 1 WSI per step. The loss combines a bag-level focal loss with a CLAM-style instance-level pseudo-labeling loss weighted at 0.5, plus label smoothing. The attention hidden size is 384 and the encoder output size is 512.
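The stratified split itself can be sketched without external libraries; `stratified_kfold` below is a hypothetical helper (e.g., a stand-in for scikit-learn's `StratifiedKFold`) that preserves per-class proportions across folds:

```python
import numpy as np


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(n_splits)]
    # Shuffle each class independently, then deal its indices across folds.
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[i].extend(chunk.tolist())
    for i in range(n_splits):
        val = np.array(sorted(folds[i]))
        train = np.array(sorted(set(range(len(labels))) - set(folds[i])))
        yield train, val
```

With 210 slides and 5 folds this yields roughly 168/42 train/validation splits per fold, matching the protocol above.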
5. Performance Evaluation
In five-fold cross-validation, CLAM-SB (using both UNI and CONCH features) achieved:
- Mean AUC: 0.836
- Mean accuracy: 76.2%
For reference, ABMIL (multi-head gated attention) attained mean AUC 0.767 and accuracy 70.9%, while ConvNeXt-MIL-XGBoost achieved accuracy 73.5% and macro F1-score 0.492.
| Model | Mean AUC | Accuracy | Macro F1 |
|---|---|---|---|
| CLAM-SB (UNI+CONCH) | 0.836 | 76.2% | — |
| ABMIL | 0.767 | 70.9% | — |
| ConvNeXt-MIL-XGBoost | — | 73.5% | 0.492 |
Editor’s note: in the original CLAM, “SB” denotes the single-attention-branch variant; here the name also carries the set of enhancements above baseline CLAM.
6. Analysis of Model Efficacy
CLAM-SB’s performance advantages are attributed to several factors:
- Expanded Attention Capacity: The increase to 384 hidden units in the attention module enables richer modeling of subtle, high-dimensional histological cues.
- Advanced Nonlinearities: GELU activation and a deeper classifier architecture enhance gradient flow and the learning of complex interactions.
- Robust Regularization: Aggressive Dropout systematically reduces overfitting in the low-sample regime.
- Improved Optimization for Imbalanced Data: Focal loss with class re-weighting and label smoothing improves detection of the under-represented medium-risk class.
- Multi-Modal Pre-trained Features: Combining UNI (visual) and CONCH (vision-language) feature embeddings yields a more expressive patch representation.
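One plausible fusion scheme for the last point (the exact combination of UNI and CONCH embeddings is not specified here) is per-patch concatenation along the feature axis:

```python
import numpy as np


def fuse_features(uni: np.ndarray, conch: np.ndarray) -> np.ndarray:
    """Concatenate per-patch UNI and CONCH embeddings along the feature axis.

    Illustrative fusion only: assumes both arrays hold the same patches
    in the same order, one row per patch.
    """
    if uni.shape[0] != conch.shape[0]:
        raise ValueError("bags must contain the same patches in the same order")
    return np.concatenate([uni, conch], axis=1)
```

The fused bag simply has a wider feature dimension; the downstream MIL encoder is unchanged apart from its input size.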
Collectively, these modifications promote robust feature aggregation and classifier calibration, enabling superior stratification of breast cancer recurrence risk relative to alternative MIL implementations (Chen et al., 21 Dec 2025).