CLAM-SB: Enhanced MIL for Breast Cancer
- The paper introduces a two-layer MLP classifier, expanded attention capacity, and robust regularization to outperform prior MIL methods in breast cancer recurrence risk stratification.
- It leverages high-dimensional feature extraction using UNI and CONCH pre-trained models to convert H&E whole-slide images into informative patch embeddings.
- GELU nonlinearities, focal loss, and label smoothing improve gradient flow and manage class imbalance in a low-sample regime.
The CLAM-SB model is a modified multiple instance learning (MIL) architecture designed for predictive computational pathology, specifically applied to the stratification of breast cancer recurrence risk from hematoxylin and eosin (H&E) stained whole-slide images (WSIs). Developed and evaluated as part of a comprehensive comparison of MIL frameworks, CLAM-SB builds on the original CLAM design while introducing key architectural and regularization enhancements to improve classification of 5-year recurrence risk tiers defined by molecular genomics.
1. Model Architecture
CLAM-SB follows an MIL paradigm in which each WSI is represented as a “bag” of non-overlapping image patches. Each patch $k$ is transformed into a high-dimensional feature vector $x_k \in \mathbb{R}^{1024}$ using a pre-trained feature extractor (UNI or CONCH). An instance encoder (a single fully connected layer) then compresses $x_k$ to a lower-dimensional embedding $h_k \in \mathbb{R}^{512}$, followed by GELU activation and Dropout regularization.
A gated attention module computes un-normalized attention scores for each patch embedding $h_k$:

$$a_k = w^\top\big(\tanh(V h_k) \odot \mathrm{sigm}(U h_k)\big),$$

where $V, U \in \mathbb{R}^{384 \times 512}$, $w \in \mathbb{R}^{384}$, and $\odot$ denotes elementwise multiplication. Attention weights are calculated via softmax:

$$\alpha_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}.$$
The slide-level embedding $z = \sum_{k=1}^{K} \alpha_k h_k$ is input to a two-layer MLP classifier, with GELU and Dropout between layers, yielding logits for the three risk classes. Final output probabilities are computed by softmax.
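Under the dimensions stated in this section (1024-dim patch features, 512-dim embeddings, 384 attention hidden units, three classes, Dropout 0.4), the forward pass can be sketched in PyTorch as follows. The classifier's hidden width (256 here) is an illustrative assumption, since its exact value is not given:

```python
import torch
import torch.nn as nn


class GatedAttention(nn.Module):
    """Gated attention pooling: a_k = w^T(tanh(V h_k) * sigmoid(U h_k))."""

    def __init__(self, dim=512, hidden=384):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, h):  # h: (K, dim) bag of patch embeddings
        a = self.w(torch.tanh(self.V(h)) * torch.sigmoid(self.U(h)))  # (K, 1)
        alpha = torch.softmax(a, dim=0)                               # sums to 1 over patches
        z = (alpha * h).sum(dim=0)                                    # slide-level embedding (dim,)
        return z, alpha.squeeze(-1)


class CLAMSBSketch(nn.Module):
    """Sketch of the CLAM-SB-style pipeline: encoder -> gated attention -> 2-layer MLP."""

    def __init__(self, in_dim=1024, emb=512, attn_hidden=384,
                 clf_hidden=256, n_classes=3, p_drop=0.4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, emb), nn.GELU(), nn.Dropout(p_drop))
        self.attention = GatedAttention(emb, attn_hidden)
        self.classifier = nn.Sequential(
            nn.Linear(emb, clf_hidden), nn.GELU(), nn.Dropout(p_drop),
            nn.Linear(clf_hidden, n_classes))

    def forward(self, x):  # x: (K, in_dim) bag of patch features for one WSI
        h = self.encoder(x)
        z, alpha = self.attention(h)
        return self.classifier(z), alpha
```

A bag of K patch feature vectors in, three logits and K attention weights out; batch size is one WSI, matching the training protocol below.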
2. Architectural Modifications to CLAM
CLAM-SB introduces significant deviations from baseline CLAM:
- Classifier Depth: The original single-layer classifier is replaced with a two-layer MLP with intermediate Dropout.
- Activation Function: GELU replaces ReLU throughout, with $\mathrm{GELU}(x) = x\,\Phi(x)$, where $\Phi$ is the standard normal CDF.
- Attention Capacity: The attention network’s hidden dimension is increased from 256 to 384.
- Regularization: Dropout of 0.4 is applied in the encoder, attention module, and classification head.
- Loss Functions: Focal loss,

$$\mathcal{L}_{\mathrm{FL}} = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t),$$

where $p_t$ is the predicted probability of the true class and $\alpha_t$ a per-class weight, addresses severe class imbalance (particularly the under-represented medium-risk class). Label smoothing with factor $\epsilon$ is applied to the targets:

$$\tilde{y}_c = (1 - \epsilon)\,y_c + \frac{\epsilon}{C}, \qquad C = 3.$$
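These two losses combine naturally into one objective. A minimal PyTorch sketch follows; the defaults $\gamma = 2$ and $\epsilon = 0.1$ are common illustrative choices, not values taken from the paper:

```python
import torch
import torch.nn.functional as F


def smooth_labels(targets, n_classes=3, eps=0.1):
    """Label smoothing: (1 - eps) on the true class, eps/C spread over all classes."""
    one_hot = F.one_hot(targets, n_classes).float()
    return (1.0 - eps) * one_hot + eps / n_classes


def focal_loss(logits, targets, alpha=None, gamma=2.0, eps=0.1):
    """Focal loss -alpha_t (1 - p_t)^gamma log p_t, applied to smoothed targets.

    gamma=2.0 and eps=0.1 are illustrative defaults; `alpha` is an optional
    per-class weight tensor of shape (n_classes,).
    """
    n_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    y = smooth_labels(targets, n_classes, eps)
    w = (1.0 - p) ** gamma          # down-weight easy, well-classified examples
    if alpha is not None:
        w = w * alpha               # per-class re-weighting
    return -(y * w * log_p).sum(dim=-1).mean()
```

With `gamma=0.0` and `eps=0.0` this reduces exactly to standard cross-entropy, which is a useful sanity check.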
3. Data Processing and Feature Extraction
WSIs in vendor “.sdpc” format are converted to “.svs”. Tissue segmentation is carried out on a low-magnification downsampled image via adaptive Gaussian blur, HSV color space conversion, Otsu thresholding on the saturation channel, morphological filtering, and mask extraction.
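The thresholding step can be illustrated with a NumPy-only sketch (a stand-in for the OpenCV-style pipeline above, omitting the blur and morphology stages): saturation is computed from RGB, and Otsu's criterion picks the cut that maximizes between-class variance, separating saturated tissue from the near-white background:

```python
import numpy as np


def saturation(rgb):
    """HSV saturation channel of an RGB image with values in [0, 1]."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)


def otsu_threshold(channel, bins=256):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, edges = np.histogram(channel, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    omega = np.cumsum(p)                                  # class-0 probability
    mu = np.cumsum(p * (edges[:-1] + edges[1:]) / 2.0)    # class-0 cumulative mean
    mu_t = mu[-1]                                         # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                            # ignore degenerate splits
    sigma_b = (mu_t * omega - mu) ** 2 / denom
    k = np.nanargmax(sigma_b)
    return (edges[k] + edges[k + 1]) / 2.0


def tissue_mask(rgb):
    """Binary tissue mask: pixels more saturated than the Otsu threshold."""
    s = saturation(rgb)
    return s > otsu_threshold(s)
```

On an H&E slide the background is nearly white (saturation close to zero), so the Otsu split on the saturation channel lands between background and stained tissue.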
Subsequently, non-overlapping patches are extracted from within the tissue mask; patch coordinates are stored in HDF5 files. Patch features are generated using the TRIDENT toolbox with two pre-trained models:
- UNI: ViT-L/16 (self-supervised DINOv2)
- CONCH: vision-language pre-trained model
Each patch is resized and projected into a 1024-dimensional feature space, which is saved for subsequent MIL processing.
4. Training Protocol and Hyperparameters
CLAM-SB is trained with stratified 5-fold cross-validation on 210 WSIs (per fold: approximately 168 training and 42 validation slides). Optimization uses the Adam algorithm with a linear learning-rate warm-up over the first 5 epochs and weight decay. Dropout is fixed at 0.4. Training proceeds for up to 100 epochs with early stopping (patience 10) and a batch size of 1 WSI per step. The loss combines a bag-level focal loss with a CLAM-style instance-level pseudo-labeling loss weighted at 0.5, plus label smoothing. The attention hidden size is 384 and the encoder output size is 512.
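The stratified split itself can be sketched without external libraries; `stratified_kfold` below is a hypothetical helper (e.g., a stand-in for scikit-learn's `StratifiedKFold`) that preserves per-class proportions across folds:

```python
import numpy as np


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs with per-class proportions preserved."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(n_splits)]
    # Shuffle each class independently, then deal its indices across folds.
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[i].extend(chunk.tolist())
    for i in range(n_splits):
        val = np.array(sorted(folds[i]))
        train = np.array(sorted(set(range(len(labels))) - set(folds[i])))
        yield train, val
```

With 210 slides and 5 folds this yields roughly 168/42 train/validation splits per fold, matching the protocol above.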
5. Performance Evaluation
In five-fold cross-validation, CLAM-SB (using both UNI and CONCH features) achieved:
- Mean AUC: 0.836
- Mean accuracy: 76.2%
For reference, ABMIL (multi-head gated attention) attained mean AUC 0.767 and accuracy 70.9%, while ConvNeXt-MIL-XGBoost achieved accuracy 73.5% and macro F1-score 0.492.
| Model | Mean AUC | Accuracy | Macro F1 |
|---|---|---|---|
| CLAM-SB (UNI+CONCH) | 0.836 | 76.2% | — |
| ABMIL | 0.767 | 70.9% | — |
| ConvNeXt-MIL-XGBoost | — | 73.5% | 0.492 |
Editor’s note: in the original CLAM, “SB” denotes the single-attention-branch variant; here the name also carries the set of enhancements above baseline CLAM.
6. Analysis of Model Efficacy
CLAM-SB’s performance advantages are attributed to several factors:
- Expanded Attention Capacity: The increase to 384 hidden units in the attention module enables richer modeling of subtle, high-dimensional histological cues.
- Advanced Nonlinearities: GELU activation and a deeper classifier architecture enhance gradient flow and the learning of complex interactions.
- Robust Regularization: Aggressive Dropout systematically reduces overfitting in the low-sample regime.
- Improved Optimization for Imbalanced Data: Focal loss with class re-weighting and label smoothing improves detection of the under-represented medium-risk class.
- Multi-Modal Pre-trained Features: Combining UNI (visual) and CONCH (vision-language) feature embeddings yields a more expressive patch representation.
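One plausible fusion scheme for the last point (the exact combination of UNI and CONCH embeddings is not specified here) is per-patch concatenation along the feature axis:

```python
import numpy as np


def fuse_features(uni: np.ndarray, conch: np.ndarray) -> np.ndarray:
    """Concatenate per-patch UNI and CONCH embeddings along the feature axis.

    Illustrative fusion only: assumes both arrays hold the same patches
    in the same order, one row per patch.
    """
    if uni.shape[0] != conch.shape[0]:
        raise ValueError("bags must contain the same patches in the same order")
    return np.concatenate([uni, conch], axis=1)
```

The fused bag simply has a wider feature dimension; the downstream MIL encoder is unchanged apart from its input size.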
Collectively, these modifications promote robust feature aggregation and classifier calibration, enabling superior stratification of breast cancer recurrence risk relative to alternative MIL implementations (Chen et al., 21 Dec 2025).