
ConvNeXt-MIL-XGBoost: Cancer Risk Prediction Pipeline

Updated 28 December 2025
  • The paper introduces a modular pipeline that integrates a frozen ConvNeXt backbone, gated attention MIL, and XGBoost for breast cancer recurrence risk stratification from H&E WSIs.
  • It combines a Transformer-inspired ConvNeXt patch encoder with gated attention pooling to aggregate discriminative features across image patches.
  • The study demonstrates clinical relevance by achieving 73.5% accuracy and producing interpretable attention maps to support diagnostic decisions.

ConvNeXt-MIL-XGBoost is a modular weakly-supervised deep learning pipeline designed for automated stratification of breast cancer recurrence risk from Hematoxylin and Eosin (H&E) stained whole-slide images (WSIs). Developed and benchmarked alongside CLAM-SB and ABMIL models for predicting 5-year recurrence risk tiers, the method integrates a frozen ConvNeXt-Base neural feature extractor, a gated-attention Multiple Instance Learning (MIL) aggregator, and an XGBoost gradient-boosted decision tree classifier. The pipeline leverages robust natural-image priors and interpretable feature aggregation for genomics-correlated risk prediction, with demonstrated efficacy on a dataset of 210 patient cases (Chen et al., 21 Dec 2025).

1. ConvNeXt Backbone for Patch-Level Feature Extraction

The first stage of ConvNeXt-MIL-XGBoost entails decomposing each WSI into 256×256 px tissue-containing patches, which are independently encoded using the ConvNeXt-Base convolutional neural network. The ConvNeXt-Base architecture employs a “Transformer-inspired” design characterized by large depth-wise kernels, inverted bottlenecks, and LayerNorm, facilitating multi-scale texture and context capture. Core configuration parameters include depths per stage of [3, 3, 27, 3] layers; channel dimensions of [96, 192, 384, 768]; 4×4 patch-embedding convolutions at input, and subsequent 2×2 strided convolutions between stages. Patch representations are finalized by an additional fully-connected projection to a 1024-dimensional vector, yielding approximately 90 million parameters.

ImageNet-1k pre-trained weights initialize ConvNeXt, accelerating convergence and imparting robust feature representations given the small sample size (210 WSIs). No fine-tuning is performed; the backbone serves as a frozen encoder for subsequent stages, ensuring computational efficiency and reproducibility.
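
As a concrete illustration of this stage, the following PyTorch sketch builds a frozen, ImageNet-pre-trained ConvNeXt encoder and projects its pooled patch features to the 1024-dimensional representation described above. The use of the timm library, the class name, and the placement of the projection layer are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import timm  # assumed source of ImageNet-1k pre-trained ConvNeXt weights


class FrozenConvNeXtEncoder(nn.Module):
    """Frozen ConvNeXt patch encoder with a linear projection to 1024-d features.

    Illustrative sketch only; not the authors' implementation.
    """

    def __init__(self, out_dim: int = 1024):
        super().__init__()
        # num_classes=0 drops the classifier head; forward() returns pooled features.
        self.backbone = timm.create_model("convnext_base", pretrained=True, num_classes=0)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone stays frozen (no fine-tuning)
        self.proj = nn.Linear(self.backbone.num_features, out_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, 3, 256, 256) tissue tiles extracted from one WSI
        with torch.no_grad():
            feats = self.backbone(patches)  # (B, backbone feature dim)
        return self.proj(feats)             # (B, 1024) patch embeddings
```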

2. Attention-Based Multiple Instance Learning Aggregation

Following patch encoding, the MIL framework aggregates instance-level features $x_{i,j} \in \mathbb{R}^{1024}$ into a slide-level embedding $s_i \in \mathbb{R}^{1024}$ for each slide $i$ with $N_i$ patches. The aggregation employs a gated attention pooling mechanism:

$$a_{i,j} = \mathrm{softmax}_j\left( w^{\top} \left( \tanh(V x_{i,j}) \odot \sigma(U x_{i,j}) \right) \right)$$

$$s_i = \sum_{j=1}^{N_i} a_{i,j}\, x_{i,j}$$

where $V, U \in \mathbb{R}^{384 \times 1024}$ are learnable weight matrices, $\sigma$ denotes the sigmoid function, $w \in \mathbb{R}^{384}$ is a learnable projection vector, $\odot$ indicates element-wise multiplication, and the softmax over patches ensures $\sum_j a_{i,j} = 1$.
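
A minimal PyTorch sketch of this gated attention pooling is shown below. The 1024-dimensional input and 384-dimensional attention space follow the text; the module and variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedAttentionPooling(nn.Module):
    """Gated attention MIL pooling matching the equations above (illustrative sketch)."""

    def __init__(self, in_dim: int = 1024, attn_dim: int = 384):
        super().__init__()
        self.V = nn.Linear(in_dim, attn_dim, bias=False)  # tanh branch
        self.U = nn.Linear(in_dim, attn_dim, bias=False)  # sigmoid gate
        self.w = nn.Linear(attn_dim, 1, bias=False)       # projection vector w

    def forward(self, x: torch.Tensor):
        # x: (N_i, 1024) patch embeddings of a single slide
        scores = self.w(torch.tanh(self.V(x)) * torch.sigmoid(self.U(x)))  # (N_i, 1)
        a = F.softmax(scores, dim=0)   # attention weights over patches, sum to 1
        s = (a * x).sum(dim=0)         # (1024,) slide-level embedding s_i
        return s, a.squeeze(-1)
```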

To address the pronounced class imbalance (only 21 medium-risk slides), the attention module is trained with Focal Loss using $\gamma = 2$ and an up-weighted medium-risk class ($\alpha_{\text{medium}} = 3.0$, $\alpha_{\text{low}} = \alpha_{\text{high}} = 1.0$). The optimizer is Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) with an initial learning rate of $1 \times 10^{-4}$ and a batch size of 8 slides. This attention model aggregates diverse instances while focusing on discriminative tissue regions.
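
The class-weighted Focal Loss can be written compactly; the sketch below assumes the label ordering (low, medium, high) = (0, 1, 2), which the source does not state explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFocalLoss(nn.Module):
    """Multi-class focal loss with per-class alpha weights (gamma = 2, alpha_medium = 3).

    Assumes labels ordered (low, medium, high) = (0, 1, 2); a sketch, not the
    authors' exact implementation.
    """

    def __init__(self, alpha=(1.0, 3.0, 1.0), gamma: float = 2.0):
        super().__init__()
        self.register_buffer("alpha", torch.tensor(alpha))
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (B, 3) class scores, target: (B,) labels in {0, 1, 2}
        log_p = F.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_t of true class
        pt = log_pt.exp()
        alpha_t = self.alpha[target]
        return (-alpha_t * (1.0 - pt) ** self.gamma * log_pt).mean()
```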

3. XGBoost Classification on Structured Slide Embeddings

The trained attention pooling module outputs a 1024-dimensional embedding $s_i$ for each WSI, supplemented by auxiliary features including the MIL logits, softmax probabilities, attention-distribution summary statistics (mean, median, quartiles, skewness), and the patch count $N_i$, resulting in a concatenated feature vector $f_i \in \mathbb{R}^{1047}$.
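
The exact composition of the 1047-dimensional vector is only partially enumerated in the source; the sketch below illustrates one plausible assembly, with hypothetical helper names and an illustrative (not exhaustive) choice of attention summary statistics.

```python
import numpy as np
from scipy.stats import skew


def build_slide_features(s_i, mil_logits, attn_weights, n_patches):
    """Assemble a structured slide-level feature vector f_i for XGBoost.

    Illustrative only: the precise set of attention statistics that yields the
    1047 dimensions reported in the paper is not fully specified here.
    """
    probs = np.exp(mil_logits - mil_logits.max())
    probs /= probs.sum()  # softmax probabilities from the MIL logits
    attn_stats = np.array([
        attn_weights.mean(),
        np.median(attn_weights),
        np.quantile(attn_weights, 0.25),
        np.quantile(attn_weights, 0.75),
        skew(attn_weights),
    ])
    # 1024-d embedding + logits + probabilities + attention statistics + patch count
    return np.concatenate([s_i, mil_logits, probs, attn_stats, [float(n_patches)]])
```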

Classification utilizes XGBoost, a gradient-boosted decision tree algorithm, targeting slide-level three-tier risk prediction ($y_i \in \{0, 1, 2\}$). The multi-class softmax objective combines cross-entropy loss with tree-complexity penalties:

$$L(\Theta) = \sum_{i=1}^{M} \ell\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{T} \Omega(f_k)$$

$$\Omega(f_k) = \gamma T_k + \frac{1}{2}\lambda \sum_j w_{k,j}^2$$

with $T = 200$ trees, learning rate $\eta = 0.10$, maximum tree depth 6, $\gamma = 1.0$, and L2 regularization $\lambda = 1.0$; here $T_k$ denotes the number of leaves of tree $f_k$ and $w_{k,j}$ its leaf weights. Hyperparameters are tuned on validation splits, and the XGBoost optimizer applies second-order gradient methods.
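
Mapped onto the scikit-learn interface of the xgboost package (an implementation choice assumed here, not confirmed by the source), the reported hyperparameters correspond roughly to the following configuration.

```python
from xgboost import XGBClassifier  # assumption: the standard xgboost Python package

# Hyperparameters as reported above; everything else is left at library defaults.
clf = XGBClassifier(
    objective="multi:softprob",  # multi-class softmax objective over the 3 risk tiers
    n_estimators=200,            # T = 200 trees
    learning_rate=0.10,          # eta
    max_depth=6,
    gamma=1.0,                   # per-split complexity penalty (gamma * T_k term)
    reg_lambda=1.0,              # L2 regularization on leaf weights (lambda term)
)

# clf.fit(F_train, y_train)          # F_train: (M, 1047) feature vectors f_i, labels in {0, 1, 2}
# risk_probs = clf.predict_proba(F_test)
```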

4. Training Protocol and Data Management

ConvNeXt-MIL-XGBoost training follows a 5-fold cross-validation procedure, with data splits stratified by risk class to preserve distributional balance. In each fold, 80% of slides are used to train the attention module (the ConvNeXt backbone remains frozen), 10% for MIL validation and early stopping, and 10% as an unseen test set.

Risk labels are based on consensus between the 21-gene Recurrence Score and clinicopathological adjudication. MIL model training employs Focal Loss, while XGBoost operates on multi-class logistic loss. Adam optimizer (attention module) and XGBoost’s native optimizer ensure efficient convergence. No data augmentation is utilized beyond randomized patch ordering; patch segmentation and extraction are deterministic.
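
The stratified 80/10/10 folds can be generated, for example, with scikit-learn; the helper below is an illustrative reading of the protocol, not the authors' splitting code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def stratified_80_10_10_splits(risk_labels: np.ndarray, seed: int = 0):
    """Yield (train, val, test) slide indices per fold, stratified by risk tier.

    Each 5-fold holdout (20% of slides) is split in half to obtain the 10%
    validation and 10% test portions described above (illustrative sketch).
    """
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    dummy = np.zeros(len(risk_labels))
    for train_idx, holdout_idx in skf.split(dummy, risk_labels):
        val_idx, test_idx = train_test_split(
            holdout_idx,
            test_size=0.5,
            stratify=risk_labels[holdout_idx],
            random_state=seed,
        )
        yield train_idx, val_idx, test_idx
```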

5. Comparative Performance and Ablation Insights

ConvNeXt-MIL-XGBoost exhibits a mean classification accuracy of 73.5% ± 3.8% across the five cross-validation folds. Comparative results are summarized below:

| Model | Mean AUC | Mean Accuracy |
| --- | --- | --- |
| CLAM-SB | 0.836 ± 0.062 | 76.2% ± 4.5% |
| ABMIL | 0.767 ± 0.046 | 70.9% ± 5.1% |
| ConvNeXt-MIL-XGBoost | not reported | 73.5% ± 3.8% |

No AUC was computed for the XGBoost pipeline, and formal significance testing was omitted. Notably, ConvNeXt-Base provides superior patch embeddings over ResNet-18, raising classification accuracy by ≈5%. The gated attention (sigmoid × tanh gating) improves aggregation over mean pooling (+4% accuracy). XGBoost outperforms equivalent multi-layer perceptron classifiers trained on the same $f_i$ input, with the MLP achieving only 70.0% accuracy and exhibiting greater hyperparameter sensitivity.

6. Clinical Deployment Considerations

The ConvNeXt-MIL-XGBoost framework supports integration as a decision-support module in digital pathology environments. Automated processing of digitized slides can provide rapid three-tier recurrence risk scores and deploy attention heatmap overlays to highlight diagnostically relevant tissue regions. This workflow can prioritize high-risk cases for review and assist in adjudicating borderline cases.

Recommended deployment steps include multi-center prospective validation, regulatory approval, and integration with laboratory information systems. Ongoing monitoring of model drift due to factors such as staining or scanner variations and re-calibration with local cohorts are advised to sustain clinical efficacy.

7. Modular Architecture and Interpretability

ConvNeXt-MIL-XGBoost is structured to decouple representation learning (ConvNeXt backbone), instance aggregation (attention MIL), and structured classification (XGBoost). This modularity yields an interpretable pipeline for genomics-correlated risk prediction, supporting transparent diagnostic audits and facilitating adaptation to evolving clinical requirements. The model’s attention maps and multi-tier output structure bolster its utility within computational pathology workflows, promoting rapid, cost-effective clinical decision support while maintaining methodological robustness (Chen et al., 21 Dec 2025).
