CARP3D: Deep Learning for 3D Biopsy Triage
- CARP3D is a deep learning triage tool that assigns context-aware risk scores to 2D slices from 3D biopsy volumes using intra-slice MIL and inter-slice attention pooling.
- The methodology integrates pretrained encoders and gated attention mechanisms to extract discriminative features and aggregate spatial context, forming a 2.5D risk estimation framework.
- Quantitative evaluations in prostate cancer risk stratification demonstrate a 9-point AUC improvement and enhanced F2 scores over traditional 2D-only approaches.
CARP3D is a deep learning triage approach designed for the identification of high-risk regions in 3D volumetric biopsy datasets, particularly in the context of non-destructive histopathology enabled by open-top light-sheet (OTLS) microscopy. CARP3D automates the assignment of risk scores to individual 2D image slices extracted from a 3D volume, thereby prioritizing the most diagnostically relevant cross-sections for pathologist review. The approach leverages both intra-slice multiple-instance learning (MIL) and inter-slice attention-based pooling, producing context-aware 2.5D risk estimates for each slice. Quantitative results in prostate cancer risk stratification demonstrate that CARP3D outperforms conventional 2D-only MIL methods, highlighting the benefit of leveraging depth context for improved triage in large-scale 3D pathology datasets (Gao et al., 2024).
1. Motivation and Problem Statement
Histopathologic diagnosis of solid tissue biopsies has traditionally relied on the examination of a small subset of thin 2D slices—often a few 4 μm sections—from a 3D specimen. This limited sampling strategy can lead to undersampling and a substantial risk of missing high-grade or spatially heterogeneous disease regions. Recent advances in non-destructive 3D pathology, notably OTLS microscopy, enable comprehensive high-resolution imaging of entire biopsy cores as hundreds to thousands of contiguous, 2D, H&E-like slices.
The volumetric nature of 3D pathology datasets renders exhaustive manual review by pathologists infeasible. CARP3D addresses this by assigning slice-level risk scores across the depth axis, enabling prioritization of high-risk slices and thus targeted human review. This workflow is especially relevant in prostate cancer diagnosis, where existing review protocols are limited by partial sampling (Gao et al., 2024).
2. 2.5D Multiple-Instance Learning (MIL) Model
CARP3D integrates the strengths of MIL—traditionally used for weakly supervised learning on sets ("bags") of image patches—with inter-slice context pooling, forming a "2.5D" risk estimation paradigm. The model processes each 2D slice as a bag of image patches, aggregates discriminative features via attention mechanisms, and further pools features from adjacent slices to exploit spatial continuity and context.
Intra-Slice Feature Aggregation
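Each 2D slice ($s_i$) is tessellated into non-overlapping patches of size $p \times p$. Patch-level features are extracted using a pretrained encoder $f$ (e.g., ResNet50 or CTransPath), followed by a fully connected (FC) layer with ReLU, mapping each patch $k$ to an embedding $h_k \in \mathbb{R}^{512}$.
For MIL aggregation, CARP3D uses the gated attention mechanism:
$$a_k = \frac{\exp\left\{w^\top\left(\tanh(V h_k) \odot \sigma(U h_k)\right)\right\}}{\sum_{j=1}^{N} \exp\left\{w^\top\left(\tanh(V h_j) \odot \sigma(U h_j)\right)\right\}}$$
with $V, U \in \mathbb{R}^{L \times 512}$, $w \in \mathbb{R}^{L}$, and the slice-level embedding
$$z_i = \sum_{k=1}^{N} a_k h_k$$
Here $N$ is the number of patches in the slice, $L$ is the hidden dimension of the attention module, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.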
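The gated attention pooling described above (a tanh branch gated element-wise by a sigmoid branch, scored against a weight vector, then normalized with a softmax over patches) can be illustrated with a small dependency-free sketch. The function names, the toy dimensions, and the random parameters are assumptions made for illustration; they are not taken from the CARP3D implementation, which would use a deep learning framework and 512-dimensional embeddings.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_pool(patches, V, U, w):
    """Pool patch embeddings h_k into one slice embedding.

    score_k = w . (tanh(V h_k) * sigmoid(U h_k)); weights = softmax(scores).
    Returns (slice_embedding, attention_weights).
    """
    scores = []
    for h in patches:
        tanh_branch = [math.tanh(x) for x in matvec(V, h)]
        gate_branch = [sigmoid(x) for x in matvec(U, h)]
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_branch, gate_branch)))
    weights = softmax(scores)
    dim = len(patches[0])
    embedding = [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
    return embedding, weights


# Toy example (assumed sizes): 6 patches, 4-dim embeddings, 3-dim attention space.
rng = random.Random(0)
D, L, N = 4, 3, 6
patches = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
V = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
U = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0, 1) for _ in range(L)]

z, a = gated_attention_pool(patches, V, U, w)
print("attention weights:", [round(x, 3) for x in a])
print("slice embedding:", [round(x, 3) for x in z])
```

Because the attention weights are non-negative and sum to one, the slice embedding is a convex combination of the patch embeddings, so patches with high attention dominate the slice-level representation.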
Inter-Slice Context Integration
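For a slice of interest (SOI) $s_i$, CARP3D considers $K$ neighboring slices above and below, sampled every $\delta$ slices, yielding $2K$ context slices $\{s_{i+j\delta}\}$ with $j \in \{\pm 1, \dots, \pm K\}$. Each neighbor is embedded via the same MIL pipeline to obtain $z_{i+j\delta}$.
A learnable vector $u$ weights features across the set formed by the SOI and its neighbors. Written as a softmax-normalized attention, the weights are
$$\beta_j = \frac{\exp\left(u^\top z_{i+j\delta}\right)}{\sum_{j'=-K}^{K} \exp\left(u^\top z_{i+j'\delta}\right)}, \quad j = -K, \dots, K$$
producing the context-aware SOI feature
$$\tilde{z}_i = \sum_{j=-K}^{K} \beta_j \, z_{i+j\delta}$$
fed to a linear classification head, which outputs
$$\hat{y}_i = \mathrm{softmax}\left(W_c \tilde{z}_i + b_c\right) \in \mathbb{R}^2$$
where the second component of $\hat{y}_i$ corresponds to the probability of higher-grade (Gleason grade group, GG ≥ 2) cancer in slice $s_i$.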
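The inter-slice step can be sketched in the same dependency-free style: pick the neighbor indices around the slice of interest, weight the slice embeddings with a learnable vector, and pass the pooled feature to a linear two-class head. Several details here are assumptions rather than facts from the source: the softmax normalization of dot-product scores, dropping neighbors that fall outside the volume, and all names and toy values (`neighbor_indices`, `context_pool`, `classify`, `k=2`, `step=4`).

```python
import math


def neighbor_indices(center, k, step, num_slices):
    """Indices of the SOI and up to k neighbors on each side, every `step` slices.

    Assumption: neighbors that fall outside the volume are dropped.
    """
    candidates = [center + j * step for j in range(-k, k + 1)]
    return [i for i in candidates if 0 <= i < num_slices]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_pool(embeddings, u):
    """Weight slice embeddings by softmax(u . z_j) and sum them."""
    scores = [sum(ui * zi for ui, zi in zip(u, z)) for z in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(b * z[d] for b, z in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


def classify(feature, W, b):
    """Linear head followed by a two-class softmax."""
    logits = [sum(wi * fi for wi, fi in zip(row, feature)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


idx = neighbor_indices(center=10, k=2, step=4, num_slices=100)
print(idx)  # [2, 6, 10, 14, 18]

# Toy 3-dim slice embeddings, one per context slice (hypothetical values).
embeddings = [[0.1 * i, 1.0, -0.5] for i in idx]
u = [0.2, 0.0, 0.1]
pooled, beta = context_pool(embeddings, u)
probs = classify(pooled, W=[[0.5, -0.2, 0.1], [-0.5, 0.2, -0.1]], b=[0.0, 0.0])
print("context weights:", [round(x, 3) for x in beta])
print("class probabilities:", [round(x, 3) for x in probs])
```

In a trained model the slice embeddings would come from the intra-slice MIL stage, and `u`, `W`, and `b` would be learned jointly with the rest of the network.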
3. Architectural Design and Workflow
All processing modules in CARP3D are end-to-end differentiable. The model comprises the following main components:
- Input Stage: Accepts a 2D slice of interest plus its $2K$ spatially neighboring slices ($K$ on each side).
- Patch Encoding: Utilizes pretrained ResNet50 and CTransPath feature concatenation, followed by a 512-dimensional FC layer.
- Intra-Slice Attention: Independent gated attention module per slice computes slice-level embeddings.
- Inter-Slice Pooling: Weighting of neighboring slice features using a learnable vector $u$.
- Classification Head: Produces a two-class risk estimate ($\hat{y}_i$), indicating the probability of higher-grade cancer.
This modular design supports replication and facilitates future extension by substituting encoders or augmenting the context-pooling strategy (Gao et al., 2024).
4. Training Regimen and Technical Details
The CARP3D framework was trained using a dataset of 124 representative slices from 115 OTLS-imaged prostate biopsies collected from 54 patients. Ground truth slice-level labels were assigned by a panel of pathologists as low-grade (GG = 1) or higher-grade (GG ≥ 2). The model employs leave-one-patient-out cross-validation for assessment, with all slices from a held-out patient used for testing in each fold.
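Leave-one-patient-out cross-validation groups slices by patient so that every slice from the held-out patient lands in the test set of that fold. The sketch below shows one way to build such folds; the function name and the patient IDs are hypothetical and only serve to illustrate the splitting logic.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices) for each patient."""
    by_patient = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        by_patient[pid].append(index)
    for pid in sorted(by_patient):
        test = by_patient[pid]
        train = [i for other, idxs in by_patient.items() if other != pid for i in idxs]
        yield pid, sorted(train), test


# Hypothetical patient IDs, one entry per labeled slice.
slice_patients = ["P01", "P01", "P02", "P03", "P03", "P03"]
folds = list(leave_one_patient_out(slice_patients))
for pid, train, test in folds:
    print(pid, "train:", train, "test:", test)
```

With 54 patients this scheme produces 54 folds, and no patient ever contributes slices to both the training and test sets of the same fold.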
Key training parameters:
- Loss: Cross-entropy on predicted probabilities $\hat{y}_i$
- Optimizer: Adam with learning rate $\eta$, batch size 256
- Data augmentation: Random patch flips, brightness/contrast jittering (for ResNet50)
- Neighborhood selection: Context radius of ±80 μm (axial), corresponding to $K$ neighboring slices on each side, with alternative values of $K$ tested
- Input Sampling: 1–2 center-labeled slices per volume during training; all slices are scored at inference
Because each fold holds out every slice from one patient, no patient contributes to both training and testing, so the reported metrics reflect slice-level risk estimation on unseen patients.
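The cross-entropy loss listed among the training parameters penalizes the negative log-probability that the model assigns to the true class of each slice. A minimal sketch follows, assuming class 0 is low-grade and class 1 is higher-grade; the class ordering, the function names, and the example probabilities are illustrative assumptions.

```python
import math


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class for one slice."""
    return -math.log(max(probs[label], eps))


def mean_cross_entropy(batch_probs, labels):
    """Average the per-slice losses over a batch."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


# Assumed encoding: class 0 = low-grade (GG = 1), class 1 = higher-grade (GG >= 2).
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
loss = mean_cross_entropy(batch_probs, labels)
print(round(loss, 4))
```

The third slice dominates the batch loss because the model assigns only 0.4 to its true class, which is the behavior that drives the gradient toward correcting confident mistakes.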
5. Performance in Prostate Cancer Risk Stratification
CARP3D was benchmarked against a 2D baseline consisting of attention-based MIL (ABMIL) on ResNet50 and CTransPath features, followed by a classifier applied to slices independently. The following table summarizes the main reported metrics:
| Model | AUC (%) | F2 Score (%) |
|---|---|---|
| 2D Baseline (ABMIL) | 81.3 (75.2–87.3) | 88.5 |
| CARP3D (2.5D pooling) | 90.4 | 92.4 |
CARP3D yields an approximately 9-point absolute improvement in area under the ROC curve (AUC), from 81.3% to 90.4%, and a 3.9-point gain in F2 score, from 88.5% to 92.4%. Because the F2 score weights recall more heavily than precision, this suggests that inter-slice context in the 2.5D approach enhances model sensitivity for high-grade lesions, which may translate to improved detection of aggressive disease and more efficient triage workflows. These results indicate that exploiting axial continuity in 3D datasets can mitigate the limitations of 2D-only analysis (Gao et al., 2024).
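For readers less familiar with the F2 score, it is the F-beta score with beta = 2, which treats recall as more important than precision: F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP). The sketch below computes it from confusion-matrix counts; the counts are made up for illustration and are not drawn from the study.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, and false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: 8 higher-grade slices found, 4 false alarms, 2 missed.
tp, fp, fn = 8, 4, 2
f2 = f_beta(tp, fp, fn, beta=2.0)
f1 = f_beta(tp, fp, fn, beta=1.0)
print("F2:", round(f2, 4), "F1:", round(f1, 4))
```

In this example recall (0.8) exceeds precision (about 0.67), so F2 comes out higher than F1, which reflects why F2 suits a triage setting where missing a high-grade slice is costlier than flagging an extra one.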
6. Clinical Implications, Modularity, and Future Directions
By identifying high-risk slices more reliably, CARP3D supports efficient and scalable pathologist workflows, potentially reducing the risk of missed aggressive lesions in voluminous 3D pathology datasets. The model's modular design (pretrained patch encoders, gated attention aggregation, attention-weighted inter-slice pooling, and a simple classifier) facilitates extension to other encoder backbones and pooling architectures.
Future avenues may include integration of alternative feature extractors, adoption of more expressive inter-slice transformers, or the inclusion of clinical metadata to further optimize performance. A plausible implication is that similar architectures could generalize to other tissue types and diagnostic scenarios amenable to 3D imaging (Gao et al., 2024).