- The paper demonstrates that deep regression models can predict mRNA profiles from H&E stained WSIs, with pathology-specific transformers outperforming generic feature extractors.
- The authors found that the Direct-ABMIL and Contrastive models yield comparable results, with the Contrastive approach proving more resilient when paired with a suboptimal feature extractor.
- The experiments indicate that a single model regressing all genes is far more computationally efficient than multiple specialized models, which offer only marginal performance gains.
Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction
The paper "Evaluating Deep Regression Models for WSI-Based Gene-Expression Prediction" presents an in-depth analysis of various deep learning models aimed at predicting mRNA gene-expression profiles from hematoxylin and eosin (H&E) stained whole-slide images (WSIs). This approach holds promise for cost-effective and accessible molecular phenotyping, crucial for precision medicine, especially within cancer research.
Key Insights and Methodology
The paper primarily explores the high-dimensionality challenge inherent in gene-expression prediction, since the task involves predicting continuous expression values for approximately 20,530 genes. It benchmarks different regression models on datasets from The Cancer Genome Atlas (TCGA) across four cancer types: TCGA-BRCA (breast), TCGA-HNSC (head-neck), TCGA-STAD (stomach), and TCGA-BLCA (bladder).
Four distinct regression models were examined:
- Direct-ABMIL (Attention-Based Multiple Instance Learning): uses an ABMIL aggregator to pool patch-level features into a single WSI-level feature vector, from which gene expression is regressed.
- Direct-Patch-Level: a streamlined variant that drops the ABMIL aggregator, predicts gene expression for each patch, and aggregates these patch-level predictions.
- Contrastive: aligns WSI embeddings and gene-expression profiles in a shared space via contrastive learning, leveraging cosine similarities for prediction.
- kNN (k-Nearest Neighbors): a simple baseline with no trainable parameters, regressing expression from the nearest training slides in feature space.
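To make the aggregation step concrete, the gated-attention pooling at the heart of ABMIL can be sketched in a few lines. This is a minimal numpy illustration, not the paper's actual architecture: the dimensions, the random weights, and the plain linear regression head are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def abmil_pool(patch_feats, V, w):
    """Attention-based MIL pooling: score each patch with a small
    attention network, softmax over patches, and take the weighted
    sum as the slide-level embedding."""
    # patch_feats: (n_patches, d); V: (d_attn, d); w: (d_attn,)
    scores = w @ np.tanh(V @ patch_feats.T)      # (n_patches,)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                           # softmax over patches
    return attn @ patch_feats                    # (d,) slide embedding

d, d_attn, n_genes = 64, 32, 8                   # toy sizes, not the real model
patch_feats = rng.normal(size=(100, d))          # stand-in for UNI patch features
V = rng.normal(size=(d_attn, d)) * 0.1
w = rng.normal(size=d_attn)
W_head = rng.normal(size=(n_genes, d)) * 0.1     # hypothetical linear head

slide_vec = abmil_pool(patch_feats, V, w)
pred_expression = W_head @ slide_vec             # one predicted value per gene
print(pred_expression.shape)                     # (8,)
```

In the Direct-Patch-Level variant the same regression head would instead be applied to each patch feature, with the per-patch predictions averaged afterwards.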
Each model was evaluated with two feature extractors: UNI, a pathology-specific vision transformer, and Resnet-IN, a ResNet pretrained on ImageNet.
Experimental Findings
Model Performance:
- Across all datasets and metrics, the UNI feature extractor demonstrated superior performance compared to Resnet-IN, underscoring the importance of domain-specific pretrained models in computational pathology.
- Among the regression models, kNN consistently underperformed, which was anticipated given its simplicity.
- The Direct - ABMIL and Contrastive models exhibited comparable performance, suggesting robustness of these methods for high-dimensional gene-expression prediction. Notably, the Contrastive model showed greater resilience with the suboptimal Resnet-IN feature extractor.
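One plausible reading of the Contrastive model's inference rule, "leveraging cosine similarities for prediction," is a similarity-weighted combination of training expression profiles in the shared embedding space. The sketch below is an assumption-laden illustration of that idea (the temperature, the softmax weighting, and all array sizes are invented for the example, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def contrastive_predict(query_emb, train_embs, train_expr, temperature=0.1):
    """Predict a query slide's expression profile as a cosine-similarity-
    weighted average of training expression profiles."""
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q                                 # cosine similarity per slide
    w = np.exp(sims / temperature)
    w /= w.sum()                                 # softmax over training slides
    return w @ train_expr                        # (n_genes,) prediction

d, n_train, n_genes = 32, 50, 10                 # toy sizes
train_embs = rng.normal(size=(n_train, d))       # embeddings of training WSIs
train_expr = rng.normal(size=(n_train, n_genes)) # their expression profiles
query_emb = rng.normal(size=d)

pred = contrastive_predict(query_emb, train_embs, train_expr)
print(pred.shape)                                # (10,)
```

A retrieval-style rule like this depends only on the quality of the shared embedding space, which may help explain the Contrastive model's relative resilience to a weaker feature extractor.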
Numerical Results:
- For TCGA-BRCA, the Direct-ABMIL model with UNI features regressed 4,927 genes at a Pearson correlation ≥ 0.4 and achieved a mean Pearson correlation of 0.562 on the PAM50 gene subset, which is recognized for its prognostic value in breast cancer.
- In contrast, the same Direct-ABMIL model with Resnet-IN features reached a mean Pearson correlation of only 0.373 on the PAM50 subset, a substantial drop in performance.
Multiple Models Approach:
- Experiments combining multiple models to regress subsets of genes did not yield significant performance improvements over a single model regressing all 20,530 genes. This finding suggests that the computational cost of training multiple models might not be justifiable for marginal gains.
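The computational trade-off is easy to see at the level of the regression head: a single multi-output head predicts every gene in one matrix product and one training run, while specialized models multiply the number of training runs. A toy numpy comparison (the dimensions and random weights are placeholders; real models would of course be trained, not sampled):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_genes = 64, 20530

# Single comprehensive model: one weight matrix regresses every gene at once.
W_all = rng.normal(size=(n_genes, d)) * 0.01

# Multiple specialized models: one tiny head per gene (the extreme case).
W_per_gene = [rng.normal(size=(1, d)) * 0.01 for _ in range(n_genes)]

slide_vec = rng.normal(size=d)                   # a slide-level embedding

pred_single = W_all @ slide_vec                  # all genes in one matmul
pred_multi = np.concatenate([W @ slide_vec for W in W_per_gene])

# Parameter counts match, but the single model needs one training run,
# whereas the specialized route needs n_genes separate runs.
print(pred_single.shape, pred_multi.shape)
```

The paper's finding that the extra runs buy only marginal accuracy makes the single-model row of this comparison the sensible default.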
Conclusions and Recommendations
The paper concludes with several actionable insights:
- Preference for Pathology-Specific Feature Extractors: Given UNI's consistent advantage, pathology-specific feature extractors are crucial for improving prediction accuracy in computational pathology tasks.
- Optimal Model Selection: Given their close performance, both Direct-ABMIL and Contrastive models remain strong default choices, with the final selection potentially guided by which feature extractor is being employed.
- Single vs. Multiple Models: Training a single model to regress all genes emerges as a computationally efficient yet highly competitive baseline. The low performance of single-gene models argues against their use except when only a handful of specific genes are of interest.
Implications and Future Directions
The implications of this research are twofold:
- Practical Deployment: In practical terms, the findings simplify the workflow for developing gene-expression prediction models, advocating for single, comprehensive models rather than multiple specialized ones.
- Theoretical Advancements: The surprising inefficacy of single-gene models opens avenues for deeper investigation into model training dynamics and the potential benefits of multi-task learning in gene-expression prediction.
Future work should aim to validate these findings on external datasets to ensure generalizability. Enhancing the robustness of Contrastive models to various feature extractors could also be explored further, alongside investigating the underlying reasons for the poor performance of single-gene regressors.
Ultimately, this detailed evaluation lays a strong foundation for both the scalable application and incremental improvement of deep learning models in predicting gene-expression profiles from WSIs, driving advancements in computational pathology and precision medicine.